Sample Questions

Pricing

Question A

An insurance company is trying to decide on the annual premium it should charge for its auto policies. Board members have decided they want a 10% annual return on investment in the auto portfolio; i.e., for a $1 investment in the portfolio, they expect $1.10 at the end of the year. The strategy team is in charge of determining the proper premium. The historical claim rate is 1%, and on average each claim costs the company $5,000. Assume there are no fixed costs and that writing each policy costs the company $200.

Answer for A

To determine the premium that delivers a 10% return on investment (ROI) on the auto policies, first compute the per-policy quantities, then solve for the premium:

Step 1: Define the Variables

  • Annual return requirement: The company wants a 10% return on investment.

  • Investment per policy: The company spends $200 per policy to write it.

  • Required payoff per policy: Since a 10% return is required on the $200 invested, each policy must bring in, net of claims:

    200 × 1.1 = 220

    So, the company needs $220 per policy after paying claims: the $200 spent writing the policy plus a $20 profit.

  • Historical claim rate: 1% (0.01 probability of a claim).

  • Average claim cost: $5,000 per claim.

  • Expected cost of claims per policy:

    Claim Rate × Average Claim Cost = 0.01 × 5000 = 50

    So, the expected loss per policy due to claims is $50.

Step 2: Define the Payoff Equation

Required Payoff = Premium − Expected Loss

Rearrange the formula:

Premium = Required Payoff + Expected Loss

Step 3: Plug in the Values

P = 220 + 50 = 270

Check: 270 − 50 − 200 = 20, which is 10% of the $200 investment.

Final Answer:

The company should charge a premium of $270 per policy to meet its 10% ROI requirement.
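The arithmetic above can be reproduced with a short script. This is a minimal sketch; the function and parameter names are illustrative and not part of the original question.

```python
def required_premium(policy_cost, roi, claim_rate, avg_claim_cost):
    """Premium that leaves policy_cost * (1 + roi) after expected claims."""
    required_payoff = policy_cost * (1 + roi)      # investment plus required return
    expected_claims = claim_rate * avg_claim_cost  # expected loss per policy
    # Round to cents to hide floating-point noise.
    return round(required_payoff + expected_claims, 2)


premium = required_premium(
    policy_cost=200, roi=0.10, claim_rate=0.01, avg_claim_cost=5000
)
print(premium)  # 270.0
```

Changing `claim_rate` shows how sensitive the premium is to the assumed risk level, which is the motivation for the risk groups discussed next.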


The business noticed that the quality of the auto portfolio has decreased since the new premiums were put in place, and the claim rate is increasing. On investigating the issue, they found that many good customers are leaving the company for cheaper options. The strategy team believes the reason is that all customers (good and bad) are charged the same amount, which is based on the portfolio's average claim rate (the unconditional probability of a claim) of 1%.

To solve this problem, the strategy team decides to put customers into 3 risk groups based on each customer's probability of submitting a claim. To calculate this probability, the modeling team will build a model.

Phase 1 – Model Design

Question B

What is a good target variable for this model? Define the target in detail, as if you are explaining to the machine.

Answer for B

Choosing the Target Variable

A target variable is what the model predicts. Since the company wants to assess a customer’s probability of submitting a claim, the target variable should be:

  • Binary Variable (1 or 0):

    • 1 if the policyholder submits at least one claim within a year after purchasing or renewing the policy.

    • 0 if the policyholder does not submit a claim in that year.

This allows the model to predict the likelihood of a customer filing a claim, which is crucial for setting fair premiums.
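The target definition can be made precise in code. The sketch below assumes each policy record carries a start date and a list of claim dates; these inputs and the function names are hypothetical. It also assumes "within a year" means the half-open window from the start date up to, but not including, the same date one year later.

```python
from datetime import date


def add_one_year(d):
    """Same calendar day one year later (Feb 29 maps to Feb 28)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def claim_target(policy_start, claim_dates):
    """Return 1 if any claim falls in [policy_start, policy_start + 1 year), else 0."""
    window_end = add_one_year(policy_start)
    return int(any(policy_start <= c < window_end for c in claim_dates))


print(claim_target(date(2021, 3, 15), [date(2021, 11, 2)]))  # 1
print(claim_target(date(2021, 3, 15), [date(2022, 3, 15)]))  # 0, just outside the window
print(claim_target(date(2021, 3, 15), []))                   # 0
```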

Question C

If the modeling team starts developing the model in Jan 2023, what is the most recent data that can be used to build this model?

Answer for C

Selecting the Most Recent Data

To train the model, the team needs historical data. However, since the target variable is a 1-year outcome, they must use data that allows for at least 1 year of observation after policy issuance.

  • If model development starts in January 2023, then:

    • The most recent usable data is for policies purchased or renewed in December 2021, since a full year is needed to observe claims and outcomes are only complete through December 2022.

This ensures the model is trained on completed claim outcomes rather than incomplete data.
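The cutoff can be computed rather than reasoned out by hand. This sketch works at monthly granularity and rests on two assumptions not stated in the question: data is available through the end of the month before development starts, and a monthly cohort is usable only if every policy in it, including one bought on the last day of the month, has a complete outcome window. The function name is illustrative.

```python
def latest_usable_cohort(dev_year, dev_month, window_months=12):
    """Latest purchase month (year, month) whose policies are all fully observed."""
    # Count months from year 0, then step back the outcome window plus one
    # month, so the cohort's last policy finishes its window before dev start.
    index = dev_year * 12 + (dev_month - 1) - (window_months + 1)
    return index // 12, index % 12 + 1


print(latest_usable_cohort(2023, 1))  # (2021, 12)
```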

Question D

Mention two features that may help with predicting the probability of a claim. Define features in detail, as if you are explaining to the machine.

Answer for D

Selecting Features for the Model

Features (independent variables) are inputs used to predict the target variable. Good features should be strongly correlated with claim likelihood.

Example Features:

  1. X1: A binary feature that is 1 if the customer had any accident in the 2 years before purchasing the policy, and 0 otherwise.

  2. X2: The claim rate over the 1 year before the purchase date among policyholders living in the same city as the new customer.

These features help the model learn from past accident history and geographical risk factors.
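The two example features could be computed along these lines. This is a sketch under stated assumptions: the input shapes (a list of accident dates, and a list of `(city, had_claim)` pairs already restricted to the prior year) and all function names are hypothetical, as are the city names in the example.

```python
from datetime import date


def years_before(d, years):
    """Same calendar day `years` earlier (Feb 29 maps to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def prior_accident_flag(purchase_date, accident_dates):
    """X1: 1 if any accident occurred in the 2 years before purchase, else 0."""
    window_start = years_before(purchase_date, 2)
    return int(any(window_start <= a < purchase_date for a in accident_dates))


def city_claim_rate(city, policies):
    """X2: share of prior-year policies in `city` with at least one claim.

    Returns None when the city has no policies to average over.
    """
    flags = [had_claim for c, had_claim in policies if c == city]
    return sum(flags) / len(flags) if flags else None


x1 = prior_accident_flag(date(2021, 6, 1), [date(2020, 1, 10)])
x2 = city_claim_rate(
    "Springfield",
    [("Springfield", 1), ("Springfield", 0), ("Springfield", 0),
     ("Springfield", 0), ("Shelbyville", 1)],
)
print(x1, x2)  # 1 0.25
```

Both features use only information available before the purchase date, so they do not leak the outcome the model is trying to predict.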

Question E

What is the output of this model? Describe it in detail.

Answer for E

Model Output

The model’s output is the predicted probability that a given customer submits at least one claim within a year after purchasing or renewing the policy.

  • Output: A probability between 0 and 1.

  • Example: If a customer has a 0.08 (8%) probability, they are considered riskier than someone with a 0.02 (2%) probability.
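To connect the output to the 3 risk groups the strategy team wants, the predicted probability can be bucketed with two cutoffs. The cutoff values below (1% and 5%) and the group labels are placeholders chosen for illustration; the source does not specify them, and in practice they would be set by the strategy team.

```python
def risk_group(probability, low_cutoff=0.01, high_cutoff=0.05):
    """Map a predicted claim probability to one of three risk groups."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < low_cutoff:
        return "low"
    if probability < high_cutoff:
        return "medium"
    return "high"


for p in (0.005, 0.02, 0.08):
    print(p, risk_group(p))
```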

Question F

The modeling team built the model on 2021 data (those who purchased the policy in 2021). The strategy team believes more data should be used, and asks the modeling team to add 2020 data, but the modeling team believes it is not a good idea to use 2020 data. What is your opinion?

Answer for F

The strategy team suggests using 2020 data in addition to 2021 data, but the modeling team disagrees.

Why Not Use 2020 Data?

  • COVID-19 lockdowns in 2020 significantly reduced driving activity.

  • Fewer accidents in 2020 led to an artificially low claim rate, making 2020 data not representative of normal driving behavior.

  • Including the unrepresentative 2020 data may lead to incorrect risk estimates.

Thus, excluding 2020 data helps ensure that the model reflects normal risk patterns.
