How to use expstudies

The expstudies package is meant to make analyzing life experience data easier. The package is best explained by example, so we start by loading expstudies.

library(expstudies)

You will need to be able to manipulate data frames to effectively work with this package. I use dplyr from the tidyverse for this. We load dplyr along with magrittr for the “%>%” operator.

library(dplyr)
library(magrittr)

## Making exposures from records

Some synthetic data called “records” is included in the package. The data must have “key”, “start”, and “end” columns, or the package will throw an error. The key column must also contain no duplicate values.

records
key start end issue_age gender
B10251C8 2010-04-10 2019-04-04 35 M
D68554D5 2005-01-01 2019-04-04 30 F
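
The layout requirements can be checked before any exposures are built. The following is a minimal sketch in base R; `check_records` and `toy_records` are hypothetical names used for illustration and are not part of expstudies, and the error messages are not the ones the package produces.

```r
# Hypothetical helper (not part of expstudies): checks the layout described above.
check_records <- function(records) {
  required <- c("key", "start", "end")
  missing_cols <- setdiff(required, names(records))
  if (length(missing_cols) > 0) {
    stop("missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (anyDuplicated(records$key) > 0) {
    stop("the key column contains duplicate values")
  }
  invisible(TRUE)
}

# Toy data mirroring the two rows shown above.
toy_records <- data.frame(
  key = c("B10251C8", "D68554D5"),
  start = as.Date(c("2010-04-10", "2005-01-01")),
  end = as.Date(c("2019-04-04", "2019-04-04")),
  issue_age = c(35, 30),
  gender = c("M", "F"),
  stringsAsFactors = FALSE
)

check_records(toy_records)
```

Running a check like this first gives a clearer message than an error raised partway through a longer pipeline.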

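The addExposures function creates one row for each policy year between the start and end date. Exposure is measured in years, with 365.25 days counted as a full policy year, so a complete policy year shows as 0.9993 (365 days) or 1.002 (366 days).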

exposures <- addExposures(records)
head(exposures)
key duration start_int end_int exposure
B10251C8 1 2010-04-10 2011-04-09 0.9993
B10251C8 2 2011-04-10 2012-04-09 1.002
B10251C8 3 2012-04-10 2013-04-09 0.9993
B10251C8 4 2013-04-10 2014-04-09 0.9993
B10251C8 5 2014-04-10 2015-04-09 0.9993
B10251C8 6 2015-04-10 2016-04-09 1.002
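
The exposure values above can be reproduced by hand. This sketch assumes that exposure is the number of days in the interval, counting both endpoints, divided by 365.25. That rule is inferred from the output shown here rather than taken from the package source, and `interval_exposure` is a hypothetical helper.

```r
# Assumption: exposure = (days in interval, both endpoints included) / 365.25.
interval_exposure <- function(start_int, end_int) {
  days <- as.numeric(as.Date(end_int) - as.Date(start_int)) + 1
  days / 365.25
}

interval_exposure("2010-04-10", "2011-04-09")  # 365 days
interval_exposure("2011-04-10", "2012-04-09")  # 366 days, spans 29 Feb 2012
```

Rounded to four significant digits, these give the 0.9993 and 1.002 shown for durations 1 and 2.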

There is also an option for calculating policy-month intervals, set with type = "PM", in case we want to model skewness within policy years. This isn’t the default because a single record could result in hundreds of rows in the exposures data frame.

exposures_PM <- addExposures(records, type = "PM")
head(exposures_PM)
key duration policy_month start_int end_int exposure
B10251C8 1 1 2010-04-10 2010-05-09 0.08214
B10251C8 1 2 2010-05-10 2010-06-09 0.08487
B10251C8 1 3 2010-06-10 2010-07-09 0.08214
B10251C8 1 4 2010-07-10 2010-08-09 0.08487
B10251C8 1 5 2010-08-10 2010-09-09 0.08487
B10251C8 1 6 2010-09-10 2010-10-09 0.08214
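
To see where the monthly rows come from, the sketch below rebuilds the first six policy months of B10251C8 in base R. `monthly_intervals` is a hypothetical helper and not how addExposures is implemented. It assumes each policy month starts on the same day of the month as the policy and that exposure is days divided by 365.25.

```r
# Hypothetical helper: builds policy-month intervals by stepping the start date
# forward one calendar month at a time. Start days after the 28th need extra
# care with seq(), which this sketch does not handle.
monthly_intervals <- function(start, n_months) {
  starts <- seq(as.Date(start), by = "month", length.out = n_months + 1)
  start_int <- starts[-length(starts)]
  end_int <- starts[-1] - 1
  data.frame(
    policy_month = seq_len(n_months),
    start_int = start_int,
    end_int = end_int,
    exposure = (as.numeric(end_int - start_int) + 1) / 365.25
  )
}

monthly_intervals("2010-04-10", 6)
```

Months with 30 days give 0.08214 and months with 31 days give 0.08487, matching the two exposure values in the table above.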

## Mortality/Lapse studies

For illustration, we treat the end date of every record as a death. We set the exposure in the year of death to a full year and add a death indicator in that duration.

exposures_mod <- exposures %>%
  group_by(key) %>%
  mutate(
    exposure_mod = if_else(duration == max(duration), 1, exposure),
    death_cnt = if_else(duration == max(duration), 1, 0)
  ) %>%
  ungroup()

tail(exposures_mod, 4)
key duration start_int end_int exposure exposure_mod death_cnt
D68554D5 12 2016-01-01 2016-12-31 1.002 1.002 0
D68554D5 13 2017-01-01 2017-12-31 0.9993 0.9993 0
D68554D5 14 2018-01-01 2018-12-31 0.9993 0.9993 0
D68554D5 15 2019-01-01 2019-04-04 0.2574 1 1

Now we can aggregate by duration to calculate mortality rates.

exposures_mod %>% group_by(duration) %>% summarise(q = sum(death_cnt)/sum(exposure_mod))
duration q
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0.5002
10 0
11 0
12 0
13 0
14 0
15 1
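
The rate at duration 9 can be checked by hand from the two policies in the data. The sketch below uses only values from the tables above. The 365-day exposure for D68554D5 assumes its ninth policy year runs from 2013-01-01 to 2013-12-31, which follows from its 2005-01-01 start date.

```r
# Duration 9 rows for the two policies.
# B10251C8 ends in duration 9, so its exposure is set to 1 and it counts as a death.
# D68554D5 survives duration 9 (2013-01-01 to 2013-12-31, 365 days).
duration_9 <- data.frame(
  key = c("B10251C8", "D68554D5"),
  exposure_mod = c(1, 365 / 365.25),
  death_cnt = c(1, 0)
)

q_9 <- sum(duration_9$death_cnt) / sum(duration_9$exposure_mod)
round(q_9, 4)
```

With only two policies the rates are not credible. The point is the mechanics: deaths divided by exposure within each duration.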

## Adding additional information

We can add additional information by joining on our key.

exposures_mod <- exposures_mod %>% inner_join(select(records, key, issue_age, gender), by = "key")
head(exposures_mod)
key duration start_int end_int exposure exposure_mod death_cnt issue_age gender
B10251C8 1 2010-04-10 2011-04-09 0.9993 0.9993 0 35 M
B10251C8 2 2011-04-10 2012-04-09 1.002 1.002 0 35 M
B10251C8 3 2012-04-10 2013-04-09 0.9993 0.9993 0 35 M
B10251C8 4 2013-04-10 2014-04-09 0.9993 0.9993 0 35 M
B10251C8 5 2014-04-10 2015-04-09 0.9993 0.9993 0 35 M
B10251C8 6 2015-04-10 2016-04-09 1.002 1.002 0 35 M

Now we can calculate mortality by attained age, or by attained age and gender.

exposures_mod %>%
  mutate(attained_age = issue_age + duration - 1) %>%
  group_by(attained_age, gender) %>%
  summarise(q = sum(death_cnt)/sum(exposure_mod)) %>%
  tail()
attained_age gender q
41 M 0
42 F 0
42 M 0
43 F 0
43 M 1
44 F 1
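
The attained age formula can be checked against the last rows of the table. In this small sketch `attained_age` is a hypothetical helper that restates the mutate step above. It gives the age at the start of each policy year, on the same basis as issue_age.

```r
# Attained age in a given duration, as defined in the pipeline above.
attained_age <- function(issue_age, duration) issue_age + duration - 1

attained_age(35, 9)   # B10251C8 in its final duration
attained_age(30, 15)  # D68554D5 in its final duration
```

These evaluate to 43 and 44, the two rows where q equals 1.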

## Premium Pattern

We assume that the user has dated transactions with a key that corresponds to the key in the records data. Some simulated transactions come with the package.

head(trans)
key trans_date amt
B10251C8 2012-12-04 199
B10251C8 2013-12-28 197
B10251C8 2015-12-30 177
B10251C8 2019-05-07 192
B10251C8 2012-04-15 206
B10251C8 2019-04-02 220

The addStart function adds the start date of the appropriate exposure interval to the transactions.

trans_with_interval <- addStart(exposures_PM, trans)
head(trans_with_interval)
start_int key trans_date amt
2010-05-10 B10251C8 2010-05-28 190
2010-06-10 B10251C8 2010-07-04 189
2010-11-10 B10251C8 2010-11-21 179
2011-04-10 B10251C8 2011-05-08 210
2011-07-10 B10251C8 2011-07-12 198
2012-01-10 B10251C8 2012-01-14 194
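
The idea behind this step can be sketched in base R for a single key: each transaction is assigned the latest interval start on or before its date. This is a hypothetical illustration, not the package's implementation. In particular, marking out-of-range transactions as NA is a choice made for this sketch, and addStart may treat them differently.

```r
# Hypothetical sketch for one key: interval boundaries taken from exposures_PM above.
interval_starts <- as.Date(c("2010-04-10", "2010-05-10", "2010-06-10", "2010-07-10"))
interval_ends <- as.Date(c("2010-05-09", "2010-06-09", "2010-07-09", "2010-08-09"))
trans_dates <- as.Date(c("2010-05-28", "2010-07-04", "2010-09-15"))

# Index of the latest interval start on or before each transaction date.
idx <- findInterval(as.numeric(trans_dates), as.numeric(interval_starts))

# Transactions outside every interval get NA instead of a start date.
inside <- idx > 0 & trans_dates <= interval_ends[pmax(idx, 1)]
matched_start <- interval_starts[pmax(idx, 1)]
matched_start[!inside] <- NA

data.frame(trans_date = trans_dates, start_int = matched_start)
```

The first two transactions map to 2010-05-10 and 2010-06-10, as in the addStart output above. The third falls after the last interval in this toy data and is left unmatched.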

We can group and aggregate by key and start_int to get unique transaction rows corresponding to intervals in exposures_PM.

trans_to_join <- trans_with_interval %>% group_by(start_int, key) %>% summarise(premium = sum(amt))
head(trans_to_join)
start_int key premium
2005-06-01 D68554D5 97
2005-10-01 D68554D5 169
2005-12-01 D68554D5 96
2006-01-01 D68554D5 193
2006-02-01 D68554D5 107
2006-03-01 D68554D5 119

Because trans_to_join has at most one row per key and start_int, we can join it to the exposures with a left join without duplicating any exposure rows.

premium_study <- exposures_PM %>% left_join(trans_to_join, by = c("key", "start_int"))
head(premium_study, 10)
key duration policy_month start_int end_int exposure premium
B10251C8 1 1 2010-04-10 2010-05-09 0.08214 NA
B10251C8 1 2 2010-05-10 2010-06-09 0.08487 190
B10251C8 1 3 2010-06-10 2010-07-09 0.08214 189
B10251C8 1 4 2010-07-10 2010-08-09 0.08487 NA
B10251C8 1 5 2010-08-10 2010-09-09 0.08487 NA
B10251C8 1 6 2010-09-10 2010-10-09 0.08214 NA
B10251C8 1 7 2010-10-10 2010-11-09 0.08487 NA
B10251C8 1 8 2010-11-10 2010-12-09 0.08214 179
B10251C8 1 9 2010-12-10 2011-01-09 0.08487 NA
B10251C8 1 10 2011-01-10 2011-02-09 0.08487 NA

Change the NA values resulting from the join to zeros using an if_else.

premium_study <- premium_study %>% mutate(premium = if_else(is.na(premium), 0, premium))
head(premium_study, 10)
key duration policy_month start_int end_int exposure premium
B10251C8 1 1 2010-04-10 2010-05-09 0.08214 0
B10251C8 1 2 2010-05-10 2010-06-09 0.08487 190
B10251C8 1 3 2010-06-10 2010-07-09 0.08214 189
B10251C8 1 4 2010-07-10 2010-08-09 0.08487 0
B10251C8 1 5 2010-08-10 2010-09-09 0.08487 0
B10251C8 1 6 2010-09-10 2010-10-09 0.08214 0
B10251C8 1 7 2010-10-10 2010-11-09 0.08487 0
B10251C8 1 8 2010-11-10 2010-12-09 0.08214 179
B10251C8 1 9 2010-12-10 2011-01-09 0.08487 0
B10251C8 1 10 2011-01-10 2011-02-09 0.08487 0
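
Replacing NA with zero matters for any average taken afterwards. The toy vector below is shaped like the first four rows above and shows the difference in base R. It is an illustration only, and `premium_filled` is a name introduced here.

```r
# Toy premium column shaped like the first four rows above.
premium <- c(NA, 190, 189, NA)

# Base R equivalent of the if_else() step.
premium_filled <- ifelse(is.na(premium), 0, premium)

mean(premium_filled)         # intervals without premium count as zero
mean(premium, na.rm = TRUE)  # averages only the intervals that had premium
```

The first mean is 94.75 and the second is 189.5. Averaging premium per exposure interval calls for the zero-filled version.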

Now we are free to do any calculations we want. As a simple example, we calculate the average premium in the first two policy months. Refer to the section on adding additional information for more creative policy splits.

premium_study %>% filter(policy_month %in% c(1,2)) %>% group_by(policy_month) %>% summarise(avg_premium = mean(premium))
policy_month avg_premium
1 60.46
2 66.88