actLifer

Grace Rade, Maeve Tyler-Penny, Julia Ting

library(actLifer)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

The actLifer package contains functions to create actuarial life tables and three datasets ready to be made into a life table. Each mathematical step in transforming mortality data into life expectancy has a corresponding function, which builds the table up to that step. The datasets have been prepared and are ready to use with our functions.

Inspiration

Mathematically speaking, mortality data is the first step in calculating life expectancy. There are several intermediate calculations between the number of deaths at a given age and life expectancy, and each step builds on the previous values. With this in mind, we created one function for each intermediate value; combined, they build a complete actuarial lifetable. Lifetables can be created rather easily in a spreadsheet, but building one in R is a rather involved process. Our package simplifies the procedure of creating a lifetable into one function, with the option to group by different categorical variables.

The actLifer package is a useful tool for anyone who works with mortality data, wants to calculate life expectancy, or wants to find any of the intermediate values between number of deaths and life expectancy.

Functions

All of the functions take in a dataset that has columns for age group ($x$), deaths at each age ($D_x$), and the midyear population at each age ($P_x$).

central_death_rate(): Calculates the central/crude death rate, $M_x$, which is the number of deaths in a given period divided by the population at risk in that same given period.
- Formula: $M_x = \frac{D_x}{P_x}$
- This is an optional column in the life table, but can be useful to ascertain a general indication of the health status of a given area or population.
conditional_death_prob(): Calculates the conditional probability of death at each age ($q_x$), which is the probability of dying at a certain age within a given period.
- Formula: $q_x = \frac{D_x}{P_x + \frac{D_x}{2}}$
conditional_life_prob(): Calculates the conditional probability of life at each age ($p_x$), which is the probability of living to a certain age within a given period.
- Formula: $p_x = 1 - q_x$

Please note that R will round the conditional probability of life to 1 in printed output; this will not present problems for later calculations.
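The first three formulas can be checked by hand. The following is a minimal base R sketch, not the package's own code: the variable names are chosen for illustration, and the numbers are the first five rows of mortality2 as printed elsewhere in this document. It also shows that the stored values of $p_x$ stay below 1 even when three significant digits display as 1.

```r
# First five rows of mortality2, copied from the printed output in this document
deaths     <- c(23161, 1568, 1046, 791, 640)
population <- c(3970145, 3995008, 3992154, 3982074, 3987656)

M_x <- deaths / population                 # central death rate
q_x <- deaths / (population + deaths / 2)  # conditional probability of death
p_x <- 1 - q_x                             # conditional probability of life

# Rounded to three significant digits, p_x looks like 1 from age 1 onward
print(signif(p_x, 3))

# With more digits, the stored values are visibly below 1
print(p_x, digits = 8)
```

Because later steps use the stored values rather than the printed ones, the rounding only affects how the column looks.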

number_to_survive(): Calculates the number of people to survive to a given age interval ($l_x$), starting with an arbitrary number of 100,000 at age 0 (or age < 1).
- Formula: $l_x = l_{x-1} \cdot p_{x-1}; l_0 = 100,000$
prop_to_survive(): Calculates the proportion of the population surviving to age $x$.
- Formula: $l_x/100000$
- This is another optional column in the life table, and can be removed after all of the calculations are completed.
person_years(): Calculates the person years lived at each age ($L_x$), which is the total number of years lived at each age $x$ by all people who survive to that age.
- Formula: $L_x = \frac{l_x + l_{x+1}}{2}$, which assumes deaths are spread evenly across the age interval
total_years_lived(): Calculates the total years lived beyond each age $x$ ($T_x$), which is the sum of all person years from age $x$ through the last age in the table.
- Formula: $T_x = \sum_{i = x}^{\omega}L_i$, where $\omega$ is the last age in the table
life_expectancy(): Calculates the life expectancy at age $x$ ($e_x$), which is the number of years an average person is expected to live beyond their current age.
- Formula: $e_x = \frac{T_x}{l_x}$
- This function will output a complete life table, without the added customization of the lifetable() function.
lifetable(): Outputs a complete lifetable with the ability to customize which of the optional columns are included, and add extra grouping variables.
- If includeAllSteps = TRUE, the lifetable will include CentralDeathRate and PropToSurvive in the final output.
- If includeCDR = FALSE, CentralDeathRate will not be included in the final output.
- If includePS = FALSE, PropToSurvive will not be included in the final output.
- includeAllSteps, includeCDR, and includePS are all TRUE by default.
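To make the chain of calculations concrete, here is a rough base R sketch that walks from deaths and population to life expectancy. It is an illustration under stated assumptions, not the package's implementation: the variable names are invented, person years use the standard average $(l_x + l_{x+1})/2$, and total years are summed from age $x$ to the end of the table. Only the first five rows of mortality2 are used, so the result is the expected number of years lived within those five years, not a full life expectancy.

```r
# First five rows of mortality2, copied from the printed output in this document
deaths     <- c(23161, 1568, 1046, 791, 640)
population <- c(3970145, 3995008, 3992154, 3982074, 3987656)

q_x <- deaths / (population + deaths / 2)  # conditional probability of death
p_x <- 1 - q_x                             # conditional probability of life

# l_x = l_{x-1} * p_{x-1}, starting from l_0 = 100,000
l_x <- 100000 * cumprod(c(1, head(p_x, -1)))

# Proportion surviving to age x
prop_x <- l_x / 100000

# ASSUMPTION: person years are the average of survivors at the start
# and end of each interval; l_{x+1} is obtained as l_x * p_x
l_next <- l_x * p_x
L_x <- (l_x + l_next) / 2

# ASSUMPTION: T_x sums person years from age x to the last row of this
# truncated table (a reverse cumulative sum)
T_x <- rev(cumsum(rev(L_x)))

# Expected years lived beyond age x, within the truncated table
e_x <- T_x / l_x

print(data.frame(q_x, p_x, l_x, prop_x, L_x, T_x, e_x))
```

With the full single-year table in place of five rows, the same steps would give remaining life expectancy at each age.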

example <- lifetable(mortality2, "age_group", "population", "deaths")

#> # A tibble: 5 × 11
#>   age_group deaths population CentralDeathRate ConditionalProbDeath
#>   <chr>      <dbl>      <dbl>            <dbl>                <dbl>
#> 1 < 1 year   23161    3970145         0.00583              0.00582 
#> 2 1 year      1568    3995008         0.000392             0.000392
#> 3 2 years     1046    3992154         0.000262             0.000262
#> 4 3 years      791    3982074         0.000199             0.000199
#> 5 4 years      640    3987656         0.000160             0.000160
#> # ℹ 6 more variables: ConditionalProbLife <dbl>, NumberToSurvive <dbl>,
#> #   PropToSurvive <dbl>, PersonYears <dbl>, TotalYears <dbl>,
#> #   LifeExpectancy <dbl>

Calculating life expectancy is an iterative process, building on the previous intermediate calculations. Each of the functions will call the function of the previous step as it executes, meaning that the output dataset will include the columns of the previous steps. For this reason, there is no need to run each step individually on a dataset; simply run the function for the last step that you are trying to complete.
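The pattern described above, where each step runs the previous step first, can be sketched in base R. The three functions below are hypothetical stand-ins written for this illustration (their names and column names are not the package's); they only show why calling the last step is enough to get every earlier column.

```r
# Hypothetical step functions, invented for this sketch
step_death_prob <- function(data) {
  data$q_x <- data$deaths / (data$population + data$deaths / 2)
  data
}

step_life_prob <- function(data) {
  data <- step_death_prob(data)  # run the previous step first
  data$p_x <- 1 - data$q_x
  data
}

step_number_to_survive <- function(data) {
  data <- step_life_prob(data)   # which in turn runs step_death_prob()
  data$l_x <- 100000 * cumprod(c(1, head(data$p_x, -1)))
  data
}

# First three rows of mortality2, copied from this document
toy <- data.frame(
  deaths     = c(23161, 1568, 1046),
  population = c(3970145, 3995008, 3992154)
)

# Calling only the last step still produces every intermediate column
result <- step_number_to_survive(toy)
print(names(result))
```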

Central Death Rate is an optional column in the dataset, so central_death_rate() must be called in addition to the other functions.

Datasets

The package includes three datasets, all sourced from the CDC Wonder Database (https://wonder.cdc.gov/ucd-icd10.html).

mortality contains data from the year 2018 with single-year age groups
mortality2 contains data from the year 2016 with single-year age groups
mortality3 contains data from the year 2016 with single-year age groups and a gender grouping variable

What Do These Datasets Look Like?

Each of the included datasets includes an age group variable, a population variable, and a deaths variable. Population represents the mid-year population for each age group. Deaths represents the number of people in each age group who have died.

Here’s what the first five rows of mortality2 look like.

#> # A tibble: 5 × 3
#>   age_group deaths population
#>   <chr>      <dbl>      <dbl>
#> 1 < 1 year   23161    3970145
#> 2 1 year      1568    3995008
#> 3 2 years     1046    3992154
#> 4 3 years      791    3982074
#> 5 4 years      640    3987656

Who Should Use This Package?

This package can be used by researchers, actuaries, or anyone who is working with mortality data. It can be particularly useful for those wanting to calculate the life expectancy of specific groups, as life expectancy data for sub-groups of the total population of a given area is difficult to find. Additionally, our package can be used to compare life expectancy at different points in time, such as before and after the COVID-19 pandemic.

What Can We Do With This Data?

We can use this package to address questions such as:

How does life expectancy differ between population groups?
Is there a specific age-range where life expectancy dramatically changes?
Does the central death rate significantly differ from the probability of death at a certain age?

And many more!

Example 1:

How does life expectancy differ between population groups?

The built-in dataset mortality3 provides a gender variable that can be used to group the data. The lifetable function allows for extra grouping arguments, so that is the function we will use.

Please note that gender is the variable name that the CDC uses to mean biological sex (Male, Female).

lifetable(mortality3, "age_group", "population", "deaths", FALSE, FALSE, FALSE, "gender")
#> # A tibble: 170 × 6
#> # Groups:   "gender" [1]
#>    age_group gender deaths population `"gender"` LifeExpectancy
#>    <chr>     <chr>   <dbl>      <dbl> <chr>               <dbl>
#>  1 < 1 year  Female  10294    1939667 gender               139.
#>  2 < 1 year  Male    12867    2030478 gender               138.
#>  3 1 year    Female    694    1953850 gender               138.
#>  4 1 year    Male      874    2041158 gender               137.
#>  5 2 years   Female    474    1949132 gender               136.
#>  6 2 years   Male      572    2043022 gender               135.
#>  7 3 years   Female    323    1947408 gender               134.
#>  8 3 years   Male      468    2034666 gender               133.
#>  9 4 years   Female    298    1950127 gender               132.
#> 10 4 years   Male      342    2037529 gender               131.
#> # ℹ 160 more rows

The output is a tibble data frame that has calculated life expectancy for each gender. From this we can see any differences in life expectancy between males and females.

Users can use many extra grouping variables to get even more specific with population subgroups. Some suggested variables include (but are not limited to) state/geographic area, race, sex, income group, or health status.
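As a rough illustration of what grouping means for these calculations, the base R sketch below computes the number surviving separately for each gender, so that each group starts from its own 100,000. It is not the package's code: the data frame is typed in from the ten mortality3 rows printed above, and the column names q_x, p_x, and l_x are chosen for this example.

```r
# First ten rows of mortality3, copied from the printed output above
grouped <- data.frame(
  age_group  = rep(c("< 1 year", "1 year", "2 years", "3 years", "4 years"),
                   each = 2),
  gender     = rep(c("Female", "Male"), times = 5),
  deaths     = c(10294, 12867, 694, 874, 474, 572, 323, 468, 298, 342),
  population = c(1939667, 2030478, 1953850, 2041158, 1949132, 2043022,
                 1947408, 2034666, 1950127, 2037529)
)

# Row-wise steps do not depend on the grouping
grouped$q_x <- grouped$deaths / (grouped$population + grouped$deaths / 2)
grouped$p_x <- 1 - grouped$q_x

# Cumulative steps must run within each group; ave() applies the function
# to each gender's rows, which are already in age order
grouped$l_x <- ave(
  grouped$p_x, grouped$gender,
  FUN = function(p) 100000 * cumprod(c(1, head(p, -1)))
)

print(grouped[, c("age_group", "gender", "q_x", "l_x")])
```

The same idea extends to more than one grouping variable by passing additional factors to ave().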

Example 2:

Is there a specific age-range where life expectancy dramatically changes?

The mortality2 dataset has the age grouped in single-year intervals. We can use this dataset to see if life expectancy changes dramatically from one interval to the next.

lifetable(mortality2, "age_group", "population", "deaths", TRUE, FALSE, FALSE)
#> # A tibble: 85 × 9
#>    age_group deaths population ConditionalProbDeath ConditionalProbLife
#>    <chr>      <dbl>      <dbl>                <dbl>               <dbl>
#>  1 < 1 year   23161    3970145             0.00582                0.994
#>  2 1 year      1568    3995008             0.000392               1.00 
#>  3 2 years     1046    3992154             0.000262               1.00 
#>  4 3 years      791    3982074             0.000199               1.00 
#>  5 4 years      640    3987656             0.000160               1.00 
#>  6 5 years      546    4032515             0.000135               1.00 
#>  7 6 years      488    4029655             0.000121               1.00 
#>  8 7 years      511    4029991             0.000127               1.00 
#>  9 8 years      483    4159114             0.000116               1.00 
#> 10 9 years      462    4178524             0.000111               1.00 
#> # ℹ 75 more rows
#> # ℹ 4 more variables: NumberToSurvive <dbl>, PersonYears <dbl>,
#> #   TotalYears <dbl>, LifeExpectancy <dbl>

From the abbreviated output, we can see that life expectancy does not change dramatically from year to year.

Example 3:

Does the central death rate significantly differ from the probability of death at a certain age?

Central Death Rate (also known as the Crude Death or Mortality Rate) is not a necessary intermediate step for calculating life expectancy, so the conditional_death_prob() function does not call central_death_rate(). To compare the two measures, we will have to run both functions.

mort <- mortality2 %>%
  central_death_rate("age_group", "population", "deaths") %>% 
  conditional_death_prob("age_group", "population", "deaths")

head(mort)
#> # A tibble: 6 × 5
#>   age_group deaths population CentralDeathRate ConditionalProbDeath
#>   <chr>      <dbl>      <dbl>            <dbl>                <dbl>
#> 1 < 1 year   23161    3970145         0.00583              0.00582 
#> 2 1 year      1568    3995008         0.000392             0.000392
#> 3 2 years     1046    3992154         0.000262             0.000262
#> 4 3 years      791    3982074         0.000199             0.000199
#> 5 4 years      640    3987656         0.000160             0.000160
#> 6 5 years      546    4032515         0.000135             0.000135
tail(mort)
#> # A tibble: 6 × 5
#>   age_group deaths population CentralDeathRate ConditionalProbDeath
#>   <chr>      <dbl>      <dbl>            <dbl>                <dbl>
#> 1 79 years   62081    1439937           0.0431               0.0422
#> 2 80 years   64987    1358260           0.0478               0.0467
#> 3 81 years   67240    1284298           0.0524               0.0510
#> 4 82 years   67120    1135109           0.0591               0.0574
#> 5 83 years   69758    1079082           0.0646               0.0626
#> 6 84 years   72916    1008890           0.0723               0.0698

Central Death Rate and Conditional Probability of Death start off being very similar in value, as you can see from the first six rows of mort. However, as one ages, the difference between Central Death Rate and Conditional Probability of Death becomes larger, as you can see from the last six rows of the dataset.
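The growing gap follows directly from the two formulas: dividing the numerator and denominator of $q_x$ by $P_x$ gives $q_x = \frac{M_x}{1 + M_x/2}$, so the difference $M_x - q_x$ is roughly $M_x^2/2$ and grows as the death rate rises. The base R sketch below checks this on four rows copied from the output above; the variable names are chosen for illustration.

```r
# Two young and two old age groups from mortality2, copied from the output above
ages       <- c("< 1 year", "1 year", "83 years", "84 years")
deaths     <- c(23161, 1568, 69758, 72916)
population <- c(3970145, 3995008, 1079082, 1008890)

M_x <- deaths / population                 # central death rate
q_x <- deaths / (population + deaths / 2)  # conditional probability of death

# The two measures are linked by q_x = M_x / (1 + M_x / 2)
print(all.equal(q_x, M_x / (1 + M_x / 2)))

# The gap between them is small at young ages and larger at old ages
gap <- M_x - q_x
print(data.frame(ages, M_x, q_x, gap))
```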