library(actLifer)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
The actLifer package contains functions to create actuarial life tables and three datasets ready to be made into a life table. Each mathematical step in transforming mortality data into life expectancy has a corresponding function, which builds the table up to that step. The datasets have been prepared and are ready to use with our functions.
Mathematically speaking, mortality data is the first step in calculating life expectancy. There are several intermediate calculations between the number of deaths at a given age and life expectancy, and each step builds on the previous values. With this in mind, we created several functions that each calculate one intermediate value and together build a complete actuarial lifetable. Lifetables can be created rather easily in a spreadsheet, but creating one in R is a rather involved process. Our functions simplify the procedure into a single function call, with the option to group by different categorical variables.
The actLifer package is a useful tool for anyone who works with mortality data, wants to calculate life expectancy, or wants to find any of the intermediate values between number of deaths and life expectancy.
All of the functions take in a dataset that has columns for age group (\(x\)), deaths at each age (\(D_x\)), and the midyear population at each age (\(P_x\)).
central_death_rate(): Calculates the central/crude
death rate, \(M_x\), which is the
number of deaths in a given period divided by the population at risk in
that same given period.
Formula: \(M_x = \frac{D_x}{P_x}\)
This is an optional column in the life table, but can be useful to ascertain a general indication of the health status of a given area or population.
conditional_death_prob(): Calculates the conditional
probability of death at each age (\(q_x\)), which is the probability of dying
at a certain age within a given period.
conditional_life_prob(): Calculates the conditional
probability of life at each age (\(p_x\)), which is the probability of living
to a certain age within a given period.
Please note that R will round the conditional probability of life to 1 at many ages when displaying it; this will not cause problems in later calculations.
number_to_survive(): Calculates the number of people
to survive to a given age interval (\(l_x\)), starting with an arbitrary number
of 100,000 at age 0 (or age < 1).
prop_to_survive(): Calculates the proportion of the
population surviving to age \(x\).
Formula: \(l_x/100000\)
This is another optional column in the life table, and can be removed after all of the calculations are completed.
person_years(): Calculates the person years lived at
each age (\(L_x\)), which is the total number of years lived at each age \(x\) by all people who survive to that
age.
total_years_lived(): Calculates the total years
lived beyond each age \(x\) (\(T_x\)), which is the
sum of all person years from age \(x\) onward.
life_expectancy(): Calculates the life expectancy at
age \(x\) (\(e_x\)), which is the number of years an
average person is expected to live beyond their current age.
Formula: \(e_x = \frac{T_x}{l_x}\)
This function will output a complete life table, without the
added customization of the lifetable() function.
lifetable(): Outputs a complete lifetable with the
ability to customize which of the optional columns are included, and add
extra grouping variables.
if includeAllSteps = TRUE, the lifetable will
include CentralDeathRate and PropToSurvive in
the final output
if includeCDR = FALSE, CentralDeathRate
will not be included in the final output
if includePS = FALSE, PropToSurvive
will not be included in the final output
includeAllSteps, includeCDR, and
includePS are all TRUE by default
#> # A tibble: 5 × 11
#>   age_group deaths population CentralDeathRate ConditionalProbDeath
#>   <chr>      <dbl>      <dbl>            <dbl>                <dbl>
#> 1 < 1 year   23161    3970145         0.00583              0.00582 
#> 2 1 year      1568    3995008         0.000392             0.000392
#> 3 2 years     1046    3992154         0.000262             0.000262
#> 4 3 years      791    3982074         0.000199             0.000199
#> 5 4 years      640    3987656         0.000160             0.000160
#> # ℹ 6 more variables: ConditionalProbLife <dbl>, NumberToSurvive <dbl>,
#> #   PropToSurvive <dbl>, PersonYears <dbl>, TotalYears <dbl>,
#> #   LifeExpectancy <dbl>
Calculating life expectancy is an iterative process, building on the previous intermediate calculations. Each function calls the function for the previous step as it executes, so the output dataset includes the columns from the previous steps. For this reason, there is no need to run each step individually on a dataset; simply run the function for the last step that you are trying to complete.
Central Death Rate is an optional column in the dataset, so central_death_rate() must be called in addition to the other functions when you want it in the output.
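To make the step-by-step idea concrete, the first few columns can be reproduced by hand in base R. This is a minimal sketch rather than the package's implementation: the inputs are the five mortality2 rows printed in this document, the variable names are illustrative, and the conversion \(q_x = M_x / (1 + 0.5 M_x)\) is an assumed, commonly used approximation that happens to agree with the printed ConditionalProbDeath values.

```r
# Deaths (D_x) and midyear population (P_x) for ages < 1 to 4,
# copied from the mortality2 rows printed in this document.
deaths     <- c(23161, 1568, 1046, 791, 640)
population <- c(3970145, 3995008, 3992154, 3982074, 3987656)

# Step 1: central death rate, M_x = D_x / P_x (formula given in the text).
M_x <- deaths / population

# Step 2: conditional probability of death.
# ASSUMPTION: q_x = M_x / (1 + 0.5 * M_x); the package's own formula
# is not stated in this document.
q_x <- M_x / (1 + 0.5 * M_x)

# Step 3: conditional probability of life.
p_x <- 1 - q_x

# Step 4: number surviving to the start of each age, beginning with
# 100,000 at the first age group and multiplying by each earlier p_x.
l_x <- 100000 * cumprod(c(1, p_x[-length(p_x)]))

# Optional column: proportion surviving, l_x / 100000.
prop_x <- l_x / 100000

print(data.frame(M_x, q_x, p_x, l_x, prop_x))
```

Each vector depends only on the ones computed before it, which mirrors how each package function builds on the output of the previous step.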
The package includes three datasets, all sourced from the CDC Wonder Database (https://wonder.cdc.gov/ucd-icd10.html).
mortality contains data from the year 2018 with
single-year age groups
mortality2 contains data from the year 2016 with
single-year age groups
mortality3 contains data from the year 2016 with
single-year age groups and a gender grouping variable
Each of the included datasets includes an age group variable, a population variable, and a deaths variable. Population represents the mid-year population for each age group. Deaths represents the number of people in each age group who have died.
Here’s what the first five rows of mortality2 look
like.
#> # A tibble: 5 × 3
#>   age_group deaths population
#>   <chr>      <dbl>      <dbl>
#> 1 < 1 year   23161    3970145
#> 2 1 year      1568    3995008
#> 3 2 years     1046    3992154
#> 4 3 years      791    3982074
#> 5 4 years      640    3987656
This package can be used by researchers, actuaries, or anyone who works with mortality data. It can be particularly useful for those wanting to calculate the life expectancy of specific groups, as life expectancy data for subgroups of the total population of a given area is difficult to find. Additionally, our package can be used to compare life expectancy at different points in time, such as before and after the COVID-19 pandemic.
We can use this package to address questions such as:
How does life expectancy differ between population groups?
Is there a specific age-range where life expectancy dramatically changes?
Does the central death rate significantly differ from the probability of death at a certain age?
And many more!
How does life expectancy differ between population groups?
The built-in dataset mortality3 provides a
gender variable that can be used to group the data. The
lifetable function allows for extra grouping arguments, so
that is the function we will use.
gender is the variable name that the
CDC uses to mean biological sex (Male, Female)
lifetable(mortality3, "age_group", "population", "deaths", FALSE, FALSE, FALSE, "gender")
#> # A tibble: 170 × 6
#> # Groups:   "gender" [1]
#>    age_group gender deaths population `"gender"` LifeExpectancy
#>    <chr>     <chr>   <dbl>      <dbl> <chr>               <dbl>
#>  1 < 1 year  Female  10294    1939667 gender               139.
#>  2 < 1 year  Male    12867    2030478 gender               138.
#>  3 1 year    Female    694    1953850 gender               138.
#>  4 1 year    Male      874    2041158 gender               137.
#>  5 2 years   Female    474    1949132 gender               136.
#>  6 2 years   Male      572    2043022 gender               135.
#>  7 3 years   Female    323    1947408 gender               134.
#>  8 3 years   Male      468    2034666 gender               133.
#>  9 4 years   Female    298    1950127 gender               132.
#> 10 4 years   Male      342    2037529 gender               131.
#> # ℹ 160 more rows
The output is a tibble data frame that has calculated life expectancy for each gender. From this we can see any differences in life expectancy between males and females.
Users can use many extra grouping variables to get even more specific with population subgroups. Some suggested variables include (but are not limited to) state/geographic area, race, sex, income group, or health status.
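As a quick check on grouped input, the central death rate can be compared between groups by hand before a full lifetable is built. The following is a base R sketch under stated assumptions: the rows are the first ten rows of mortality3 as printed in this document, and the object names `m3`, `by_gender`, and `ratio` are illustrative rather than part of the package.

```r
# First ten rows of mortality3, copied from the output printed above.
m3 <- data.frame(
  age_group  = rep(c("< 1 year", "1 year", "2 years", "3 years", "4 years"),
                   each = 2),
  gender     = rep(c("Female", "Male"), times = 5),
  deaths     = c(10294, 12867, 694, 874, 474, 572, 323, 468, 298, 342),
  population = c(1939667, 2030478, 1953850, 2041158, 1949132, 2043022,
                 1947408, 2034666, 1950127, 2037529)
)

# Central death rate per row: M_x = D_x / P_x.
m3$CentralDeathRate <- m3$deaths / m3$population

# Reshape to one row per age group and one column per gender.
# Each cell holds a single value, so sum() simply passes it through.
by_gender <- tapply(m3$CentralDeathRate,
                    list(m3$age_group, m3$gender),
                    sum)

# Ratio of male to female central death rate at each age.
ratio <- by_gender[, "Male"] / by_gender[, "Female"]
print(round(ratio, 2))
```

In these ten rows the male rate is higher than the female rate at every age shown, which is the kind of subgroup difference a grouped lifetable is meant to surface.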
Is there a specific age-range where life expectancy dramatically changes?
The mortality2 dataset has age grouped in single-year
intervals. We can use this dataset to see if life expectancy changes
dramatically from one interval to the next.
lifetable(mortality2, "age_group", "population", "deaths", TRUE, FALSE, FALSE)
#> # A tibble: 85 × 9
#>    age_group deaths population ConditionalProbDeath ConditionalProbLife
#>    <chr>      <dbl>      <dbl>                <dbl>               <dbl>
#>  1 < 1 year   23161    3970145             0.00582                0.994
#>  2 1 year      1568    3995008             0.000392               1.00 
#>  3 2 years     1046    3992154             0.000262               1.00 
#>  4 3 years      791    3982074             0.000199               1.00 
#>  5 4 years      640    3987656             0.000160               1.00 
#>  6 5 years      546    4032515             0.000135               1.00 
#>  7 6 years      488    4029655             0.000121               1.00 
#>  8 7 years      511    4029991             0.000127               1.00 
#>  9 8 years      483    4159114             0.000116               1.00 
#> 10 9 years      462    4178524             0.000111               1.00 
#> # ℹ 75 more rows
#> # ℹ 4 more variables: NumberToSurvive <dbl>, PersonYears <dbl>,
#> #   TotalYears <dbl>, LifeExpectancy <dbl>
From the abbreviated output, we can see that life expectancy does not change dramatically from year to year.
Does the central death rate significantly differ from the probability of death at a certain age?
Central Death Rate (also known as the Crude Death or Mortality Rate)
is not a necessary intermediate step for calculating life expectancy, so
the conditional_death_prob() function does not call
central_death_rate(). To compare the two measures, we will
have to run both functions.
mort <- mortality2 %>%
  central_death_rate("age_group", "population", "deaths") %>% 
  conditional_death_prob("age_group", "population", "deaths")
head(mort)
#> # A tibble: 6 × 5
#>   age_group deaths population CentralDeathRate ConditionalProbDeath
#>   <chr>      <dbl>      <dbl>            <dbl>                <dbl>
#> 1 < 1 year   23161    3970145         0.00583              0.00582 
#> 2 1 year      1568    3995008         0.000392             0.000392
#> 3 2 years     1046    3992154         0.000262             0.000262
#> 4 3 years      791    3982074         0.000199             0.000199
#> 5 4 years      640    3987656         0.000160             0.000160
#> 6 5 years      546    4032515         0.000135             0.000135
tail(mort)
#> # A tibble: 6 × 5
#>   age_group deaths population CentralDeathRate ConditionalProbDeath
#>   <chr>      <dbl>      <dbl>            <dbl>                <dbl>
#> 1 79 years   62081    1439937           0.0431               0.0422
#> 2 80 years   64987    1358260           0.0478               0.0467
#> 3 81 years   67240    1284298           0.0524               0.0510
#> 4 82 years   67120    1135109           0.0591               0.0574
#> 5 83 years   69758    1079082           0.0646               0.0626
#> 6 84 years   72916    1008890           0.0723               0.0698
Central Death Rate and Conditional Probability of Death start off
being very similar in value, as you can see from the first six rows of
mort. However, as age increases, the difference between Central
Death Rate and Conditional Probability of Death becomes larger, as you
can see from the last six rows of the dataset.
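The size of that difference can be put into numbers. The sketch below uses four rows copied from the head(mort) and tail(mort) output as printed, so the values are rounded; the column names `abs_gap` and `rel_gap` are illustrative additions, not package output.

```r
# Values as printed in the head(mort) and tail(mort) output (rounded).
comparison <- data.frame(
  age_group            = c("< 1 year", "1 year", "79 years", "84 years"),
  CentralDeathRate     = c(0.00583, 0.000392, 0.0431, 0.0723),
  ConditionalProbDeath = c(0.00582, 0.000392, 0.0422, 0.0698)
)

# Absolute gap between the two measures.
comparison$abs_gap <- comparison$CentralDeathRate -
  comparison$ConditionalProbDeath

# Gap relative to the central death rate.
comparison$rel_gap <- comparison$abs_gap / comparison$CentralDeathRate

print(comparison)
```

On the full mort tibble, the same two columns could be added with dplyr's mutate(), for example `mutate(mort, abs_gap = CentralDeathRate - ConditionalProbDeath)`.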