--- title: "Vignette: Intro to healthiar" output: rmarkdown::html_vignette: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{Vignette: Intro to healthiar} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo=TRUE, eval=TRUE, include=TRUE) ``` ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(rmarkdown.html_vignette.check_title = FALSE) options(knitr.kable.max_rows = 10) set.seed(1) ## Avoid using pacman here, as it causes error in installation if it's not installed already library(healthiar) library(tibble) library(dplyr) library(purrr) library(tidyr) library(knitr) ``` Hi there! This vignette will tell you about `healthiar` and show you how to use `healthiar` with the help of examples. *NOTE*: Before using `healthiar`, please read carefully the information provided in the [readme file](https://github.com/SwissTPH/healthiar?tab=readme-ov-file#readme) or the [welcome webpage](https://swisstph.github.io/healthiar/). By using `healthiar`, you agree to the [terms of use and disclaimer](https://github.com/SwissTPH/healthiar?tab=readme-ov-file#readme). ------------------------------------------------------------------------ # About `healthiar` The `healthiar` functions allow you to quantify and monetize the health impacts (or burden of disease) attributable to exposure. The main focus of the EU project that initiated the development of `healthiar` (BEST-COST) has been two environmental exposures: air pollution and noise. However, `healthiar` can be (and has been) used for many other exposures such as green spaces, chemicals, physical activity... See below a an overview of the package. The whole list of functions included in `healthiar` is available [here](https://swisstph.github.io/healthiar/reference/index.html). ![Figure:`healthiar` overview](images/package_overview.png){width="700"} Be aware that `healthiar` functions are easier to use if your data is prepared in a tidy format, i.e.: - Each variable is a column; each column is a variable. - Each observation is a row; each row is an observation. - Each value is a cell; each cell is a single value. Source: Hadley Wickham (2014). Tidy Data. [Link](https://doi.org/10.18637/jss.v059.i10). For additional information see this [informal explanation of tidy data](https://cran.r-project.org/package=tidyr) (by the author). --------------------------------------------------------------------------- # Function examples The descriptions of the [`healthiar` functions](https://swisstph.github.io/healthiar/reference/index.html) provide examples that you can execute (with `healthiar` loaded) by running `example("function_name")`, e.g. `example("attribute_health")`. In the sections below in this vignette, you find additional examples and more detailed explanations. # Relative risk Goal: E.g. To quantify the COPD cases attributable to PM2.5 (air pollution) exposure in a country. ### Methodological refresher ![Figure: Relative risk approach](images/bod_rr.png){width="700"} ### Function call #### Hard coded ```{r} results_pm_copd <- attribute_health( erf_shape = "log_linear", # shape of the exposure-response function (ERF) rr_central = 1.369, # relative risk (RR) central estimate rr_increment = 10, # increment for which relative risk is valid (in \\mu g / m^3) exp_central = 8.85, # PM2.5 exposure (in \\mu g / m^3) (here: population-weighted) cutoff_central = 5, # cutoff (in \\mu g / m^3) below which no health effects occur bhd_central = 30747 # baseline health data (BHD; here: COPD incidence) ) ``` For alternative ERF shapes see the function documentation of `attribute_health()`. #### Function call - Pre-loaded data `healthiar` comes with some example data that start with `exdat_` that allow you to test functions. Some of these example data will be used in some examples in this vignette. Now we call `attribute_health` with input data from the `healthiar` example data. Note that we can easily provide input data to the function argument using the `$` operator. ```{r, eval=TRUE, include=TRUE, echo=TRUE} results_pm_copd <- attribute_health( erf_shape = "log_linear", rr_central = exdat_pm$relative_risk, rr_increment = 10, exp_central = exdat_pm$mean_concentration, cutoff_central = exdat_pm$cut_off_value, bhd_central = exdat_pm$incidence ) ``` ### Output structure Every `attribute_health()` output consists of two lists ("folders") - `health_main` contains the main results - `health_detailed` detailed results and additional info about the assessment - `results_raw` tibble that contains the detailed results - `input_args` contains a complete list of all arguments used in the assessment (including some internal arguments) - `input_table` tibble that contains all input data entered by the user or derived for the calculations - `results_by_...` - `geo_id_micro`: contains results by the inputted or default micro (e.g. high resolution) geo IDs - `sex`: contains results by sex (only present if function argument `sex` specified) - `age_group`: contains results by age groups (only present if function argument `age_group` sepcified) *NOTE*: `attribute_lifetable()` creates additional output that is specific to life table calculations ### Main results Let's inspect the main results There exist different, equivalent ways of accessing the output - With `$` operator: `results_pm_copd$health_main$impact_rounded` (as in the example above) - By mouse: go to the *Environment* tab in RStudio and click on the variable you want to inspect, and then open the `health_main` results table - With `[[]]` operator `results_pm_copd[["health_main"]]` - With `pluck()` & `pull()`: use the `purrr::pluck` function to select a list and then the `dplyr::pull` function extract values from a specified column, e.g. `results_pm_copd |> purrr::pluck("health_main") |> dplyr::pull("impact_rounded")` ```{r, include=TRUE, eval=TRUE, echo=TRUE} results_pm_copd$health_main ``` It is a table of the format `tibble` (very similar to a `data.frame`) of 3 rows and 23 columns. Let's zoom in on some relevant aspects ```{r} results_pm_copd$health_main |> dplyr::select(exp, bhd, rr, erf_ci, pop_fraction, impact_rounded) |> knitr::kable() # For formatting reasons only: prints tibble in nice layout ``` Interpretation: this table shows us that exposure was 8.85 $\mu g/m^3$, the baseline health data (`bhd_central`) was 30747 (COPD incidence in this instance). The 1st row further shows that the impact attributable to this exposure using the central relative risk (`rr_central`) estimate of 1.369 is 3502 COPD cases, or \~11% of all baseline cases. Some of the most results columns include: - *impact_rounded* rounded attributable health impact/burden - *impact* raw impact/burden - *pop_fraction* population attributable fraction (PAF) or population impact fraction (PIF) *NOTE*: The main output contains more columns that provide additional information about the assessment. # Absolute risk Goal: E.g. To quantify the incidence cases of high annoyance attributable to (road traffic) noise exposure. ## Refresher - Burden of disease with absolute risk ![Figure: Absolute risk approach](images/bod_ar.png){width="700"} ### Function call ```{r} results_noise_ha <- attribute_health( approach_risk = "absolute_risk", # default is "relative_risk" exp_central = c(57.5, 62.5, 67.5, 72.5, 77.5), # mean of the exposure categories pop_exp = c(387500, 286000, 191800, 72200, 7700), # population exposed per exposure category erf_eq_central = "78.9270-3.1162*c+0.0342*c^2" # exposure-response function ) ``` The `erf_eq_central` argument can digest other types of functions (see section on user-defined ERF) ### Main results ```{r echo=FALSE} results_noise_ha |> purrr::pluck("health_main") |> dplyr::select(erf_eq, erf_ci, impact_rounded) |> knitr::kable() # Prints tibble in a minimal layout ``` ### Results per noise exposure band ```{r eval=FALSE, include=TRUE, echo=TRUE} results_noise_ha$health_detailed$results_raw ``` ```{r echo=FALSE, include=TRUE, eval=TRUE} results_noise_ha[["health_detailed"]][["results_raw"]] |> select(exp_category, exp, pop_exp, impact) |> knitr::kable() ``` Alternatively, it's also possible to only assess the impacts for a single noise exposure band ```{r} results_noise_ha <- attribute_health( approach_risk = "absolute_risk", exp_central = 57.5, pop_exp = 387500, erf_eq_central = "78.9270-3.1162*c+0.0342*c^2" ) ``` ```{r eval=FALSE, include=FALSE, echo=TRUE} results_noise_ha$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_noise_ha[["health_main"]] |> dplyr::select(exp_category, impact) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_noise_ha) ``` # Multiple geographic units ## using relative risk Goal: e.g. To quantify the disease cases attributable to PM2.5 exposure in multiple cities using one single command. - Enter unique ID's as a vector (`numeric` or `character`) to the `geo_id_micro` argument (e.g. municipality names or province abbrevations) - Optional: aggregate unit-specific results by providing higher-level ID's (e.g. region names or country abbreviations) as a vector (`numeric` or `character`) to the `geo_id_macro` argument Input to the other function arguments is specified as usual, either as a vector or a single values (which will be recycled to match the length of the other input vectors). ### Function call ```{r} results_iteration <- attribute_health( # Names of Swiss cantons geo_id_micro = c("Zurich", "Basel", "Geneva", "Ticino", "Jura"), # Names of languages spoken in the selected Swiss cantons geo_id_macro = c("German","German","French","Italian","French"), rr_central = 1.369, rr_increment = 10, cutoff_central = 5, erf_shape = "log_linear", exp_central = c(11, 11, 10, 8, 7), bhd_central = c(4000, 2500, 3000, 1500, 500) ) ``` In this example we want to aggregate the lower-level geographic units (municipalities) by the higher-level language region (`"German", "French", "Italian"`) ### Main results The main output contains aggregated results ```{r eval=FALSE, include=FALSE, echo=TRUE} results_iteration$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_iteration[["health_main"]] |> dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci, bhd_ci) |> knitr::kable() ``` In this case `health_main` contains the cumulative / summed number of stroke cases attributable to PM2.5 exposure in the 5 geo units, which is `r sum(results_iteration[["health_main"]]$impact_rounded)` (using a relative risk of 1.369). ### Detailed results The geo unit-specific information and results are stored under `health_detailed`\>`results_raw` ```{r eval=FALSE, include=FALSE, echo=TRUE} results_iteration$health_detailed$results_raw ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_iteration[["health_detailed"]][["results_raw"]] |> dplyr::select(geo_id_micro, impact_rounded, geo_id_macro) |> knitr::kable() ``` `health_detailed` also contains impacts obtained through all combinations of input data central, lower and upper estimates (as usual), besides the results per geo unit (not shown above) ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_iteration) ``` ## using absolute risk Goal: E.g.To quantify high annoyance cases attributable to noise exposure in rural and urban areas. ### Function call ```{r} data <- exdat_noise |> ## Filter for urban and rural regions dplyr::filter(region == "urban" | region == "rural") ``` ```{r} results_iteration_ar <- attribute_health( # Both the rural and urban areas belong to the higher-level "total" region geo_id_macro = "total", geo_id_micro = data$region, approach_risk = "absolute_risk", exp_central = data$exposure_mean, pop_exp = data$exposed, erf_eq_central = "78.9270-3.1162*c+0.0342*c^2" ) ``` *NOTE*: the length of the input vectors fed to `geo_id_micro`, `exp_central`, `pop_exp` must match and must be (number of geo units) x (number of exposure categories) = 2 x 5 = **10**, because we have 2 geo units (`"rural"` and `"urban"`) and 5 exposure categories. ### Main results `health_main` contains the aggregated results (i.e. sum of impacts in rural and urban areas) ```{r eval=FALSE, include=FALSE, echo=TRUE} results_iteration_ar$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_iteration_ar[["health_main"]] |> dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci) |> knitr::kable() ``` ### Detailed results Impact by geo unit, in this case impact in the rural and in the urban area ```{r eval=FALSE, include=FALSE, echo=TRUE} results_iteration_ar$health_detailed$results_by_geo_id_micro ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_iteration_ar[["health_detailed"]][["results_by_geo_id_micro"]] |> dplyr::select(geo_id_micro, geo_id_macro, impact) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_iteration_ar, data) ``` # Uncertainty ## Confidence interval Goal: E.g. To quantify the COPD cases attributable to PM2.5 exposure taking into account uncertainty (lower and upper bound of confidence interval) in several input arguments: relative risk, exposure and baseline health data. ### Function call ```{r} results_pm_copd <- attribute_health( erf_shape = "log_linear", rr_central = 1.369, rr_lower = 1.124, # lower 95% confidence interval (CI) bound of RR rr_upper = 1.664, # upper 95% CI bound of RR rr_increment = 10, exp_central = 8.85, exp_lower = 8, # lower 95% CI bound of exposure exp_upper = 10, # upper 95% CI bound of exposure cutoff_central = 5, bhd_central = 30747, bhd_lower = 28000, # lower 95% confidence interval estimate of BHD bhd_upper = 32000 # upper 95% confidence interval estimate of BHD ) ``` ### Detailed results Let's inspect the detailed results: ```{r eval=FALSE, echo=TRUE, include=FALSE} results_pm_copd$health_detailed$results_raw ``` ```{r echo=FALSE} results_pm_copd$health_detailed$results_raw |> dplyr::select(erf_ci, exp_ci, bhd_ci, impact_rounded) |> dplyr::slice(1:9) |> knitr::kable() # Prints tibble in a minimal layout ``` Each row contains the estimated attributable cases (`impact_rounded`) obtained by the input data specified in the columns ending in "\_ci" and the other calculation pathway specifications in that row (not shown). - The 1st contains the estimated attributable impact when using the central estimates of relative risk, exposure and baseline health data - The 2nd row shows the impact when using the central estimates of the relative risk, exposure in combination with the lower estimate of the baseline health data - ... *NOTE*: only `r length(1:9)` of the `r length(results_pm_copd$health_detailed$results_raw$impact)` possible combinations are displayed due to space constraints *NOTE*: only a selection of columns are shown ## Monte Carlo simulation Goal: E.g. To summarize uncertainty of attributable health impacts (i.e. to get a single confidence interval instead of many combinations) by using a Monte Carlo simulation. You can do this carrying out a Monte Carlo uncertainty analysis via the `summarize_uncertainty()` function. The outcome of the Monte Carlo analysis is added to the variable entered as the `results` argument, which is `results_pm_copd` in our case. Two lists ("folders") are added: - `uncertainty_main` contains the central estimate and the corresponding 95% confidence intervals obtained through the Monte Carlo assessment - `uncertainty_detailed` contains all `n_sim` simulations of the Monte Carlo assessment ### Function call ```{r} results_pm_copd_summarized <- summarize_uncertainty( output_attribute = results_pm_copd, n_sim = 100 ) ``` ### Main results ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_copd_summarized$uncertainty_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_pm_copd_summarized$uncertainty_main |> knitr::kable() ``` ### Detailed results The folder `uncertainty_detailed` contains all single simulations. Let's look at the impact of the first 10 simulations. ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_copd_summarized$uncertainty_detailed$impact_by_sim ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_pm_copd_summarized$uncertainty_detailed$impact_by_sim |> knitr::kable() ``` ```{r, echo=FALSE, eval=TRUE, include=FALSE} rm(results_pm_copd_summarized) ``` # User-defined ERF Goal: E.g. To quantify COPD cases attributable to air pollution exposure by applying a user-defined exposure-response function (ERF), such as the MR-BRT curves from Global Burden of Disease study. In this case, the function arguments `erf_eq_...` require a function as input, so we use an auxiliary function (`splinefun()`) to transform the points on the ERF into type `function`. ```{r} results_pm_copd_mr_brt <- attribute_health( exp_central = 8.85, bhd_central = 30747, cutoff_central = 0, # Specify the function based on x-y point pairs that lie on the ERF erf_eq_central = splinefun( x = c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110), y = c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60), method = "natural") ) ``` The ERF curve created looks as follows ```{r fig.alt="ERF curve", eval=TRUE, include=TRUE, echo=FALSE} x <- c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110) y <- c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60) spline_func <- stats::splinefun( x = x, y = y, method = "natural") ## Generate finer x-values x_dense <- seq(min(x), max(x), length.out = 200) ## Evaluate the spline function at these points y_dense <- spline_func(x_dense) ## Plot plot(x, y, # main = "User-defined ERF", main = "", xlab = "Exposure", ylab = "Relative risk", col = "blue", pch = 19) lines(x_dense, y_dense, col = "red", lwd = 2) legend("topleft", legend = c("Original points", "Spline curve"), col = c("blue", "red"), pch = c(19, NA), lty = c(NA, 1), lwd = c(NA, 2)) ``` Alternatively, other functions (e.g. `approxfun()`) can be used to create the ERF # Sub-group analysis ## by age group Goal: e.g. To quantify health impacts attributable to air pollution in a country *by age group*. To obtain age-group-specific results, the baseline health data (and possibly exposure) must be available by age group. If the `age` argument was specified, age-group-specific results are available under `health_detailed` in the sub-folder `results_by_age_group`. ```{r} results_age_group <- attribute_health( approach_risk = "relative_risk", age = c("below_65", "65_plus"), exp_central = c(8, 7), cutoff_central = c(5, 5), bhd_central = c(1000, 5000), rr_central = 1.06, rr_increment = 10, erf_shape = "log_linear" ) results_age_group$health_detailed$results_by_age_group |> dplyr::select(age_group, impact_rounded, exp, bhd) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_age_group) ``` ## by sex Goal: e.g. To quantify health impacts attributable to air pollution in a country *by sex*. To obtain sex-specific results, the baseline health data (and possibly exposure) must be entered by sex. If the `sex` argument was specified, sex-specific results are available under `health_detailed` in the sub-folder `results_by_sex`. ```{r eval=TRUE, include=TRUE, echo=TRUE} results_sex <- attribute_health( approach_risk = "relative_risk", sex = c("female", "male"), exp_central = c(8, 8), cutoff_central = c(5, 5), bhd_central = c(1000, 1100), rr_central = 1.06, rr_increment = 10, erf_shape = "log_linear" ) results_sex$health_detailed$results_by_sex |> dplyr::select(sex, impact_rounded, exp, bhd) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_sex) ``` ## by other sub-groups Goal: e.g. To quantify health impacts attributable to air pollution in a country *by education level*. A (random) ID vector of unique values must be entered to the argument `geo_id_micro`, to indicate that each row is an observation on its own (and not just exposure categories) A single vector (or a data frame / tibble with multiple columns) to group the results by can be entered to the `info` argument. In this example, this will be information about the education level. In a second step one can group the results based on one or more columns and so summarize the results by the preferred sub-groups ```{r} info <- data.frame( education = rep(c("secondary", "bachelor", "master"), each = 5) # education level ) output_attribute <- attribute_health( rr_central = 1.063, rr_increment = 10, erf_shape = "log_linear", cutoff_central = 0, exp_central = sample(6:10, 15, replace = TRUE), bhd_central = sample(100:500, 15, replace = TRUE), geo_id_micro = c(1:nrow(info)), # (random) ID must be entered info = info ) output_stratified <- output_attribute$health_detailed$results_raw |> dplyr::group_by(info_column_1) |> dplyr::summarize(mean_impact = mean(impact)) |> print() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(info, output_attribute, output_stratified) ``` # YLL & deaths with life table ## YLL Goal: E.g. To quantify the years of life lost (YLL) due to deaths from COPD attributable to PM2.5 exposure during one year. We can use `attribute_lifetable()` combined with life table input data to determine YLL attributable to an environmental stressor. ### Function call ```{r} results_pm_yll <- attribute_lifetable( year_of_analysis = 2019, health_outcome = "yll", rr_central = 1.118, rr_increment = 10, erf_shape = "log_linear", exp_central = 8.85, cutoff_central = 5, min_age = 20, # age from which population is affected by the exposure # Life table information age_group = exdat_lifetable$age_group, sex = exdat_lifetable$sex, population = exdat_lifetable$midyear_population, # In the life table case, BHD refers to deaths bhd_central = exdat_lifetable$deaths ) ``` ### Main results Total YLL attributable to exposure (sum of sex-specific impacts) ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_yll$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_pm_yll$health_main |> dplyr::slice_head() |> dplyr::select(impact_rounded, erf_ci, exp_ci, bhd_ci) |> knitr::kable() ``` Use the two arguments `approach_exposure` and `approach_newborns` to modify the life table calculation - `approach_exposure` - `"single_year"` (default) population is exposed for a single year and the impacts of that exposure are calculated - `"constant"` population is exposed every year - `approach_newborns` - `"without_newborns"` (default) assumes that the population in the year of analysis is followed over time, without considering newborns being born - `"with_newborns"` assumes that for each year after the year of analysis n babies are born, with n being equal to the (male and female) population aged 0 that is provided in the argument population ### Detailed results Attributable YLL results - per year - per age (group) - per sex (if sex-specific life table data entered) are available. *Note*: We will inspect the results for females; male results are also available. ### Results per year *NOTE*: only a selection of years is shown ```{r} results_pm_yll$health_detailed$results_raw |> dplyr::summarize( .by = year, impact = sum(impact, na.rm = TRUE) ) ``` ```{r} results_pm_yll$health_detailed$results_raw |> dplyr::summarize( .by = year, impact = sum(impact, na.rm = TRUE)) |> knitr::kable() ``` #### YLL ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_yll$health_detailed$intermediate_calculations |> dplyr::filter(sex == "female") |> dplyr::pull(impact_by_age_and_year) |> purrr::pluck(1) ``` ```{r echo=FALSE} results_pm_yll$health_detailed$intermediate_calculations |> dplyr::filter(sex == "female") |> dplyr::pull(impact_by_age_and_year) |> purrr::pluck(1) |> dplyr::select(age_start, age_end, impact_2019) |> dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> knitr::kable() ``` #### Population (baseline scenario) Baseline scenario refers to the scenario with exposure (i.e. the scenario specified in the assessment). ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_exposed_by_age_and_year) |> purrr::pluck(1) ``` ```{r echo=FALSE} results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_exposed_by_age_and_year) |> purrr::pluck(1) |> dplyr::select(age_start, midyear_population_2019, midyear_population_2020, midyear_population_2021, midyear_population_2022) |> dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> knitr::kable() ``` #### Population (unexposed scenario) Impacted scenario refers to the scenario without exposure. ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1) ``` ```{r echo=FALSE} results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1) |> dplyr::select(age_start, midyear_population_2019, midyear_population_2020, midyear_population_2021, midyear_population_2022) |> dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> knitr::kable() ``` ## Deaths Goal: E.g. To determine premature deaths from COPD attributable to PM2.5 exposure during one year. See example on YLL for additional info on `attribute_lifetable()` calculations and its output. ### Function call ```{r} results_pm_deaths <- attribute_lifetable( health_outcome = "deaths", year_of_analysis = 2019, rr_central = 1.118, rr_increment = 10, erf_shape = "log_linear", exp_central = 8.85, cutoff_central = 5, min_age = 20, # age from which population is affected by the exposure # Life table information age_group = exdat_lifetable$age_group, sex = exdat_lifetable$sex, population = exdat_lifetable$midyear_population, bhd_central = exdat_lifetable$deaths ) ``` ### Main results Total premature deaths attributable to exposure (sum of sex-specific impacts) ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_deaths$health_main$impact ``` ```{r, eval=TRUE, include=TRUE, echo=FALSE} results_pm_deaths$health_main |> dplyr::slice_head() |> dplyr::select(impact_rounded, erf_ci, exp_ci, bhd_ci) |> knitr::kable() ``` ### Detailed results Attributable premature deaths results - per year (if argument `approach_exposure = "constant"`) - per age (group) - per sex (if sex-specific life table data entered) are available *Note*: we will inspect the results for females; male results are also available *Note*: because we set the function argument `approach_exposure = "constant"` in the function call results are available for one year (the year of analysis) ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_deaths$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1) ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_pm_yll$health_detailed$intermediate_calculations|> dplyr::filter(sex == "female") |> dplyr::pull(projection_if_unexposed_by_age_and_year) |> purrr::pluck(1) |> dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_pm_deaths) ``` # YLD Goal: E.g. To quantify the years lived with disability (YLD) attributable to air pollution exposure using disability weights. ### Function call ```{r} results_pm_copd_yld <- attribute_health( rr_central = 1.1, rr_increment = 10, erf_shape = "log_linear", exp_central = 8.85, cutoff_central = 5, bhd_central = 1000, duration_central = 10, dw_central = 0.2 ) ``` ### Main results ```{r eval=FALSE, include=FALSE, echo=TRUE} results_pm_copd_yld$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_pm_copd_yld$health_main |> dplyr::select(erf_ci, impact) |> knitr::kable() ``` # DALYs Goal: To obtain the Disability-Adjusted Life Years as the sum of YLLs and YLDs. This is possible using the function `daly()`. ### Function call ```{r} results_daly <- daly( output_attribute_yll = results_pm_yll, output_attribute_yld = results_pm_copd_yld ) ``` ### Main results YLL, YLD & DALY ```{r eval=FALSE, include=FALSE, echo=TRUE} results_daly$health_main |> select(impact_yll_rounded, impact_yld_rounded, impact_rounded) ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_daly$health_main |> dplyr::select(impact_yll_rounded, impact_yld_rounded, impact_rounded) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_daly, results_pm_yll, results_pm_copd_yld) ``` # Comparison of two health scenarios Goal: E.g. To compare the health impacts in the scenario "before intervention" vs. "after intervention". ### Function call 1. Use `attribute_health()` to calculate burden of scenarios A & B ```{r} scenario_A <- attribute_health( exp_central = 8.85, # EXPOSURE 1 cutoff_central = 5, bhd_central = 25000, approach_risk = "relative_risk", erf_shape = "log_linear", rr_central = 1.118, rr_increment = 10) ``` ```{r} scenario_B <- attribute_health( exp_central = 6, # EXPOSURE 2 cutoff_central = 5, bhd_central = 25000, approach_risk = "relative_risk", erf_shape = "log_linear", rr_central = 1.118, rr_increment = 10) ``` Alternatively, the function `attribute_mod()` can be used to modify an existing scenario, e.g. `scenario_A` ```{r} scenario_B <- attribute_mod( output_attribute = scenario_A, exp_central = 6 ) ``` 2. Use `compare()` to compare scenarios A & B ```{r} results_comparison <- compare( approach_comparison = "delta", # or "pif" (population impact fraction) output_attribute_scen_1 = scenario_A, output_attribute_scen_2 = scenario_B ) ``` The default value for the argument `approach_comparison` is `"delta"`. The alterntive is `"pif"` (population impact fraction). See the function documentation of `compare()` for more details. ### Main results ```{r eval=FALSE, include=FALSE, echo=TRUE} results_comparison$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_comparison[["health_main"]] |> dplyr::select( impact, impact_rounded, impact_scen_1, impact_scen_2, bhd, dplyr::starts_with("exp_"), -dplyr::starts_with("exp_ci"), # remove col "exp_ci" dplyr::starts_with("rr_con")) |> dplyr::slice_head() |> knitr::kable() ``` ### Detailed results The `compare()` results contain two additional outputs in addition to those we have already seen - `health_detailed` - `scen_1` contains results of scenario 1 (scenario A in our case) - `scen_2` contains results of scenario 2 (scenario B in our case) ```{r, echo=FALSE, eval=TRUE, include=FALSE} rm(results_comparison, scenario_A, scenario_B) ``` # Two correlated exposures Goal: E.g. To quantify the total health impact attributable to PM2.5 and NO2. For this purpose, you can use the function `multiexpose()`. *Attention*: To apply this method, the relative risks for one exposure must be adjusted for the second exposure. ### Function call ```{r} results_pm <- attribute_health( erf_shape = "log_linear", rr_central = 1.369, rr_increment = 10, exp_central = 8.85, cutoff_central = 5, bhd_central = 30747 ) results_no2 <- attribute_mod( output_attribute = results_pm, exp_central = 10.9, rr_central = 1.031 ) results_multiplicative <- multiexpose( output_attribute_exp_1 = results_pm, output_attribute_exp_2 = results_no2, exp_name_1 = "pm2.5", exp_name_2 = "no2", approach_multiexposure = "multiplicative" ) ``` ```{r, echo=FALSE, eval=TRUE, include=FALSE} rm(results_no2, results_pm) ``` ### Main results ```{r eval=FALSE, include=TRUE, echo=TRUE} results_multiplicative$health_main ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} results_multiplicative$health_main |> dplyr::select(impact_rounded) |> knitr::kable() ``` ```{r eval=TRUE, include=FALSE, echo=FALSE} rm(results_multiplicative) ``` # Threshold additional to cut-off Goal: To quantify health impacts in the exposure group 55dB+ (calculation threshold) that are affected by a exposure above the effect threshold of 45 dB (cut-off). The function arguments `erf_eq_...` require a function as input. Instead of using a `splinefun()` this can also be fulfilled by using a 'function(c)' which is of type 'function'. ```{r} #setting up function parameters threshold_effect <- 45 RR <- 1.055 threshold_calculation <- 55 rr_increment <- 10 # define categorical function, the ifelse condition enables the case distinction erf_function <- function(c){ output <- ifelse(c select(erf_ci, monetized_impact) |> knitr::kable() ``` We see that the monetized impact (discounted) is more than 160 million EURO. Alternatively, you can also monetize (attributable) health impacts from a non-`healthiar` source. ```{r} results <- monetize( impact = 1151, valuation = 100 ) ``` ## Cost-benefit analysis Goal: E.g. To estimate the net-benefit of a health policy via cost-benefit analysis (CBA). Let's imagine we design a policy that would reduce air pollution to 5 $\mu g/m^3$, which is the concentration specified in the `cutoff_central` argument in the initial `attribute_health()` call. So we could avoid all COPD cases attributed to air pollution. Considering the cost to implement the policy (estimated at 100 million EURO), what would be the monetary net benefit of such a policy, ? We can find out using `healthiar`'s `cba()` function. The outcome of the CBA is contained in two folders, which are added to the existing assessment: - `cba_main` contains the central estimate and the corresponding 95% confidence intervals obtained - `cba_detailed` contains additional intermediate results for both cost and benefit - `benefit` contains results `by_year` and raw results `health_raw` - `cost` contains the costs of the policy at the end of the period specified in the `n_years_cost` argument ### Function call ```{r} cba <- cba( output_attribute = results_pm_copd, valuation = 50000, cost = 100000000, discount_shape = "exponential", discount_rate_benefit = 0.03, discount_rate_cost = 0.03, n_years_benefit = 5, n_years_cost = 5 ) ``` ### Main results ```{r} cba$cba_main |> dplyr::select(benefit, cost, net_benefit) |> knitr::kable() ``` ```{r, echo=FALSE, eval=TRUE, include=FALSE} rm(cba) ``` We see that the central and upper 95% confidence interval estimates of avoided attributable COPD cases result in a net monetary benefit of the policy, while the lower 95% confidence interval estimate results in a net cost! # Multiple deprivation index Goal: E.g. To estimate the multiple deprivation index (MDI) to use it for the argument `social_indicator` in the function `socialize()`. ### Function call ```{r eval=TRUE, include=TRUE, echo=TRUE} mdi <- prepare_mdi( geo_id_micro = exdat_prepare_mdi$id, edu = exdat_prepare_mdi$edu, unemployed = exdat_prepare_mdi$unemployed, single_parent = exdat_prepare_mdi$single_parent, pop_change = exdat_prepare_mdi$pop_change, no_heating = exdat_prepare_mdi$no_heating, n_quantile = 10, verbose = FALSE ) ``` *Note*: `verbose = FALSE` suppresses any output to the console (default: `verbose = TRUE`, i.e. with printing turned on). ### Main results Function output includes - `mdi_main`, a tibble containing the BEST-COST MDI ```{r eval=FALSE, include=TRUE, echo=TRUE} mdi$mdi_main |> select(geo_id_micro, MDI, MDI_index) ``` ```{r eval=TRUE, include=TRUE, echo=FALSE} mdi$mdi_main |> dplyr::select(geo_id_micro, MDI, MDI_index) |> knitr::kable() ``` ### Detailed results - `mdi_detailed` - DESCRIPTIVE STATISTICS - PEARSON'S CORRELATION COEFFICIENTS - CRONBACH'S α, including the reliability rating this value indicates - Code for boxplots of the single indicators - Code for histogram of the MDI's for the geo units with a normal distribution curve To reproduce the boxlots run ```{r fig.alt="Boxplot of Normalized Indicators and MDI", eval=TRUE, include=TRUE, echo=TRUE} eval(mdi$mdi_detailed$boxplot) ``` Analogeously, to reproduce the histogram run ```{r fig.alt="Histogram of MDI with normal curve", eval=TRUE, include=TRUE, echo=TRUE} eval(mdi$mdi_detailed$histogram) ``` ```{r, eval=TRUE, include=FALSE, echo=FALSE} rm(mdi) ``` --------------------------------------------------------------------------- # Export and visualize Exporting and visualizing results is out of scope of `healthiar`. To export and visualize, you can make use of existing functions in other packages beyond `healthiar` as indicated below. ## Export results Export as `.csv` file ```{r eval=FALSE, include=FALSE, echo=TRUE} write.csv(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.csv") ``` Save as `.Rdata` file ```{r eval=FALSE, include=FALSE, echo=TRUE} save(results_pm_copd, file = "exported_results/results_pm_copd.Rdata") ``` Export to Excel (as `.xlsx` file) ```{r eval=FALSE, include=FALSE, echo=TRUE} openxlsx::write.xlsx(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.xlsx") ``` ## Visualize results Visualization is out of scope of `healthiar`. You can visualize in - R, e.g. with the `ggplot2` package ([online book by the creator](https://ggplot2-book.org/)) - Excel (export results first) - Other tools ------------------------------------------------------------------------ # Abbreviations BHD/bhd = baseline health data CI = confidence interval CBA/cba = cost-benefit analysis exp = exposure ERF = exposure-response function RR/rr = relative risk YLL/yll = years of life lost