---
title: "Vignette: Intro to healthiar"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Vignette: Intro to healthiar}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, include=TRUE)
```


```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
options(knitr.kable.max_rows = 10)
set.seed(1)

## Avoid using pacman here, as it causes error in installation if it's not installed already
library(healthiar)
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)
library(knitr)
```

Hi there!

This vignette will tell you about `healthiar` and show you how to use `healthiar` with the help of examples.

*NOTE*: Before using `healthiar`, please read carefully the information provided in the [readme file](https://github.com/SwissTPH/healthiar?tab=readme-ov-file#readme) or the [welcome webpage](https://swisstph.github.io/healthiar/). By using `healthiar`, you agree to the [terms of use and disclaimer](https://github.com/SwissTPH/healthiar?tab=readme-ov-file#readme).



------------------------------------------------------------------------

# About `healthiar`

The `healthiar` functions allow you to quantify and monetize the health impacts (or burden of disease) attributable to exposure. The main focus of the EU project that initiated the development of `healthiar` (BEST-COST) has been two environmental exposures:  air pollution and noise. However, `healthiar` can be (and has been) used for many other exposures such as green spaces, chemicals, physical activity...

See below a an overview of the package. The whole list of functions included in `healthiar` is available [here](https://swisstph.github.io/healthiar/reference/index.html). 


![Figure:`healthiar` overview](images/package_overview.png){width="700"}


Be aware that `healthiar` functions are easier to use if your data is prepared in a tidy format, i.e.:

-   Each variable is a column; each column is a variable.

-   Each observation is a row; each row is an observation.

-   Each value is a cell; each cell is a single value.

Source: Hadley Wickham (2014). Tidy Data. [Link](https://doi.org/10.18637/jss.v059.i10). For additional information see this [informal explanation of tidy data](https://cran.r-project.org/package=tidyr) (by the author).

---------------------------------------------------------------------------

# Function examples

The descriptions of the [`healthiar` functions](https://swisstph.github.io/healthiar/reference/index.html) provide examples that you can execute (with `healthiar` loaded) by running `example("function_name")`, e.g. `example("attribute_health")`. In the sections below in this vignette, you find additional examples and more detailed explanations.


# Relative risk 

Goal: E.g. To quantify the COPD cases attributable to PM2.5 (air pollution) exposure in a country.

### Methodological refresher

![Figure: Relative risk approach](images/bod_rr.png){width="700"}

### Function call

#### Hard coded

```{r}
results_pm_copd <- attribute_health(
  erf_shape = "log_linear", # shape of the exposure-response function (ERF)
  rr_central = 1.369, # relative risk (RR) central estimate
  rr_increment = 10,  # increment for which relative risk is valid (in \\mu g / m^3)
  exp_central = 8.85, # PM2.5 exposure (in \\mu g / m^3) (here: population-weighted)
  cutoff_central = 5, # cutoff (in \\mu g / m^3) below which no health effects occur 
  bhd_central = 30747 # baseline health data (BHD; here: COPD incidence)
)
```

For alternative ERF shapes see the function documentation of `attribute_health()`.

#### Function call - Pre-loaded data

`healthiar` comes with some example data that start with `exdat_` that allow you to test functions. Some of these example data will be used in some examples in this vignette.

Now we call `attribute_health` with input data from the `healthiar` example data. Note that we can easily provide input data to the function argument using the `$` operator.

```{r, eval=TRUE, include=TRUE, echo=TRUE}
results_pm_copd <- attribute_health(
  erf_shape = "log_linear",
  rr_central = exdat_pm$relative_risk, 
  rr_increment = 10, 
  exp_central = exdat_pm$mean_concentration,
  cutoff_central = exdat_pm$cut_off_value,
  bhd_central = exdat_pm$incidence
)
```

### Output structure

Every `attribute_health()` output consists of two lists ("folders")

-   `health_main` contains the main results

-   `health_detailed` detailed results and additional info about the assessment

    -   `results_raw` tibble that contains the detailed results

    -   `input_args` contains a complete list of all arguments used in the assessment (including some internal arguments)

    -   `input_table` tibble that contains all input data entered by the user or derived for the calculations

    -   `results_by_...`
    
        - `geo_id_micro`: contains results by the inputted or default micro (e.g. high resolution) geo IDs
        
        - `sex`: contains results by sex (only present if function argument `sex` specified)
        
        - `age_group`: contains results by age groups (only present if function argument `age_group` sepcified)

*NOTE*: `attribute_lifetable()` creates additional output that is specific to life table calculations

### Main results

Let's inspect the main results

There exist different, equivalent ways of accessing the output

-   With `$` operator: `results_pm_copd$health_main$impact_rounded` (as in the example above)

-   By mouse: go to the *Environment* tab in RStudio and click on the variable you want to inspect, and then open the `health_main` results table

-   With `[[]]` operator `results_pm_copd[["health_main"]]`

-   With `pluck()` & `pull()`: use the `purrr::pluck` function to select a list and then the `dplyr::pull` function extract values from a specified column, e.g. `results_pm_copd |> purrr::pluck("health_main") |> dplyr::pull("impact_rounded")`

```{r, include=TRUE, eval=TRUE, echo=TRUE}
results_pm_copd$health_main
```

It is a table of the format `tibble` (very similar to a `data.frame`) of 3 rows and 23 columns. Let's zoom in on some relevant aspects

```{r}
results_pm_copd$health_main |> 
  dplyr::select(exp, bhd, rr, erf_ci, pop_fraction, impact_rounded) |> 
  knitr::kable() # For formatting reasons only: prints tibble in nice layout
```

Interpretation: this table shows us that exposure was 8.85 $\mu g/m^3$, the baseline health data (`bhd_central`) was 30747 (COPD incidence in this instance). The 1st row further shows that the impact attributable to this exposure using the central relative risk (`rr_central`) estimate of 1.369 is 3502 COPD cases, or \~11% of all baseline cases.

Some of the most results columns include:

-   *impact_rounded* rounded attributable health impact/burden
-   *impact* raw impact/burden
-   *pop_fraction* population attributable fraction (PAF) or population impact fraction (PIF)

*NOTE*: The main output contains more columns that provide additional information about the assessment.



# Absolute risk

Goal: E.g. To quantify the incidence cases of high annoyance attributable to (road traffic) noise exposure.

## Refresher - Burden of disease with absolute risk

![Figure: Absolute risk approach](images/bod_ar.png){width="700"}

### Function call

```{r}
results_noise_ha <- attribute_health(
  approach_risk = "absolute_risk", # default is "relative_risk"
  exp_central = c(57.5, 62.5, 67.5, 72.5, 77.5), # mean of the exposure categories
  pop_exp = c(387500, 286000, 191800, 72200, 7700), # population exposed per exposure category
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2" # exposure-response function
)
```

The `erf_eq_central` argument can digest other types of functions (see section on user-defined ERF)

### Main results

```{r echo=FALSE}
results_noise_ha |> 
  purrr::pluck("health_main") |>
  dplyr::select(erf_eq, erf_ci, impact_rounded) |> 
  knitr::kable() # Prints tibble in a minimal layout
```

### Results per noise exposure band

```{r eval=FALSE, include=TRUE, echo=TRUE}
results_noise_ha$health_detailed$results_raw
```

```{r echo=FALSE, include=TRUE, eval=TRUE}
results_noise_ha[["health_detailed"]][["results_raw"]] |> 
  select(exp_category, exp, pop_exp, impact) |> 
  knitr::kable()
```

Alternatively, it's also possible to only assess the impacts for a single noise exposure band

```{r}
results_noise_ha <- attribute_health(
  approach_risk = "absolute_risk",
  exp_central = 57.5,
  pop_exp = 387500,
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2"
)
```

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_noise_ha$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_noise_ha[["health_main"]] |> 
  dplyr::select(exp_category, impact) |> 
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_noise_ha)
```



# Multiple geographic units

## using relative risk

Goal: e.g. To quantify the disease cases attributable to PM2.5 exposure in multiple cities using one single command.

-   Enter unique ID's as a vector (`numeric` or `character`) to the `geo_id_micro` argument (e.g. municipality names or province abbrevations)

-   Optional: aggregate unit-specific results by providing higher-level ID's (e.g. region names or country abbreviations) as a vector (`numeric` or `character`) to the `geo_id_macro` argument

Input to the other function arguments is specified as usual, either as a vector or a single values (which will be recycled to match the length of the other input vectors).

### Function call

```{r}
results_iteration <- attribute_health(
    # Names of Swiss cantons
    geo_id_micro = c("Zurich", "Basel", "Geneva", "Ticino", "Jura"),
    # Names of languages spoken in the selected Swiss cantons
    geo_id_macro = c("German","German","French","Italian","French"),
    rr_central = 1.369,
    rr_increment = 10, 
    cutoff_central = 5,
    erf_shape = "log_linear",
    exp_central = c(11, 11, 10, 8, 7),
    bhd_central = c(4000, 2500, 3000, 1500, 500)
)
```

In this example we want to aggregate the lower-level geographic units (municipalities) by the higher-level language region (`"German", "French", "Italian"`)

### Main results

The main output contains aggregated results

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_iteration$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_iteration[["health_main"]] |> 
  dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci, bhd_ci) |> 
  knitr::kable()
```

In this case `health_main` contains the cumulative / summed number of stroke cases attributable to PM2.5 exposure in the 5 geo units, which is `r sum(results_iteration[["health_main"]]$impact_rounded)` (using a relative risk of 1.369).

### Detailed results

The geo unit-specific information and results are stored under `health_detailed`\>`results_raw`

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_iteration$health_detailed$results_raw
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_iteration[["health_detailed"]][["results_raw"]] |> 
  dplyr::select(geo_id_micro, impact_rounded, geo_id_macro) |> 
  knitr::kable()
```

`health_detailed` also contains impacts obtained through all combinations of input data central, lower and upper estimates (as usual), besides the results per geo unit (not shown above)

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_iteration)
```

## using absolute risk

Goal: E.g.To quantify high annoyance cases attributable to noise exposure in rural and urban areas.

### Function call

```{r}
data <- exdat_noise |> 
  ## Filter for urban and rural regions
  dplyr::filter(region == "urban" | region == "rural")
```

```{r}
results_iteration_ar <- attribute_health( 
    # Both the rural and urban areas belong to the higher-level "total" region
    geo_id_macro = "total",
    geo_id_micro = data$region,
    approach_risk = "absolute_risk",
    exp_central = data$exposure_mean,
    pop_exp = data$exposed,
    erf_eq_central = "78.9270-3.1162*c+0.0342*c^2"
)
```

*NOTE*: the length of the input vectors fed to `geo_id_micro`, `exp_central`, `pop_exp` must match and must be

(number of geo units) x (number of exposure categories) = 2 x 5 = **10**,

because we have 2 geo units (`"rural"` and `"urban"`) and 5 exposure categories.

### Main results

`health_main` contains the aggregated results (i.e. sum of impacts in rural and urban areas)

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_iteration_ar$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_iteration_ar[["health_main"]] |> 
  dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci) |> 
  knitr::kable()
```

### Detailed results

Impact by geo unit, in this case impact in the rural and in the urban area

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_iteration_ar$health_detailed$results_by_geo_id_micro
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_iteration_ar[["health_detailed"]][["results_by_geo_id_micro"]] |> 
  dplyr::select(geo_id_micro, geo_id_macro, impact) |> 
  knitr::kable()
```

```{r  eval=TRUE, include=FALSE, echo=FALSE}
rm(results_iteration_ar, data)
```

# Uncertainty

## Confidence interval

Goal: E.g. To quantify the COPD cases attributable to PM2.5 exposure taking into account uncertainty (lower and upper bound of confidence interval) in several input arguments: relative risk, exposure and baseline health data.

### Function call

```{r}
results_pm_copd <- attribute_health(
    erf_shape = "log_linear",
    rr_central = 1.369, 
    rr_lower = 1.124, # lower 95% confidence interval (CI) bound of RR
    rr_upper = 1.664, # upper 95% CI bound of RR
    rr_increment = 10, 
    exp_central = 8.85, 
    exp_lower = 8, # lower 95% CI bound of exposure
    exp_upper = 10, # upper 95% CI bound of exposure
    cutoff_central = 5,
    bhd_central = 30747, 
    bhd_lower = 28000, # lower 95% confidence interval estimate of BHD
    bhd_upper = 32000 # upper 95% confidence interval estimate of BHD
) 
```

### Detailed results

Let's inspect the detailed results:

```{r eval=FALSE, echo=TRUE, include=FALSE}
results_pm_copd$health_detailed$results_raw
```

```{r echo=FALSE}
results_pm_copd$health_detailed$results_raw |> 
  dplyr::select(erf_ci, exp_ci, bhd_ci, impact_rounded) |> 
  dplyr::slice(1:9) |> 
  knitr::kable() # Prints tibble in a minimal layout
```

Each row contains the estimated attributable cases (`impact_rounded`) obtained by the input data specified in the columns ending in "\_ci" and the other calculation pathway specifications in that row (not shown).

-   The 1st contains the estimated attributable impact when using the central estimates of relative risk, exposure and baseline health data

-   The 2nd row shows the impact when using the central estimates of the relative risk, exposure in combination with the lower estimate of the baseline health data

-   ...

*NOTE*: only `r length(1:9)` of the `r length(results_pm_copd$health_detailed$results_raw$impact)` possible combinations are displayed due to space constraints

*NOTE*: only a selection of columns are shown


## Monte Carlo simulation

Goal: E.g. To summarize uncertainty of attributable health impacts (i.e. to get a single confidence interval instead of many combinations) by using a Monte Carlo simulation.


You can do this carrying out a Monte Carlo uncertainty analysis via the `summarize_uncertainty()` function.

The outcome of the Monte Carlo analysis is added to the variable entered as the `results` argument, which is `results_pm_copd` in our case.

Two lists ("folders") are added:

-   `uncertainty_main` contains the central estimate and the corresponding 95% confidence intervals obtained through the Monte Carlo assessment

-   `uncertainty_detailed` contains all `n_sim` simulations of the Monte Carlo assessment

### Function call

```{r}
results_pm_copd_summarized <- 
  summarize_uncertainty(
    output_attribute = results_pm_copd,
    n_sim = 100
)
```

### Main results

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_copd_summarized$uncertainty_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_copd_summarized$uncertainty_main |> 
  knitr::kable()
```

### Detailed results

The folder `uncertainty_detailed` contains all single simulations. Let's look at the impact of the first 10 simulations.

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_copd_summarized$uncertainty_detailed$impact_by_sim
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_copd_summarized$uncertainty_detailed$impact_by_sim |> 
  knitr::kable()
```

```{r, echo=FALSE, eval=TRUE, include=FALSE}
rm(results_pm_copd_summarized)
```



# User-defined ERF

Goal: E.g. To quantify COPD cases attributable to air pollution exposure by applying a user-defined exposure-response function (ERF), such as the MR-BRT curves from Global Burden of Disease study. 


In this case, the function arguments `erf_eq_...` require a function as input, so we use an auxiliary function (`splinefun()`) to transform the points on the ERF into type `function`.
```{r}
results_pm_copd_mr_brt <- attribute_health(
  exp_central = 8.85,
  bhd_central = 30747,
  cutoff_central = 0,
  # Specify the function based on x-y point pairs that lie on the ERF
  erf_eq_central = splinefun(
    x = c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110),
    y = c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60),
    method = "natural")
)
```

The ERF curve created looks as follows

```{r fig.alt="ERF curve", eval=TRUE, include=TRUE, echo=FALSE}

x <- c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110)
y <- c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60)
spline_func <- 
  stats::splinefun(
    x = x,
    y = y,
    method = "natural")
## Generate finer x-values
x_dense <- seq(min(x), max(x), length.out = 200)
## Evaluate the spline function at these points
y_dense <- spline_func(x_dense)
## Plot
plot(x, 
     y, 
     # main = "User-defined ERF", 
     main = "", 
     xlab = "Exposure",
     ylab = "Relative risk",
     col = "blue", 
     pch = 19)
lines(x_dense, y_dense, col = "red", lwd = 2)
legend("topleft", legend = c("Original points", "Spline curve"),
       col = c("blue", "red"), pch = c(19, NA), lty = c(NA, 1), lwd = c(NA, 2))
```

Alternatively, other functions (e.g. `approxfun()`) can be used to create the ERF

# Sub-group analysis 
## by age group

Goal: e.g. To quantify health impacts attributable to air pollution in a country *by age group*.

To obtain age-group-specific results, the baseline health data (and possibly exposure) must be available by age group.

If the `age` argument was specified, age-group-specific results are available under `health_detailed` in the sub-folder `results_by_age_group`.

```{r}
results_age_group <- attribute_health(
        approach_risk = "relative_risk",
        age = c("below_65", "65_plus"),
        exp_central = c(8, 7),
        cutoff_central = c(5, 5),
        bhd_central = c(1000, 5000),
        rr_central = 1.06,
        rr_increment = 10,
        erf_shape = "log_linear"
      )
results_age_group$health_detailed$results_by_age_group |> 
  dplyr::select(age_group, impact_rounded, exp, bhd) |> 
  knitr::kable()
```


```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_age_group)
```


## by sex

Goal: e.g. To quantify health impacts attributable to air pollution in a country *by sex*.

To obtain sex-specific results, the baseline health data (and possibly exposure) must be entered by sex.

If the `sex` argument was specified, sex-specific results are available under `health_detailed` in the sub-folder `results_by_sex`.

```{r eval=TRUE, include=TRUE, echo=TRUE}
results_sex <- attribute_health(
        approach_risk = "relative_risk",
        sex = c("female", "male"),
        exp_central = c(8, 8),
        cutoff_central = c(5, 5),
        bhd_central = c(1000, 1100),
        rr_central = 1.06,
        rr_increment = 10,
        erf_shape = "log_linear"
      )
results_sex$health_detailed$results_by_sex |> 
  dplyr::select(sex, impact_rounded, exp, bhd) |> 
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_sex)
```

## by other sub-groups

Goal: e.g. To quantify health impacts attributable to air pollution in a country *by education level*.

A (random) ID vector of unique values must be entered to the argument `geo_id_micro`, to indicate that each row is an observation on its own (and not just exposure categories)

A single vector (or a data frame / tibble with multiple columns) to group the results by can be entered to the `info` argument. In this example, this will be information about the education level.

In a second step one can group the results based on one or more columns and so summarize the results by the preferred sub-groups

```{r}
info <- data.frame(
  education = rep(c("secondary", "bachelor", "master"), each = 5) # education level
)
output_attribute <- attribute_health(
  rr_central = 1.063,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central =  0,
  exp_central = sample(6:10, 15, replace = TRUE),
  bhd_central = sample(100:500, 15, replace = TRUE),
  geo_id_micro = c(1:nrow(info)), # (random) ID must be entered
  info = info
)
output_stratified <- output_attribute$health_detailed$results_raw |>
  dplyr::group_by(info_column_1) |>
  dplyr::summarize(mean_impact = mean(impact)) |>
  print()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(info, output_attribute, output_stratified)
```


# YLL & deaths with life table

## YLL

Goal: E.g. To quantify the years of life lost (YLL) due to deaths from COPD attributable to PM2.5 exposure during one year.

We can use `attribute_lifetable()` combined with life table input data to determine YLL attributable to an environmental stressor.

### Function call

```{r}
results_pm_yll <- attribute_lifetable(
  year_of_analysis = 2019, 
  health_outcome = "yll",
  rr_central =  1.118, 
  rr_increment = 10,
  erf_shape = "log_linear",
  exp_central = 8.85,
  cutoff_central = 5,
  min_age = 20, # age from which population is affected by the exposure
  # Life table information
  age_group = exdat_lifetable$age_group,
  sex = exdat_lifetable$sex,
  population = exdat_lifetable$midyear_population,
  # In the life table case, BHD refers to deaths
  bhd_central = exdat_lifetable$deaths
) 
```

### Main results

Total YLL attributable to exposure (sum of sex-specific impacts)

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_yll$health_main |> 
  dplyr::slice_head() |> 
  dplyr::select(impact_rounded, erf_ci, exp_ci, bhd_ci) |> 
  knitr::kable()
```

Use the two arguments `approach_exposure` and `approach_newborns` to modify the life table calculation

-   `approach_exposure`

    -   `"single_year"` (default) population is exposed for a single year and the impacts of that exposure are calculated

    -   `"constant"` population is exposed every year

-   `approach_newborns`

    -   `"without_newborns"` (default) assumes that the population in the year of analysis is followed over time, without considering newborns being born

    -   `"with_newborns"` assumes that for each year after the year of analysis n babies are born, with n being equal to the (male and female) population aged 0 that is provided in the argument population

### Detailed results

Attributable YLL results

-   per year

-   per age (group)

-   per sex (if sex-specific life table data entered)

are available.

*Note*: We will inspect the results for females; male results are also available.

### Results per year

*NOTE*: only a selection of years is shown

```{r}
results_pm_yll$health_detailed$results_raw |>
  dplyr::summarize(
    .by = year, 
    impact = sum(impact, na.rm = TRUE)
  )
```

```{r}
results_pm_yll$health_detailed$results_raw |>
  dplyr::summarize(
    .by = year, 
    impact = sum(impact, na.rm = TRUE)) |>
  knitr::kable()
```



#### YLL

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_detailed$intermediate_calculations |> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(impact_by_age_and_year) |>
  purrr::pluck(1)
```

```{r echo=FALSE}
results_pm_yll$health_detailed$intermediate_calculations |> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(impact_by_age_and_year) |>
  purrr::pluck(1) |>
  dplyr::select(age_start, age_end, impact_2019) |> 
  dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> 
  knitr::kable()
```

#### Population (baseline scenario)

Baseline scenario refers to the scenario with exposure (i.e. the scenario specified in the assessment).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_detailed$intermediate_calculations|> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_exposed_by_age_and_year) |>
  purrr::pluck(1)
```

```{r echo=FALSE}
results_pm_yll$health_detailed$intermediate_calculations|> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_exposed_by_age_and_year) |>
  purrr::pluck(1) |>   
  dplyr::select(age_start, midyear_population_2019, midyear_population_2020, 
                midyear_population_2021, midyear_population_2022) |>    
  dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |>  
  knitr::kable()
```

#### Population (unexposed scenario)

Impacted scenario refers to the scenario without exposure.

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_detailed$intermediate_calculations|> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_unexposed_by_age_and_year) |>
  purrr::pluck(1)
```

```{r echo=FALSE}
results_pm_yll$health_detailed$intermediate_calculations|> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_unexposed_by_age_and_year) |>
  purrr::pluck(1) |>   
  dplyr::select(age_start, midyear_population_2019, midyear_population_2020, 
                midyear_population_2021, midyear_population_2022) |>    
  dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |>  
  knitr::kable()
```

## Deaths

Goal: E.g. To determine premature deaths from COPD attributable to PM2.5 exposure during one year.

See example on YLL for additional info on `attribute_lifetable()` calculations and its output.

### Function call

```{r}
results_pm_deaths <- attribute_lifetable(
  health_outcome = "deaths",
  year_of_analysis = 2019,
  rr_central =  1.118, 
  rr_increment = 10,
  erf_shape = "log_linear",
  exp_central = 8.85,
  cutoff_central = 5,
  min_age = 20, # age from which population is affected by the exposure   
  # Life table information
  age_group = exdat_lifetable$age_group,   
  sex = exdat_lifetable$sex,
  population = exdat_lifetable$midyear_population, 
  bhd_central = exdat_lifetable$deaths
)
```

### Main results

Total premature deaths attributable to exposure (sum of sex-specific impacts)

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_deaths$health_main$impact
```

```{r, eval=TRUE, include=TRUE, echo=FALSE}
results_pm_deaths$health_main |> 
  dplyr::slice_head() |> 
  dplyr::select(impact_rounded, erf_ci, exp_ci, bhd_ci) |> 
  knitr::kable()
```

### Detailed results

Attributable premature deaths results

-   per year (if argument `approach_exposure = "constant"`)

-   per age (group)

-   per sex (if sex-specific life table data entered)

are available

*Note*: we will inspect the results for females; male results are also available

*Note*: because we set the function argument `approach_exposure = "constant"` in the function call results are available for one year (the year of analysis)

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_deaths$health_detailed$intermediate_calculations|> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_unexposed_by_age_and_year) |>
  purrr::pluck(1)
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_yll$health_detailed$intermediate_calculations|> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_unexposed_by_age_and_year) |>
  purrr::pluck(1) |>
  dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |>  
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_pm_deaths)
```

# YLD

Goal: E.g. To quantify the years lived with disability (YLD) attributable to air pollution exposure using disability weights.

### Function call

```{r}
results_pm_copd_yld  <- attribute_health(
  rr_central = 1.1, 
  rr_increment = 10, 
  erf_shape = "log_linear",  
  exp_central = 8.85,
  cutoff_central = 5,
  bhd_central = 1000,
  duration_central = 10,
  dw_central = 0.2
)
```

### Main results

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_copd_yld$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_copd_yld$health_main |> 
  dplyr::select(erf_ci, impact) |> 
  knitr::kable()
```

# DALYs

Goal: To obtain the Disability-Adjusted Life Years as the sum of YLLs and YLDs.

This is possible using the function `daly()`.

### Function call

```{r}
results_daly <- daly(
     output_attribute_yll = results_pm_yll,
     output_attribute_yld = results_pm_copd_yld
)
```

### Main results

YLL, YLD & DALY

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_daly$health_main |> 
  select(impact_yll_rounded, impact_yld_rounded, impact_rounded)
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_daly$health_main |> 
  dplyr::select(impact_yll_rounded, impact_yld_rounded, impact_rounded) |> 
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_daly, results_pm_yll, results_pm_copd_yld)
```


# Comparison of two health scenarios

Goal: E.g. To compare the health impacts in the scenario "before intervention" vs. "after intervention".

### Function call

1.  Use `attribute_health()` to calculate burden of scenarios A & B

```{r}
scenario_A <- attribute_health(
    exp_central = 8.85,   # EXPOSURE 1
    cutoff_central = 5, 
    bhd_central = 25000,
    approach_risk = "relative_risk",
    erf_shape = "log_linear",
    rr_central = 1.118,
    rr_increment = 10)
```

```{r}
scenario_B <- attribute_health(
    exp_central = 6,     # EXPOSURE 2
    cutoff_central = 5, 
    bhd_central = 25000,
    approach_risk = "relative_risk",
    erf_shape = "log_linear",
    rr_central = 1.118,
    rr_increment = 10)
```

Alternatively, the function `attribute_mod()` can be used to modify an existing scenario, e.g. `scenario_A`

```{r}
scenario_B <- attribute_mod(
  output_attribute = scenario_A, 
  exp_central = 6
)
```

2.  Use `compare()` to compare scenarios A & B

```{r}

results_comparison <- compare(
  approach_comparison = "delta", # or "pif" (population impact fraction)
  output_attribute_scen_1 = scenario_A,
  output_attribute_scen_2 = scenario_B
)
```

The default value for the argument `approach_comparison` is `"delta"`. The alterntive is `"pif"` (population impact fraction). See the function documentation of `compare()` for more details.

### Main results

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_comparison$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_comparison[["health_main"]] |> 
  dplyr::select(
    impact, impact_rounded,
    impact_scen_1, impact_scen_2,
    bhd,
    dplyr::starts_with("exp_"),
    -dplyr::starts_with("exp_ci"), # remove col "exp_ci"
    dplyr::starts_with("rr_con")) |> 
  dplyr::slice_head() |> 
  knitr::kable()
```

### Detailed results

The `compare()` results contain two additional outputs in addition to those we have already seen

-   `health_detailed`

    -   `scen_1` contains results of scenario 1 (scenario A in our case)

    -   `scen_2` contains results of scenario 2 (scenario B in our case)

```{r, echo=FALSE, eval=TRUE, include=FALSE}
rm(results_comparison, scenario_A, scenario_B)
```





# Two correlated exposures 

Goal: E.g. To quantify the total health impact attributable to PM2.5 and NO2.

For this purpose, you can use the function `multiexpose()`.

*Attention*: To apply this method, the relative risks for one exposure must be adjusted for the second exposure.


### Function call

```{r}
results_pm <- attribute_health(
  erf_shape = "log_linear",
  rr_central = 1.369, 
  rr_increment = 10,
  exp_central = 8.85,
  cutoff_central = 5,
  bhd_central = 30747
) 

results_no2 <- attribute_mod(
  output_attribute = results_pm,
  exp_central = 10.9,
  rr_central = 1.031
)

results_multiplicative <- multiexpose(
  output_attribute_exp_1 = results_pm,
  output_attribute_exp_2 = results_no2,
  exp_name_1 = "pm2.5",
  exp_name_2 = "no2",
  approach_multiexposure = "multiplicative"
)
```

```{r, echo=FALSE, eval=TRUE, include=FALSE}
rm(results_no2, results_pm)
```

### Main results

```{r eval=FALSE, include=TRUE, echo=TRUE}
results_multiplicative$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_multiplicative$health_main |> 
  dplyr::select(impact_rounded) |> 
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_multiplicative)
```



# Threshold additional to cut-off

Goal: To quantify health impacts in the exposure group 55dB+ (calculation threshold) that are affected by a exposure above the effect threshold of 45 dB (cut-off).


The function arguments `erf_eq_...` require a function as input. Instead of using a `splinefun()` this can also be fulfilled by using a 'function(c)' which is of type 'function'.

```{r}
#setting up function parameters
threshold_effect <- 45
RR <- 1.055
threshold_calculation <- 55
rr_increment <- 10

# define categorical function, the ifelse condition enables the case distinction
erf_function <- function(c){
  output <- ifelse(c<threshold_calculation, 1, exp((log(RR)/rr_increment)*(c-threshold_effect)))
  return(output)
}
# attribute_health
results_catERF_different_calc_thesh <- healthiar::attribute_health(
  approach_risk = "relative_risk",
  erf_eq_central = erf_function,
  prop_pop_exp = c(300000,200000,150000,120000,100000,70000,60000)/10000000,
  exp_central = c(47,52,57,62,67,72,77),
  cutoff_central=0,
  bhd_central=50000)$health_main$impact_rounded
```

The used function is equal to
$$
f(c) =
\begin{cases}
1, & c < \text{threshold} \\
\exp\left( \frac{\log(RR)}{rr_{increment}} (c - threshold_{effect}) \right), & c \ge \text{threshold}
\end{cases}
$$

The categorical ERF curve created looks as follows

```{r fig.alt="ERF curve", eval=TRUE, include=TRUE, echo=FALSE}




# Evaluate exp_central points
x <- c(47,52,57,62,67,72,77) # central exposure values
y <- erf_function(x)

# Generate finer x-values
x_dense <- seq(min(x), max(x), length.out = 200)
# Evaluate exp_central points (finer version)
y_dense <- erf_function(x_dense)

## Plot
plot(x,
     y # main = "User-defined ERF"
     ,
     main = "",
     xlab = "Exposure",
     ylab = "Relative risk",
     col = "blue",
     pch = 19)

lines(x_dense,y_dense,col = "red",lwd = 2)

legend("topleft",
       legend = c("Original points", "Categorical ERF curve"),
       col = c("blue", "red"),
       pch = c(19, NA),
       lty = c(NA, 1),
       lwd = c(NA, 2))

```



# Economic dimension

## Monetization 

Goal: E.g. To monetize the attributable health impact of a policy that will have health benefits five years from now.

The outcome of the monetization is added to the variable entered to the `output_attribute` argument, which is `results_pm_copd` in our case.

Two folders are added:

-   `monetization_main` contains the central monetization estimate and the corresponding 95% confidence intervals obtained through the specified monetization

-   `monetization_detailed` contains the monetized results for each unique combination of the input variable estimates that were provided to the initial `attribute_health()` call

### Function call

```{r}
monetized_pm_copd <- monetize(
    output_attribute = results_pm_copd,
    discount_shape = "exponential",
    discount_rate = 0.03,
    n_years = 5,
    valuation = 50000 # E.g. EURO
)
```




### Main results

```{r eval=FALSE, include=FALSE, echo=TRUE}
monetized_pm_copd$monetization_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
monetized_pm_copd$monetization_main |> 
  select(erf_ci, monetized_impact) |> 
  knitr::kable()
```

We see that the monetized impact (discounted) is more than 160 million EURO.

Alternatively, you can also monetize (attributable) health impacts from a non-`healthiar` source.

```{r}
results <- monetize(
  impact = 1151,
  valuation = 100
)
```

## Cost-benefit analysis

Goal: E.g. To estimate the net-benefit of a health policy via cost-benefit analysis (CBA).

Let's imagine we design a policy that would reduce air pollution to 5 $\mu g/m^3$, which is the concentration specified in the `cutoff_central` argument in the initial `attribute_health()` call. So we could avoid all COPD cases attributed to air pollution.

Considering the cost to implement the policy (estimated at 100 million EURO), what would be the monetary net benefit of such a policy, ? We can find out using `healthiar`'s `cba()` function.

The outcome of the CBA is contained in two folders, which are added to the existing assessment:

-   `cba_main` contains the central estimate and the corresponding 95% confidence intervals obtained

-   `cba_detailed` contains additional intermediate results for both cost and benefit

    -   `benefit` contains results `by_year` and raw results `health_raw`

    -   `cost` contains the costs of the policy at the end of the period specified in the `n_years_cost` argument

### Function call

```{r}
cba <- cba(
    output_attribute = results_pm_copd,
    valuation = 50000,
    cost = 100000000,
    discount_shape = "exponential",
    discount_rate_benefit = 0.03,
    discount_rate_cost = 0.03,
    n_years_benefit = 5,
    n_years_cost = 5
)
```

### Main results

```{r}
cba$cba_main |>  
  dplyr::select(benefit, cost, net_benefit) |> 
  knitr::kable()
```

```{r, echo=FALSE, eval=TRUE, include=FALSE}
rm(cba)
```

We see that the central and upper 95% confidence interval estimates of avoided attributable COPD cases result in a net monetary benefit of the policy, while the lower 95% confidence interval estimate results in a net cost!




# Multiple deprivation index

Goal: E.g. To estimate the multiple deprivation index (MDI) to use it for the argument `social_indicator` in the function `socialize()`.

### Function call

```{r eval=TRUE, include=TRUE, echo=TRUE}
mdi <- prepare_mdi(
  geo_id_micro = exdat_prepare_mdi$id,
  edu = exdat_prepare_mdi$edu,
  unemployed = exdat_prepare_mdi$unemployed,
  single_parent = exdat_prepare_mdi$single_parent,
  pop_change = exdat_prepare_mdi$pop_change,
  no_heating = exdat_prepare_mdi$no_heating,
  n_quantile = 10,
  verbose = FALSE
)
```

*Note*: `verbose = FALSE` suppresses any output to the console (default: `verbose = TRUE`, i.e. with printing turned on).

### Main results

Function output includes

-   `mdi_main`, a tibble containing the BEST-COST MDI

```{r eval=FALSE, include=TRUE, echo=TRUE}
mdi$mdi_main |> 
  select(geo_id_micro, MDI, MDI_index)
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
mdi$mdi_main |> 
  dplyr::select(geo_id_micro, MDI, MDI_index) |> 
  knitr::kable()
```

### Detailed results

-   `mdi_detailed`

    -   DESCRIPTIVE STATISTICS

    -   PEARSON'S CORRELATION COEFFICIENTS

    -   CRONBACH'S α, including the reliability rating this value indicates

    -   Code for boxplots of the single indicators

    -   Code for histogram of the MDI's for the geo units with a normal distribution curve

To reproduce the boxlots run
```{r fig.alt="Boxplot of Normalized Indicators and MDI", eval=TRUE, include=TRUE, echo=TRUE}
eval(mdi$mdi_detailed$boxplot)
```
Analogeously, to reproduce the histogram run
```{r fig.alt="Histogram of MDI with normal curve", eval=TRUE, include=TRUE, echo=TRUE}
eval(mdi$mdi_detailed$histogram)
```


```{r, eval=TRUE, include=FALSE, echo=FALSE}
rm(mdi)
```
---------------------------------------------------------------------------

# Export and visualize

Exporting and visualizing results is out of scope of `healthiar`. To export and visualize, you can make use of existing functions in other packages beyond `healthiar` as indicated below.

## Export results

Export as `.csv` file

```{r eval=FALSE, include=FALSE, echo=TRUE}
write.csv(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.csv")
```

Save as `.Rdata` file

```{r eval=FALSE, include=FALSE, echo=TRUE}
save(results_pm_copd, file = "exported_results/results_pm_copd.Rdata")
```

Export to Excel (as `.xlsx` file)

```{r eval=FALSE, include=FALSE, echo=TRUE}
openxlsx::write.xlsx(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.xlsx")
```

## Visualize results

Visualization is out of scope of `healthiar`. You can visualize in

-   R, e.g. with the `ggplot2` package ([online book by the creator](https://ggplot2-book.org/))
-   Excel (export results first)
-   Other tools

------------------------------------------------------------------------

# Abbreviations

BHD/bhd = baseline health data 

CI = confidence interval

CBA/cba = cost-benefit analysis

exp = exposure

ERF = exposure-response function

RR/rr = relative risk

YLL/yll = years of life lost