---
title: "Introduction to the `hicp`-package"
author: "Sebastian Weinand"
#date: "`r Sys.setlocale('LC_TIME', 'English'); format(Sys.Date(),'%d %B %Y')`"
date: "February 2026"
output:
  rmarkdown::html_vignette:
    toc: true
    # number_sections: true
bibliography: references.bib
vignette: >
  %\VignetteIndexEntry{Introduction to the hicp-package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
Sys.setenv(LANGUAGE="en")

# set cores for testing on CRAN via devtools::check_rhub()
library(restatapi)
options(restatapi_cores=1)

# load additional packages:
library(data.table)
options(datatable.print.nrows=10)
options(datatable.print.topn=5)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The Harmonised Index of Consumer Prices (HICP) is the key economic indicator for measuring inflation in the euro area. The methodology underlying the HICP is documented in the HICP Methodological Manual [@Eurostat2024]. Based on this manual, the `hicp`-package provides functions for data users to work with publicly available HICP price indices and weights (*upper-level aggregation*).

This vignette highlights the main package features. It contains four sections on global package options, data access, the classification of individual consumption by purpose (COICOP) underlying the HICP, as well as index aggregation, change rates and contributions of lower-level indices to the overall inflation rate. It also shows how the package functions can be similarly applied to quarterly index series like the owner-occupied housing price index (OOHPI).

# Package options

The package works with several global options that control the behavior of its functions. Most importantly, `options("hicp.coicop.version")` defines the COICOP version to be used. Several versions are supported. The HICP uses the European COICOP version 2, which is the package's default. Since none of the COICOP versions contains a code for the all-items index, `options("hicp.all.items.code")` allows users to define this code. If the COICOP codes include a certain prefix, this prefix can be set via `options("hicp.coicop.prefix")`. At package start-up, the following options are set by default.

```{r setup, message=FALSE}
# load package:
library(hicp)

# set global options:
options(hicp.coicop.version="ecoicop2.hicp") # COICOP version to be used
options(hicp.coicop.prefix="CP") # prefix of COICOP codes
options(hicp.all.items.code="TOTAL") # internal code for the all-items index
options(hicp.chatty=TRUE) # print package-specific messages and warnings
```

# HICP data

The `hicp`-package offers easy access to HICP data from Eurostat's public [database](https://ec.europa.eu/eurostat/data/database).
For that purpose, it uses the download functionality provided by Eurostat's [`restatapi`](https://CRAN.R-project.org/package=restatapi)-package. This section shows how to list, filter and retrieve HICP data using the functions `datasets()`, `datafilters()`, and `data()`.

## Available data sets

Eurostat's database contains various data sets of different statistics. All data sets are classified by topic and can be accessed via a navigation tree. HICP data can be found under "Economy and finance / Prices". An even simpler solution that does not require visiting Eurostat's database is provided by the function `datasets()`, which lists all available HICP data sets with corresponding metadata (e.g., number of observations, last update).

```{r echo=FALSE}
load(file.path("data", "hicp_datasets.RData"))
```

```{r eval=FALSE}
# download table of available HICP data sets:
dtd <- datasets()
```

The output below shows the first five HICP data sets. As can be seen, a short description of each data set and some metadata are provided. The variable `code` is the data set identifier, which is needed to filter and download data.

```{r warning=FALSE}
dtd[1:5, list(title, code, lastUpdate, values)]
```

## Allowed data filters

The HICP is compiled each month in each member state of the European Union (EU) for various items. Its compilation started in 1996. Therefore, the data set of price indices is relatively large. Sometimes, however, data users only need the price indices of certain years or specific countries. Eurostat's API and, thus, the `restatapi`-package allow filters to be applied to each data request, e.g., to download only the price indices of the euro area for the all-items HICP. The filtering options can differ for each data set. The function `datafilters()` returns the allowed filtering options for a given data set.

```{r echo=FALSE}
load(file.path("data", "hicp_datafilters.RData"))
```

```{r eval=FALSE}
# download allowed filters for data set 'prc_hicp_iw':
dtf <- datafilters(id="prc_hicp_iw")
```

The function output shows that the data set `prc_hicp_iw` for the HICP item weights can be filtered with respect to the frequency (`freq`), the COICOP code (`coicop18`), the statistical unit (`statinfo`) and the geographical area (`geo`). The table `dtf` contains the allowed values for each filter, e.g., `CP011` for `coicop18` and `A` for `freq`. These filters can be integrated in the data download as explained in the following subsection.

```{r warning=FALSE}
# allowed filters:
unique(dtf$concept)

# allowed filter values:
dtf[1:5,]
```

## Data download

Applying a filter to a data request can noticeably reduce the downloading time, particularly for bigger data sets. The function `data()` can be used to download a specific data set without any filters

```{r eval=FALSE}
# download all available item weights:
hicp::data(id="prc_hicp_iw", flags=TRUE)
```

or with filters on the time dimension and other filtering options:

```{r echo=FALSE}
load(file.path("data", "hicp_itemweights.RData"))
```

```{r eval=FALSE}
# download item weights with filters:
item.weights <- hicp::data(id="prc_hicp_iw",
                           filters=list("geo"=c("EA","DE","FR")),
                           date.range=c("2019","2025"),
                           flags=TRUE)
```

The downloaded object `item.weights` contains `r nrow(item.weights)` HICP item weights for the euro area, Germany, and France from 2019 to 2025.

```{r warning=FALSE}
item.weights[1:5, ]
```

# HICP and COICOP

HICP item weights and price indices are classified according to the European COICOP version 2 (ECOICOP2-HICP). The lowest level of subclasses (5-digit codes) offers the finest differentiation of items by consumption purpose, e.g., *cereals (01111)* or *bread and bakery products (01113)*.
Both subclasses belong to the same class, *cereal and cereal products (0111)*, and, at higher levels, to the same group *food (011)* and division *food and non-alcoholic beverages (01)*. Hence, COICOP and thus the aggregation of the HICP follow a pre-defined hierarchical tree. This section shows how to work with the COICOP codes and the HICP special aggregates whose definition is based on COICOP codes.

## COICOP codes and their relatives

In general, COICOP codes consist of numbers. The function `is.coicop()` checks whether a code is a valid COICOP code. This validation is based on the selected COICOP version in `options("hicp.coicop.version")`. It further considers any prefix of the COICOP codes defined in `options("hicp.coicop.prefix")`. For the COICOP codes from Eurostat's database, the prefix `CP` is expected. The code ``r getOption("hicp.all.items.code")`` is used in this package for the all-items HICP although it is not considered a valid COICOP code.

```{r warning=FALSE}
# all-items code and codes without prefix "CP" are not valid ECOICOP codes:
is.coicop(id=c("TOTAL","CP01","CP011","CP012","012"))

# games of chance are not valid in ECOICOP-HICP ver. 1:
is.coicop("CP0943", settings=list(coicop.version="ecoicop1.hicp"))

# but in ECOICOP-HICP ver. 2:
is.coicop("CP0943", settings=list(coicop.version="ecoicop2.hicp"))
```

For the aggregation of HICP data from bottom to top, the children and parents of each COICOP code must be properly derived. Children are those codes that belong to the same higher-level code (or parent). Such relations can be direct (e.g., `01->011`) or indirect (e.g., `01->0111`). The functions `child()` and `parent()` derive all relatives of a COICOP code.
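
The hierarchy can also be read off the codes themselves, since each additional digit adds one level. The following base-R sketch illustrates this idea under simplifying assumptions: the helper `direct_parent()` is hypothetical and not part of the `hicp`-package, it assumes the prefix `CP`, and it does not check the codes against any COICOP dictionary, which the package functions can do.

```{r}
# hypothetical helper (not part of the hicp-package):
# derive the direct parent of a prefixed COICOP code by dropping its last digit
direct_parent <- function(id, prefix="CP"){
  # strip the prefix to obtain the digits only:
  digits <- sub(pattern=paste0("^", prefix), replacement="", x=id)
  # divisions (2 digits) have no COICOP parent and return NA:
  ifelse(nchar(digits)>2,
         paste0(prefix, substr(digits, 1, nchar(digits)-1)),
         NA_character_)
}

direct_parent(c("CP01","CP011","CP01111","CP01112"))
```

Applying the helper repeatedly would also yield the indirect parents. It is only meant to convey the idea; for real data, `parent()` and `child()` should be used.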

```{r warning=FALSE}
# get parents:
parent(id=c("CP01","CP011","CP01111","CP01112"), usedict=TRUE)

# get children:
child(id=c("CP01","CP011","CP01111","CP01112"), usedict=TRUE)
```

If the 4-digit or 5-digit level is not available for some divisions in the HICP data, it is not possible to derive the all-items HICP only from the 5-digit level. In this case, the item weights would not add up to 1000. Instead, the missing 4-digit and 5-digit codes must be replaced with their higher-level parents. The function `tree()` derives this composition of COICOP codes at the lowest possible level. This can be particularly useful if one wants to aggregate the price indices at the lowest level in a single step into the all-items index (see also next section).

```{r warning=FALSE}
# example codes:
ids <- c("CP01","CP011","CP012","CP0111","CP0112")

# derive COICOP tree from top to bottom:
tree(ids)

# still same tree because weights add up:
tree(id=ids, w=c(0.2,0.08,0.12,0.05,0.03))

# now (CP011,CP012) because weights do not correctly add up at lower levels:
tree(id=ids, w=c(0.2,0.08,0.12,0.05,0.01))
```

## Special aggregates

For the HICP, various special aggregates like food and energy are calculated. Each special aggregate is composed of a selection of COICOP codes. This composition is fixed over time but depends on the COICOP version. The function `spec.agg()` provides the definitions of all HICP special aggregates, while the function `is.spec.agg()` validates the codes of special aggregates.

```{r warning=FALSE}
# validate codes:
is.spec.agg(id=c("TOTAL","CP01","FOOD","NRG"))

# get compositions of non-processed food and energy:
spec.agg(id=c("FOOD_NP","NRG"))
```

# Index aggregation, rates of change, and contributions

The HICP is a chain-linked Laspeyres-type index [@EU2016]. The (unchained) price indices in each calendar year refer to December of the previous year, which is the *price reference period*.
These price indices are chain-linked to the existing index using December as the link month to obtain the HICP. The HICP indices currently refer to the *index reference period* 2025=100. Monthly and annual change rates can be derived from the price indices. The contributions of the price changes of individual items to the annual rate of change can be computed by the "Ribe method". More details can be found in @Eurostat2024[, chapter 8].

## Index aggregation

The all-items index is a weighted average of the items' subindices. However, because the HICP is a chain index, the subindices cannot simply be aggregated. They first need to be unchained, i.e., expressed relative to December of the previous year. These unchained indices can then be aggregated as a weighted average. Since the Laspeyres-type index is *consistent in aggregation*, the aggregation can be done gradually from the bottom level to the top or directly in one step.

In the following example, the euro area HICP is computed directly in one step and also gradually through all higher-level indices. First, the monthly price indices are downloaded from Eurostat's database for the index reference period 2025=100 (`unit`) and the period from December 2019 to December 2025.

```{r echo=FALSE}
load(file.path("data", "hicp_prices.RData"))
```

```{r eval=FALSE}
# download monthly price indices:
dtp <- hicp::data(id="prc_hicp_minr", filters=list(unit="I25", geo="EA"), date.range=c("2019-12", "2025-12"))
```

```{r warning=FALSE}
# convert into proper dates:
dtp[, "time":=as.Date(paste0(time, "-01"))]
dtp[, "year":=as.integer(format(time, "%Y"))]
setnames(x=dtp, old="values", new="index")
```

Second, the price indices are unchained separately for each ECOICOP code using the function `unchain()`.

```{r warning=FALSE}
# unchain price indices:
dtp[, "dec_ratio" := unchain(x=index, t=time), by="coicop18"]
```

Next, the price indices `prc` and item weights `inw` are merged into one data set.

```{r warning=FALSE}
# manipulate item weights:
dtw <- item.weights[geo=="EA", list(coicop18,geo,time,values)]
dtw[, "time":=as.integer(time)]
setnames(x=dtw, old=c("time","values"), new=c("year","weight"))

# merge price indices and item weights:
dtall <- merge(x=dtp, y=dtw, by=c("geo","coicop18","year"), all.x=TRUE)
```

For aggregating the unchained price indices in one step into the all-items index, the lowest level of the COICOP tree must be derived. Based on the derived COICOP tree, the unchained price indices are aggregated using the function `laspeyres()`, chained into a long-term index series using the function `chain()`, and finally re-referenced to the index reference period 2025 using the function `rebase()`. The resulting index is plotted below.

```{r warning=FALSE, fig.width=7, fig.align="center"}
# derive COICOP tree for index aggregation:
dtall[weight>0 & !is.na(dec_ratio),
      "tree" := tree(id=coicop18, w=weight, flag=TRUE, settings=list(w.tol=0.1)),
      by="time"]

# compute all-items HICP in one aggregation step:
hicp.own <- dtall[tree==TRUE, list("laspey"=laspeyres(x=dec_ratio, w0=weight)), by="time"]
setorderv(x=hicp.own, cols="time")

# chain the resulting index:
hicp.own[, "chain_laspey" := chain(x=laspey, t=time, by=12)]

# rebase the index to 2025:
hicp.own[, "chain_laspey_25" := rebase(x=chain_laspey, t=time, t.ref="2025")]

# plot all-items index:
plot(chain_laspey_25~time, data=hicp.own, type="l", xlab="Time", ylab="Index")
title("Euro area HICP")
abline(h=0, lty="dashed")
```

Similarly, the (unchained) price indices are aggregated gradually following the COICOP tree, which produces in addition to the all-items index all lower-level indices.
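
Why gradual and direct aggregation should agree can be seen on a toy example. The base-R sketch below uses made-up unchained indices and weights, and a hypothetical helper `lasp()` that computes a weighted arithmetic mean and merely stands in for the package function `laspeyres()`.

```{r}
# made-up unchained indices (previous December = 100) and weights:
x <- c(a1=102, a2=98, b1=105)
w <- c(a1=0.3, a2=0.2, b1=0.5)

# hypothetical stand-in for a Laspeyres-type aggregation (weighted arithmetic mean):
lasp <- function(x, w) sum(w*x)/sum(w)

# one step: aggregate all three items directly
direct <- lasp(x, w)

# two steps: first aggregate a1 and a2 into group A, then A and b1 into the total
A <- lasp(x[c("a1","a2")], w[c("a1","a2")])
gradual <- lasp(c(A, x["b1"]), c(sum(w[c("a1","a2")]), w["b1"]))

# both routes give the same result:
all.equal(direct, gradual)
```

The two routes coincide because the group index enters the second step with the sum of its members' weights. The same logic applies at every level of the COICOP tree.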

```{r warning=FALSE}
# compute all-items HICP gradually from bottom to top:
hicp.own.all <- dtall[weight>0 & !is.na(dec_ratio),
                      aggregate.tree(x=dec_ratio, w0=weight, id=coicop18, formula=laspeyres),
                      by="time"]
setorderv(x=hicp.own.all, cols="time")
hicp.own.all[, "chain_laspey" := chain(x=laspeyres, t=time, by=12), by="id"]
hicp.own.all[, "chain_laspey_25" := rebase(x=chain_laspey, t=time, t.ref="2025"), by="id"]
```

A comparison to the all-items index that has been computed in one step shows no differences, which highlights the consistency in aggregation of the Laspeyres-type index.

```{r warning=FALSE}
# all-items HICP from direct and gradual aggregation identical:
all(abs(hicp.own.all[id=="TOTAL", chain_laspey_25]-hicp.own$chain_laspey_25)<0.1)
```

User-defined aggregates can be easily calculated with the functions `aggregate()` and `disaggregate()`. This is particularly useful for the calculation of the HICP special aggregates like food, energy or the overall index excluding the two as shown below.

```{r warning=FALSE}
# compute food and energy by aggregation:
dtall[time>="2019-12-01",
      aggregate(x=dec_ratio, w0=weight, id=coicop18,
                agg=spec.agg(id=c("FOOD","NRG")),
                settings=list(exact=FALSE, names=c("FOOD","NRG"))),
      by="time"]

# compute overall index excluding food and energy by disaggregation:
dtall[time>="2019-12-01",
      disaggregate(x=dec_ratio, w0=weight, id=coicop18,
                   agg=list("TOTAL"=c("FOOD","NRG")),
                   settings=list(names="TOT_X_FOOD_NRG")),
      by="time"]
```

The resulting aggregates can finally be chained and rebased as shown before. User-defined functions can be passed to `aggregate()` as well, which allows aggregation using various weighted or unweighted bilateral index formulas. By contrast, the function `disaggregate()` requires the underlying data to be aggregated as a Laspeyres-type index.

## Rates of change and contributions

Monthly change rates are computed by dividing the HICP index in the current period by the index one month before.
Annual change rates are derived by comparing the index in the current month to the index in the same month one year before. Both rates can be easily derived using the function `rates()`. Contributions of the price changes of individual items to the overall annual rate of change can be computed by the Ribe method as implemented in the function `contrib()`.

```{r warning=FALSE, fig.width=7, fig.height=4, fig.align="center"}
# compute annual rates of change for each index series:
dtall[, "ar" := rates(x=index, t=time, type="year"), by=c("geo","coicop18")]

# add all-items HICP:
dtall <- merge(x=dtall,
               y=dtall[coicop18=="TOTAL", list(geo,time,index,weight)],
               by=c("geo","time"), all.x=TRUE, suffixes=c("","_all"))

# Ribe decomposition:
dtall[, "ribe" := contrib(x=index, w=weight, t=time,
                          x.all=index_all, w.all=weight_all, type="year"),
      by="coicop18"]

# annual change rates and contributions over time:
plot(ar~time, data=dtall[coicop18=="TOTAL",],
     type="l", xlab="Time", ylab="", ylim=c(-1,13))
lines(ribe~time, data=dtall[coicop18=="CP011"], col="red")
title("Contributions of food to overall inflation")
legend("topleft", col=c("red","black"), lty=1, bty="n",
       legend=c("Contributions of food (in pp)", "Overall inflation (in %)"))
```

## Quarterly index series

Most of the calculations shown in the previous two sections can be similarly applied to quarterly (or annual) index series. The owner-occupied housing price index (OOHPI) is a prominent example of a chained quarterly Laspeyres-type price index. The OOHPI indices and weights can be obtained from Eurostat's database. Below, they are downloaded for the period from 2014 to 2024 for the euro area.

```{r echo=FALSE}
load(file.path("data", "ooh_prices.RData"))
load(file.path("data", "ooh_itemweights.RData"))
```

```{r eval=FALSE}
# download quarterly OOHPI for euro area:
dtp <- hicp::data(id="prc_hpi_ooq",
                  filters=list(unit="I15_Q", geo="EA"),
                  date.range=c("2014-10","2024-12"))

# download annual OOH weights for euro area:
dtw <- hicp::data(id="prc_hpi_ooinw",
                  filters=list(geo="EA"),
                  date.range=c("2014","2024"))
```

Before calculations can start, any time variables in the data must first be converted into proper dates. Afterwards, the indices and weights can be merged into a single data set.

```{r warning=FALSE}
# manipulate indices:
dtp[, c("year","quarter") := tstrsplit(x=time, split="-Q", fixed=TRUE)]
dtp[, "year":=as.integer(year)]
dtp[, "quarter":=as.integer(quarter)]
dtp[, "time":=as.Date(paste(year, quarter*3, "01", sep="-"), format="%Y-%m-%d")]
dtp[, c("unit","quarter"):=NULL]
setnames(x=dtp, old="values", new="index")

# manipulate item weights:
dtw[, "year":=as.integer(time)]
dtw[, c("unit","time"):=NULL]
setnames(x=dtw, old="values", new="weight")

# merge indices and item weights:
dtooh <- merge(x=dtp, y=dtw, by=c("geo","expend","year"), all.x=TRUE)
setcolorder(x=dtooh, neworder=c("geo","expend","year","time"))
setkeyv(x=dtooh, cols=c("geo","expend","time"))
```

The OOHPI is chained using the fourth quarter of the previous year. Hence, for the aggregation of the OOHPI subcomponents, the indices must first be unchained using the function `unchain()`. The argument `by` of this function should now correspond to one month of the relevant quarter. Hence, for the fourth quarter, `by` should be set to `10`, `11` or `12`. The unchaining then works as usual.

```{r}
# unchain indices:
dtooh[, "ratio" := unchain(x=index, t=time, by=12L), by="expend"]
```

The subcomponents of the OOHPI do not follow the COICOP system. Instead, they are classified into expenditure categories (`expend`). These must be (manually) selected for index aggregation.
For example, the total OOHPI is an aggregate of the two categories 'acquisition of dwellings' (`DW_ACQ`) and 'ownership of dwellings' (`DW_OWN`). These two expenditure categories are further broken down into finer ones. In the following, they are used to compute the overall OOHPI, which is finally chained and rebased to the year 2015.

```{r}
# aggregate, chain and rebase:
dtagg <- dtooh[expend%in%c("DW_ACQ","DW_OWN"), list("oohpi"=laspeyres(x=ratio, w0=weight)), by="time"]
dtagg[, "oohpi" := chain(x=oohpi, t=time)]
dtagg[, "oohpi" := rebase(x=oohpi, t=time, t.ref="2015")]
```

It is important to note that the functions `unchain()`, `chain()` and `rebase()` auto-detect the frequency of the time series. If users prefer to manually define the frequency, the function settings can be changed to `settings=list(freq="quarter")`. The same is true for the derivation of annual (or quarterly) change rates:

```{r}
# derive annual change rates:
dtagg[, "ar" := rates(x=oohpi, t=time, type="year", settings=list(freq="quarter"))]
```

The annual change rates `ar` show the percentage change of the overall OOHPI in the current quarter compared to the same quarter one year before. These change rates could be further decomposed into the individual contributions of each expenditure category using the function `contrib()`.

# References