unpivotr deals with non-tabular data, especially from spreadsheets. Use unpivotr when your source data has any of these ‘features’:
If that list makes your blood boil, you’ll enjoy the function names.
behead() deals with multi-headered hydra tables one
layer of headers at a time, working from the edge of the table inwards.
It’s a bit like using header = TRUE in
read.csv(), but because it’s a function, you can apply it
to as many layers of headers as you need. You end up with all the
headers in columns.spatter() is like tidyr::spread() but
preserves mixed data types. You get into a mixed-data-type situation by
delaying type coercion until after the table is tidy (rather
than before, like read.csv() et al). And yes, it usually
follows behead().More positive, corrective functions:
justify() aligns column headers before
behead()ing, and has deliberate moral overtones.enhead() attaches a header to the body of the data,
a la Frankenstein. The effect is the same as
behead(), but is more powerful because you can choose
exactly which header cells you want, paying attention to formatting
(which behead() doesn’t understand).isolate_sentinels() separates meaningful symbols like
"N/A" or "confidential" from the rest of the
data, giving them some time alone think about what they’ve done.partition() takes a sheet with several tables on it,
and slashes into pieces that each contain one table. You can then
unpivot each table in turn with purrr::map() or
similar.Unpivotr uses data where each cells is represented by one row in a dataframe. Like this.
 
What can you do with tidy cells? The best places to start are:
Otherwise the basic idea is:
devtools::install_github("tidyverse/readr#760").unpivotr::tidy_html()unpivotr::as_cells() – this should
be a last resort, because by the time the data is in a conventional data
frame, it is often too late – formatting has been lost, and most data
types have been coerced to strings.behead() straight away, else
dplyr::filter() separately for the header cells and the
data cells, and then recombine with enhead().spatter() so that each column has one data type.library(unpivotr)
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4     ✔ readr     2.1.5
#> ✔ forcats   1.0.0     ✔ stringr   1.5.1
#> ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
#> ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
#> ✔ purrr     1.0.2     
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ✖ tidyr::pack()   masks unpivotr::pack()
#> ✖ tidyr::unpack() masks unpivotr::unpack()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
x <- purpose$`up-left left-up`
x # A pivot table in a conventional data frame.  Four levels of headers, in two
#>                            X2      X3     X4     X5    X6     X7
#> 1                        <NA>    <NA> Female   <NA>  Male   <NA>
#> 2                        <NA>    <NA>  0 - 6 7 - 10 0 - 6 7 - 10
#> 3           Bachelor's degree 15 - 24   7000  27000  <NA>  13000
#> 4                        <NA> 25 - 44  12000 137000  9000  81000
#> 5                        <NA> 45 - 64  10000  64000  7000  66000
#> 6                        <NA>     65+   <NA>  18000  7000  17000
#> 7                 Certificate 15 - 24  29000 161000 30000 190000
#> 8                        <NA> 25 - 44  34000 179000 31000 219000
#> 9                        <NA> 45 - 64  30000 210000 23000 199000
#> 10                       <NA>     65+  12000  77000  8000 107000
#> 11                    Diploma 15 - 24   <NA>  14000  9000  11000
#> 12                       <NA> 25 - 44  10000  66000  8000  47000
#> 13                       <NA> 45 - 64   6000  68000  5000  58000
#> 14                       <NA>     65+   5000  41000  1000  34000
#> 15           No Qualification 15 - 24  10000  43000 12000  37000
#> 16                       <NA> 25 - 44  11000  36000 21000  50000
#> 17                       <NA> 45 - 64  19000  91000 17000  75000
#> 18                       <NA>     65+  16000 118000  9000  66000
#> 19 Postgraduate qualification 15 - 24   <NA>   6000  <NA>   <NA>
#> 20                       <NA> 25 - 44   5000  86000  7000  60000
#> 21                       <NA> 45 - 64   6000  55000  6000  68000
#> 22                       <NA>     65+   <NA>  13000  <NA>  18000
  # rows and two columns.
y <- as_cells(x) # 'Tokenize' or 'melt' the data frame into one row per cell
y
#> # A tibble: 132 × 4
#>      row   col data_type chr              
#>    <int> <int> <chr>     <chr>            
#>  1     1     1 chr       <NA>             
#>  2     2     1 chr       <NA>             
#>  3     3     1 chr       Bachelor's degree
#>  4     4     1 chr       <NA>             
#>  5     5     1 chr       <NA>             
#>  6     6     1 chr       <NA>             
#>  7     7     1 chr       Certificate      
#>  8     8     1 chr       <NA>             
#>  9     9     1 chr       <NA>             
#> 10    10     1 chr       <NA>             
#> # ℹ 122 more rows
rectify(y) # useful for reviewing the melted form as though in a spreadsheet
#> # A tibble: 22 × 7
#>    `row/col` `1(A)`            `2(B)`  `3(C)` `4(D)` `5(E)` `6(F)`
#>        <int> <chr>             <chr>   <chr>  <chr>  <chr>  <chr> 
#>  1         1 <NA>              <NA>    Female <NA>   Male   <NA>  
#>  2         2 <NA>              <NA>    0 - 6  7 - 10 0 - 6  7 - 10
#>  3         3 Bachelor's degree 15 - 24 7000   27000  <NA>   13000 
#>  4         4 <NA>              25 - 44 12000  137000 9000   81000 
#>  5         5 <NA>              45 - 64 10000  64000  7000   66000 
#>  6         6 <NA>              65+     <NA>   18000  7000   17000 
#>  7         7 Certificate       15 - 24 29000  161000 30000  190000
#>  8         8 <NA>              25 - 44 34000  179000 31000  219000
#>  9         9 <NA>              45 - 64 30000  210000 23000  199000
#> 10        10 <NA>              65+     12000  77000  8000   107000
#> # ℹ 12 more rows
y %>%
  behead("up-left", "sex") %>%               # Strip headers
  behead("up", "life-satisfication") %>%  # one
  behead("left-up", "qualification") %>%     # by
  behead("left", "age-band") %>%            # one.
  select(-row, -col, -data_type, count = chr) %>% # cleanup
  mutate(count = as.integer(count))
#> # A tibble: 80 × 5
#>     count sex    `life-satisfication` qualification     `age-band`
#>     <int> <chr>  <chr>                <chr>             <chr>     
#>  1   7000 Female 0 - 6                Bachelor's degree 15 - 24   
#>  2  12000 Female 0 - 6                Bachelor's degree 25 - 44   
#>  3  10000 Female 0 - 6                Bachelor's degree 45 - 64   
#>  4     NA Female 0 - 6                Bachelor's degree 65+       
#>  5  27000 Female 7 - 10               Bachelor's degree 15 - 24   
#>  6 137000 Female 7 - 10               Bachelor's degree 25 - 44   
#>  7  64000 Female 7 - 10               Bachelor's degree 45 - 64   
#>  8  18000 Female 7 - 10               Bachelor's degree 65+       
#>  9     NA Male   0 - 6                Bachelor's degree 15 - 24   
#> 10   9000 Male   0 - 6                Bachelor's degree 25 - 44   
#> # ℹ 70 more rowsNote the compass directions in the code above, which hint to
behead() where to find the header cell for each data
cell.
"up-left" means the header (Female,
Male) is positioned up and to the left of the columns of
data cells it describes."up" means the header (0 - 6,
7 - 10) is positioned directly above the columns of data
cells it describes."left-up" means the header
(Bachelor's degree, Certificate, etc.) is
positioned to the left and upwards of the rows of data cells it
describes."left" means the header (15 - 24,
25 - 44, etc.) is positioned directly to the left of the
rows of data cells it describes.# install.packages("devtools") # If you don't already have devtools
devtools::install_github("nacnudus/unpivotr", build_vignettes = TRUE)The version 0.4.0 release had somee breaking changes. See
NEWS.md for details. The previous version can be installed
as follow:
devtools::install_version("unpivotr", version = "0.3.1", repos = "http://cran.us.r-project.org")unpivotr is inspired by Databaker, a collaboration between the United Kingdom Office of National Statistics and The Sensible Code Company. unpivotr.
jailbreaker attempts to extract non-tabular data from spreadsheets into tabular structures automatically via some clever algorithms. unpivotr differs by being less magic, and equipping you to express what you want to do.