library(joyn)
#> 
#> Attaching package: 'joyn'
#> The following object is masked from 'package:base':
#> 
#>     merge
library(data.table)
#> Warning: package 'data.table' was built under R version 4.4.2
x <- data.table(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
  
y <- data.table(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)
This vignette will let you explore some additional features available
in joyn, through an example use case.
Suppose you want to join tables x and y,
where the variable country is available in both. You could do
one of five things:
If you don’t use the argument by, joyn will
consider country and id as key variables by default
given that they are common between x and
y.
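To see which variables would be picked up as default keys, you can list the column names the two tables share. The snippet below is a base R sketch of that idea, not joyn's internal code; it rebuilds x and y as plain data frames, and `common_vars` is an illustrative name.

```r
# Base R sketch (not joyn internals): find the column names shared by x and y.
x <- data.frame(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.frame(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Names present in both tables, in the order they appear in x
common_vars <- intersect(names(x), names(y))
common_vars
```

Here the shared names are id and country, which is why both act as keys when by is omitted.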
# The variables with the same name, `id` and `country`, are used as key
# variables.
joyn(x = x, 
     y = y)
#> 
#> ── JOYn Report ──
#> 
#>   .joyn n percent
#> 1     x 4   44.4%
#> 2     y 4   44.4%
#> 3 x & y 1   11.1%
#> 4 total 9    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id and country from id, gdp, and country
#>       id     t country   gdp  .joyn
#>    <num> <int>   <num> <int> <fctr>
#> 1:     1     1      16    11  x & y
#> 2:     4     2      12    NA      x
#> 3:     2     1       3    NA      x
#> 4:     3     2      NA    NA      x
#> 5:    NA    NA      15    NA      x
#> 6:     2    NA      17    15      y
#> 7:     5    NA      18    20      y
#> 8:     6    NA      19    13      y
#> 9:     3    NA      20    10      y
Alternatively, you can explicitly join by country only:
# Joining by country
joyn(x = x, 
     y = y, 
     by = "country")
#> 
#> ── JOYn Report ──
#> 
#>   .joyn n percent
#> 1     x 4   44.4%
#> 2     y 4   44.4%
#> 3 x & y 1   11.1%
#> 4 total 9    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables country from id, gdp, and country
#>       id     t country   gdp  .joyn
#>    <num> <int>   <num> <int> <fctr>
#> 1:     1     1      16    11  x & y
#> 2:     4     2      12    NA      x
#> 3:     2     1       3    NA      x
#> 4:     3     2      NA    NA      x
#> 5:    NA    NA      15    NA      x
#> 6:    NA    NA      17    15      y
#> 7:    NA    NA      18    20      y
#> 8:    NA    NA      19    13      y
#> 9:    NA    NA      20    10      y
Ignore country from y and don’t bring it into the resulting table.
This is the default if you did not include country as part of
the key variables in argument by.
joyn(x = x, 
     y = y, 
     by = "id")
#> 
#> ── JOYn Report ──
#> 
#>   .joyn n percent
#> 1     x 2   28.6%
#> 2     y 2   28.6%
#> 3 x & y 3   42.9%
#> 4 total 7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, gdp, and country
#>       id     t country   gdp  .joyn
#>    <num> <int>   <num> <int> <fctr>
#> 1:     1     1      16    11  x & y
#> 2:     4     2      12    NA      x
#> 3:     2     1       3    15  x & y
#> 4:     3     2      NA    10  x & y
#> 5:    NA    NA      15    NA      x
#> 6:     5    NA      NA    20      y
#> 7:     6    NA      NA    13      y
Another possibility is to use the update_NAs
argument of joyn(). This lets you update the NA
values in variable country in table x with the
actual values of the matching observations in country from
table y. In this case, non-missing values in country from table x
remain unchanged.
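As a rough illustration of what "update only the NAs" means, the following base R sketch performs a full join on id and then fills the missing values of country from x with the values from y. It is an assumption-laden stand-in, not how joyn implements update_NAs; the data frames and the name `m` are illustrative.

```r
# Base R sketch of NA-only updating (not joyn's implementation).
x <- data.frame(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.frame(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Full join on id; merge() renames the shared non-key column
# to country.x (from x) and country.y (from y).
m <- merge(x, y, by = "id", all = TRUE)

# Use country.y only where country.x is missing.
m$country <- ifelse(is.na(m$country.x), m$country.y, m$country.x)

m[, c("id", "t", "country", "gdp")]
```

Rows where x already holds a value, such as id 2, keep that value; only the gaps are filled.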
joyn(x = x,
     y = y, 
     by = "id", 
     update_NAs = TRUE)
#> 
#> ── JOYn Report ──
#> 
#>         .joyn     n percent
#>        <char> <int>  <char>
#> 1:          x     2   28.6%
#> 2:      x & y     2   28.6%
#> 3: NA updated     3   42.9%
#> 4:      total     7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, gdp, and country
#>       id     t country   gdp      .joyn
#>    <num> <int>   <num> <int>     <fctr>
#> 1:     1     1      16    11      x & y
#> 2:     4     2      12    NA          x
#> 3:     2     1       3    15      x & y
#> 4:     3     2      20    10 NA updated
#> 5:    NA    NA      15    NA          x
#> 6:     5    NA      18    20 NA updated
#> 7:     6    NA      19    13 NA updated
You can also update all the values, both NAs and actual values, in
variable country of table x with the actual values
of the matching observations in country from y.
This is done by setting update_values = TRUE.
Notice that the reportvar allows you to keep track of how
the update worked. In this case, value updated means that only
the values that are different between country from
x and country from y are updated.
However, let’s consider two other possible cases:
If, for the same matching observations, the values of the two country variables were the same, the reporting variable would report x & y instead (so you know that there is no update to make).
If there are NAs in country from y, the
actual values in x will be unchanged, and you would see a
not updated status in the reporting variable. Nevertheless,
notice there is another way for you to bring country from
y to x. This is done through the argument
keep_common_vars (see option 5 below).
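The value-updating rule described above can be sketched in base R: after a full join on id, take the value from y wherever it is available and fall back to the value from x otherwise. This is only a sketch under that assumption and does not reproduce joyn's reporting variable; `m` is an illustrative name.

```r
# Base R sketch of value updating (not joyn's implementation).
x <- data.frame(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.frame(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# Full join on id; shared non-key column becomes country.x / country.y.
m <- merge(x, y, by = "id", all = TRUE)

# Prefer y's value; keep x's value when y has nothing to offer.
m$country <- ifelse(is.na(m$country.y), m$country.x, m$country.y)

m[, c("id", "t", "country", "gdp")]
```

Unlike the NA-only case, an existing value in x (for example for id 2) is overwritten when y supplies one.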
# Update the values of country in x with the matching values from y
joyn(x = x, 
     y = y, 
     by = "id", 
     update_values = TRUE)
#> 
#> ── JOYn Report ──
#> 
#>            .joyn     n percent
#>           <char> <int>  <char>
#> 1:    NA updated     3   42.9%
#> 2: value updated     2   28.6%
#> 3:   not updated     2   28.6%
#> 4:         total     7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, gdp, and country
#>       id     t country   gdp         .joyn
#>    <num> <int>   <num> <int>        <fctr>
#> 1:     1     1      16    11 value updated
#> 2:     4     2      12    NA   not updated
#> 3:     2     1      17    15 value updated
#> 4:     3     2      20    10    NA updated
#> 5:    NA    NA      15    NA   not updated
#> 6:     5    NA      18    20    NA updated
#> 7:     6    NA      19    13    NA updated
Another available option is to bring the original variable
country from y into the resulting table, without
using it to update the values in x. To distinguish
country from x and country from
y, joyn assigns a suffix to the variable’s
name, so that you get country.x and country.y.
This is done by specifying
keep_common_vars = TRUE.
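For comparison, base R's merge() behaves similarly for shared non-key columns: it keeps both and tells them apart with suffixes. The sketch below uses merge() only as an analogy for the suffixing idea; it is not joyn code, and `m` is an illustrative name.

```r
# Base R analogy for keeping both copies of a shared column.
x <- data.frame(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.frame(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

# country is not a key here, so both versions are kept and suffixed.
m <- merge(x, y, by = "id", all = TRUE, suffixes = c(".x", ".y"))

names(m)
```

Keeping both columns side by side makes it easy to compare the two sources before deciding whether to update anything.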
joyn(x = x, 
     y = y, 
     by = "id", 
     keep_common_vars = TRUE)
#> 
#> ── JOYn Report ──
#> 
#>   .joyn n percent
#> 1     x 2   28.6%
#> 2     y 2   28.6%
#> 3 x & y 3   42.9%
#> 4 total 7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#> ℹ Note: Removing key variables id from id, gdp, and country
#>       id     t country.x   gdp country.y  .joyn
#>    <num> <int>     <num> <int>     <int> <fctr>
#> 1:     1     1        16    11        16  x & y
#> 2:     4     2        12    NA        NA      x
#> 3:     2     1         3    15        17  x & y
#> 4:     3     2        NA    10        20  x & y
#> 5:    NA    NA        15    NA        NA      x
#> 6:     5    NA        NA    20        18      y
#> 7:     6    NA        NA    13        19      y
In joyn, you can also bring non-common variables from
y into the resulting table by specifying them in
y_vars_to_keep, as shown in the example below:
# Keeping variable gdp 
joyn(x = x, 
     y = y, 
     by = "id", 
     y_vars_to_keep = "gdp")
#> 
#> ── JOYn Report ──
#> 
#>   .joyn n percent
#> 1     x 2   28.6%
#> 2     y 2   28.6%
#> 3 x & y 3   42.9%
#> 4 total 7    100%
#> ────────────────────────────────────────────────────────── End of JOYn report ──
#> ℹ Note: Joyn's report available in variable .joyn
#>       id     t country   gdp  .joyn
#>    <num> <int>   <num> <int> <fctr>
#> 1:     1     1      16    11  x & y
#> 2:     4     2      12    NA      x
#> 3:     2     1       3    15  x & y
#> 4:     3     2      NA    10  x & y
#> 5:    NA    NA      15    NA      x
#> 6:     5    NA      NA    20      y
#> 7:     6    NA      NA    13      y
Notice that if you set y_vars_to_keep = FALSE or
y_vars_to_keep = NULL, then joyn won’t bring
any variable from y into the resulting table.
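The idea of choosing which variables travel from y can be mimicked in base R by subsetting y to the key plus the wanted columns before joining. This is a hedged sketch of the concept rather than joyn's implementation; `keep` and `m` are illustrative names.

```r
# Base R sketch: select the columns of y to carry over before joining.
x <- data.frame(id = c(1, 4, 2, 3, NA),
                t  = c(1L, 2L, 1L, 2L, NA),
                country = c(16, 12, 3, NA, 15))
y <- data.frame(id  = c(1, 2, 5, 6, 3),
                gdp = c(11L, 15L, 20L, 13L, 10L),
                country = 16:20)

keep <- "gdp"  # variables of y to bring into the result

# Keep only the key and the selected variables from y, then join on id.
m <- merge(x, y[, c("id", keep), drop = FALSE], by = "id", all = TRUE)

names(m)
```

Because country from y is dropped before the join, the only country column in the result is the one from x.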