| Title: | Compare Data Frames | 
| Version: | 0.3.0 | 
| Description: | A toolset for interactively exploring the differences between two data frames. | 
| License: | MIT + file LICENSE | 
| Suggests: | testthat (≥ 3.0.0) | 
| Config/testthat/edition: | 3 | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.3 | 
| Imports: | rlang (≥ 1.1.0), cli, dplyr (≥ 1.1.0), glue, tidyselect (≥ 1.2.0), vctrs (≥ 0.6.4), tibble, pillar, purrr, collapse (≥ 2.0.9), data.table | 
| URL: | https://eutwt.github.io/versus/, https://github.com/eutwt/versus | 
| BugReports: | https://github.com/eutwt/versus/issues | 
| Depends: | R (≥ 4.1.0) | 
| LazyData: | true | 
| Config/Needs/website: | rmarkdown | 
| NeedsCompilation: | yes | 
| Packaged: | 2024-01-12 00:13:02 UTC; mbp | 
| Author: | Ryan Dickerson [aut, cre, cph] | 
| Maintainer: | Ryan Dickerson <fresh.tent5866@fastmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-01-12 00:30:02 UTC | 
versus: Compare Data Frames
Description
Compare two tables
Author(s)
Maintainer: Ryan Dickerson <fresh.tent5866@fastmail.com> [copyright holder]
See Also
Useful links:
- Report bugs at https://github.com/eutwt/versus/issues 
Compare two data frames
Description
compare() creates a representation of the differences between two tables,
along with a shallow copy of the tables. This output is used
as the comparison argument when exploring the differences further with other
versus functions e.g. slice_*() and weave_*().
Usage
compare(table_a, table_b, by, allow_both_NA = TRUE, coerce = TRUE)
Arguments
| table_a | A data frame | 
| table_b | A data frame | 
| by | < | 
| allow_both_NA | Logical. If  | 
| coerce | Logical. If  | 
Value
compare() returns a list of data frames having the following elements:
- tables: A data frame with one row per input table showing the number of rows and columns in each.
- by: A data frame with one row per by column showing the class of the column in each of the input tables.
- intersection: A data frame with one row per column common to table_a and table_b and columns "n_diffs" showing the number of values which are different between the two tables, "class_a"/"class_b" the class of the column in each table, and "value_diffs" a (nested) data frame showing the values in each table which are unequal and the by columns.
- unmatched_cols: A data frame with one row per column which is in one input table but not the other and columns "table": which table the column appears in, "column": the name of the column, and "class": the class of the column.
- unmatched_rows: A data frame which, for each row present in one input table but not the other, contains the column "table" showing which table the row appears in and the by columns for that row.
 
data.table inputs
If the input is a data.table, you may want compare() to make a deep copy instead
of a shallow copy so that future changes to the table don't affect the comparison.
To achieve this, you can set options(versus.copy_data_table = TRUE).
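As a minimal sketch of the option described above, the following assumes the data.table package is attached and converts the package's example data to data.tables (dt_a and dt_b are hypothetical names used only for illustration). It sets the option, builds a comparison, then modifies one input by reference to check that the comparison is unaffected.

```r
library(versus)
library(data.table)

# Opt in to deep copies of data.table inputs; keep the old setting to restore later
old_opts <- options(versus.copy_data_table = TRUE)

# Hypothetical data.table versions of the package's example data
dt_a <- as.data.table(example_df_a)
dt_b <- as.data.table(example_df_b)

comp <- compare(dt_a, dt_b, by = car)
before <- slice_diffs(comp, "a", mpg)

# Change an input by reference after the comparison exists
dt_a[, mpg := mpg + 100]

# With the option set, the comparison is expected to still hold the original values
after <- slice_diffs(comp, "a", mpg)

# Restore the previous option value
options(old_opts)
```

Comparing before and after is one way to confirm that later changes to dt_a did not leak into the comparison.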
Examples
compare(example_df_a, example_df_b, by = car)
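The elements listed under Value can also be inspected directly. The sketch below assumes only the element names and the "n_diffs" column documented above; the actual output may contain further elements or columns.

```r
library(versus)

comp <- compare(example_df_a, example_df_b, by = car)

# Dimensions of each input table
comp$tables

# Class of each by column in the two tables
comp$by

# Columns present in only one table, and rows present in only one table
comp$unmatched_cols
comp$unmatched_rows

# Keep only the common columns that have at least one differing value
cols_with_diffs <- subset(comp$intersection, n_diffs > 0)
cols_with_diffs
```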
Modified version of datasets::mtcars - version a
Description
A version of mtcars with some values altered and some rows/columns removed. Not for informational purposes, used only to demonstrate the comparison of two slightly different data frames. Since some values were altered at random, the values do not necessarily reflect the true original values. The variables are as follows:
Usage
example_df_a
Format
A data frame with 9 rows and 9 variables:
- car
- The rowname in the corresponding datasets::mtcars row
- mpg
- Miles/(US) gallon 
- cyl
- Number of cylinders 
- disp
- Displacement (cu.in.) 
- hp
- Gross horsepower 
- drat
- Rear axle ratio 
- wt
- Weight (1000 lbs) 
- vs
- Engine (0 = V-shaped, 1 = straight) 
- am
- Transmission (0 = automatic, 1 = manual) 
Source
Sourced from the CRAN datasets package, with modified values. Originally from Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
Modified version of datasets::mtcars - version b
Description
A version of mtcars with some values altered and some rows/columns removed. Not for informational purposes, used only to demonstrate the comparison of two slightly different data frames. Since some values were altered at random, the values do not necessarily reflect the true original values. The variables are as follows:
Usage
example_df_b
Format
A data frame with 9 rows and 9 variables:
- car
- The rowname in the corresponding datasets::mtcars row
- wt
- Weight (1000 lbs) 
- mpg
- Miles/(US) gallon 
- hp
- Gross horsepower 
- cyl
- Number of cylinders 
- disp
- Displacement (cu.in.) 
- carb
- Number of carburetors 
- drat
- Rear axle ratio 
- vs
- Engine (0 = V-shaped, 1 = straight) 
Source
Sourced from the CRAN datasets package, with modified values. Originally from Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.
Get rows with differing values
Description
Get rows with differing values
Usage
slice_diffs(comparison, table, column = everything())
Arguments
| comparison | The output of  | 
| table | One of  | 
| column | < | 
Value
The input table is filtered to the rows for which comparison
shows differing values for one of the columns selected by column.
Examples
comp <- compare(example_df_a, example_df_b, by = car)
comp |> slice_diffs("a", mpg)
comp |> slice_diffs("b", mpg)
comp |> slice_diffs("a", c(mpg, disp))
Get rows in only one table
Description
Get rows in only one table
Usage
slice_unmatched(comparison, table)
slice_unmatched_both(comparison)
Arguments
| comparison | The output of  | 
| table | One of  | 
Value
| slice_unmatched() | The table identified by  | 
| slice_unmatched_both() | The output of  | 
Examples
comp <- compare(example_df_a, example_df_b, by = car)
comp |> slice_unmatched("a")
comp |> slice_unmatched("b")
# slice_unmatched(comp, "a") output is the same as
example_df_a |> dplyr::anti_join(example_df_b, by = comp$by$column)
comp |> slice_unmatched_both()
Get the differing values from a comparison
Description
Get the differing values from a comparison
Usage
value_diffs(comparison, column)
value_diffs_stacked(comparison, column = everything())
Arguments
| comparison | The output of  | 
| column | < | 
Value
| value_diffs() | A data frame with one row for each element
of  | 
| value_diffs_stacked(), value_diffs_all() | A data frame containing
the  | 
Examples
comp <- compare(example_df_a, example_df_b, by = car)
value_diffs(comp, disp)
value_diffs_stacked(comp, c(disp, mpg))
Argument type: tidy-select
Description
This page describes the <tidy-select> argument modifier which
indicates that the argument uses tidy selection, a sub-type of
tidy evaluation. If you've never heard of tidy evaluation before,
start with the practical introduction in
https://r4ds.hadley.nz/functions.html#data-frame-functions and
then read more about the underlying theory in
https://rlang.r-lib.org/reference/topic-data-mask.html.
Overview of selection features
tidyselect implements a DSL for selecting variables. It provides helpers for selecting variables:
- var1:var10: variables lying between var1 on the left and var10 on the right.
- starts_with("a"): names that start with "a".
- ends_with("z"): names that end with "z".
- contains("b"): names that contain "b".
- matches("x.y"): names that match regular expression x.y.
- num_range(x, 1:4): names following the pattern, x1, x2, ..., x4.
- all_of(vars)/any_of(vars): matches names stored in the character vector vars. all_of(vars) will error if the variables aren't present; any_of(vars) will match just the variables that exist.
- everything(): all variables.
- last_col(): furthest column on the right.
- where(is.numeric): all variables where is.numeric() returns TRUE.
As well as operators for combining those selections:
- !selection: only variables that don't match selection.
- selection1 & selection2: only variables included in both selection1 and selection2.
- selection1 | selection2: all variables that match either selection1 or selection2.
Key techniques
- If you want the user to supply a tidyselect specification in a function argument, you need to tunnel the selection through the function argument. This is done by embracing the function argument {{ }}, e.g. unnest(df, {{ vars }}).
- If you have a character vector of column names, use all_of() or any_of(), depending on whether or not you want unknown variable names to cause an error, e.g. unnest(df, all_of(vars)), unnest(df, !any_of(vars)).
- To suppress R CMD check NOTEs about unknown variables use "var" instead of var:
# has NOTE
df %>% select(x, y, z)
# no NOTE
df %>% select("x", "y", "z")
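Applied to this package, the same techniques can be sketched with the column argument of slice_diffs() and value_diffs_stacked(). The wrapper diffs_in_a() and the vector cols are hypothetical names introduced for illustration, and the sketch assumes these functions capture column in a way that supports embracing, which is the usual behaviour for tidy-select arguments.

```r
library(versus)

comp <- compare(example_df_a, example_df_b, by = car)

# A character vector of column names, passed through all_of()
cols <- c("mpg", "disp")
stacked <- value_diffs_stacked(comp, tidyselect::all_of(cols))

# Hypothetical wrapper that tunnels a selection with {{ }}
diffs_in_a <- function(comparison, column) {
  slice_diffs(comparison, "a", {{ column }})
}

# Both calls are expected to select the same columns
wrapped <- diffs_in_a(comp, c(mpg, disp))
direct <- slice_diffs(comp, "a", c(mpg, disp))
```

Using all_of() here means a misspelled name in cols raises an error instead of being silently dropped.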
Get differences in context
Description
Get differences in context
Usage
weave_diffs_long(comparison, column = everything())
weave_diffs_wide(comparison, column = everything())
Arguments
| comparison | The output of  | 
| column | < | 
Value
| weave_diffs_wide() | The input  | 
| weave_diffs_long() | Input tables are filtered to rows where
differing values exist for one of the columns selected by  | 
Examples
comp <- compare(example_df_a, example_df_b, by = car)
comp |> weave_diffs_wide(disp)
comp |> weave_diffs_wide(c(mpg, disp))
comp |> weave_diffs_long(disp)
comp |> weave_diffs_long(c(mpg, disp))