| Title: | Search Data Frames for Personally Identifiable Information | 
| Version: | 1.3.0 | 
| Maintainer: | Jacob Patterson-Stein <jacobpstein@gmail.com> | 
| Description: | Check a data frame for personal information, including names, location, disability status, and geo-coordinates. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| Depends: | R (≥ 2.10), dplyr, stringr, uuid, utils | 
| RoxygenNote: | 7.3.2 | 
| Suggests: | testthat (≥ 3.0.0) | 
| Config/testthat/edition: | 3 | 
| URL: | https://github.com/jacobpstein/pii | 
| BugReports: | https://github.com/jacobpstein/pii/issues | 
| NeedsCompilation: | no | 
| Packaged: | 2025-01-11 19:55:50 UTC; jacobpstein | 
| Author: | Jacob Patterson-Stein [aut, cre] | 
| Repository: | CRAN | 
| Date/Publication: | 2025-01-13 15:40:06 UTC | 
Search Data Frames for Personally Identifiable Information
Description
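Checks a data frame for columns that may contain personally identifiable information (PII), including names, location, disability status, and geo-coordinates.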
Usage
check_PII(df)
Arguments
| df | a data frame object | 
Value
Returns a data frame of the columns that potentially contain PII.
Examples
# create a data frame containing various personally identifiable information
pii_df <- data.frame(
 lat = c(40.7128, 34.0522, 41.8781),
 long = c(-74.0060, -118.2437, -87.6298),
 first_name = c("John", "Michael", "Linda"),
 phone = c("123-456-7890", "234-567-8901", "345-678-9012"),
 age = sample(30:60, 3, replace = TRUE),
 email = c("test@example.com", "contact@domain.com", "user@website.org"),
 disabled = c("No", "Yes", "No"),
 stringsAsFactors = FALSE
)
check_PII(pii_df)
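To give a rough feel for what screening a data frame for PII can involve, the sketch below flags columns by matching column names and cell values against a few simple patterns. It is only an illustration in base R and is not the package's implementation: the helper `flag_possible_pii()`, its patterns, and its output columns are all hypothetical.

```r
# Hypothetical helper, NOT part of the pii package: flags columns whose
# names or values match a few simple, illustrative patterns.
flag_possible_pii <- function(df) {
  # assumed patterns, chosen only for this example
  name_pattern  <- "name|phone|email|^lat|^lon|disab"
  email_pattern <- "^[^@[:space:]]+@[^@[:space:]]+\\.[[:alpha:]]+$"
  phone_pattern <- "^[0-9]{3}-[0-9]{3}-[0-9]{4}$"

  # does the column name look suspicious?
  name_hit <- grepl(name_pattern, names(df), ignore.case = TRUE)

  # does any value in the column look like an email address or phone number?
  value_hit <- vapply(df, function(col) {
    vals <- as.character(col)
    any(grepl(email_pattern, vals) | grepl(phone_pattern, vals))
  }, logical(1))

  result <- data.frame(
    column = names(df),
    name_match = name_hit,
    value_match = unname(value_hit),
    stringsAsFactors = FALSE
  )
  # keep only the columns flagged by at least one check
  result[name_hit | value_hit, ]
}

demo_df <- data.frame(
  lat = c(40.7128, 34.0522, 41.8781),
  long = c(-74.0060, -118.2437, -87.6298),
  first_name = c("John", "Michael", "Linda"),
  phone = c("123-456-7890", "234-567-8901", "345-678-9012"),
  age = c(34, 45, 52),
  email = c("test@example.com", "contact@domain.com", "user@website.org"),
  disabled = c("No", "Yes", "No"),
  stringsAsFactors = FALSE
)

flagged <- flag_possible_pii(demo_df)
print(flagged)
```

In this sketch the `age` column is the only one left unflagged, since neither its name nor its values match any of the assumed patterns. Real screening would need broader patterns and is likely to produce false positives, so flagged columns should be reviewed by hand.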
Split Data Into PII and Non-PII Columns
Description
Splits a data frame into two: one holding the columns flagged as PII and one holding the remaining columns.
Usage
split_PII_data(df, exclude_columns = NULL)
Arguments
| df | a data frame object | 
| exclude_columns | columns to exclude from the data frame split | 
Value
Assigns two data frames to the global environment: one containing the PII columns and one containing the remaining columns. A unique merge key is created so that the two can be joined again. The function then prints the flagged and split columns to the console.
Examples
# create a data frame containing various personally identifiable information
pii_df <- data.frame(
 lat = c(40.7128, 34.0522, 41.8781),
 long = c(-74.0060, -118.2437, -87.6298),
 first_name = c("John", "Michael", "Linda"),
 phone = c("123-456-7890", "234-567-8901", "345-678-9012"),
 age = sample(30:60, 3, replace = TRUE),
 email = c("test@example.com", "contact@domain.com", "user@website.org"),
 disabled = c("No", "Yes", "No"),
 stringsAsFactors = FALSE
)
split_PII_data(pii_df, exclude_columns = c("phone"))
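The split-and-rejoin idea described under Value can be sketched in base R as follows. This does not call the package and makes no claim about how it names its outputs: the object names `pii_part` and `non_pii_part`, the key column `merge_key`, and the key format are all assumptions made for illustration.

```r
# Hypothetical illustration; the object and column names below are
# assumptions and are not taken from the pii package.
demo_df <- data.frame(
  first_name = c("John", "Michael", "Linda"),
  email = c("test@example.com", "contact@domain.com", "user@website.org"),
  age = c(34, 45, 52),
  stringsAsFactors = FALSE
)

# columns assumed to have been flagged as PII
pii_cols <- c("first_name", "email")

# add a unique key per row so the two halves can be rejoined later
demo_df$merge_key <- sprintf("row-%03d", seq_len(nrow(demo_df)))

# one data frame with the key plus the PII columns ...
pii_part <- demo_df[, c("merge_key", pii_cols), drop = FALSE]

# ... and one with the key plus everything else
non_pii_part <- demo_df[, setdiff(names(demo_df), pii_cols), drop = FALSE]

# rejoin on the key to recover all of the original columns
rejoined <- merge(non_pii_part, pii_part, by = "merge_key")
print(rejoined)
```

Keeping the key in both halves means the non-PII data can be shared or analysed on its own, while anyone holding the PII half can still link records back. A sequential key is used here only to keep the sketch dependency-free; since it reveals row order, a random identifier would be a safer choice in practice.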