This document outlines the One4All package and highlights its main functions for validating data without using the validator app. Whether to work in the validator app or with the One4All package directly is the user's choice. After reading this document, users will have a better understanding of the One4All package and of its main functions for validating and sharing data. To access the One4All package, install it in R directly from our GitHub repository.
The One4All package is the backbone of the validator app. If you are looking for a tutorial on how to use the app, see the Validator App Tutorial.
After installing the R package, load it into your session with library(One4All).
To run the app, run the command 'run_app()'.
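A minimal setup sketch, assuming the package is installed from GitHub with the remotes package. '<github-owner>' is a placeholder for the account that hosts the repository, which this document does not spell out; substitute the real one before uncommenting the install lines.

```r
# One-time installation from GitHub (uncomment to run).
# "<github-owner>" is a placeholder, not the real account name.
# install.packages("remotes")
# remotes::install_github("<github-owner>/One4All")

# Load the package into the current R session
library(One4All)

# Launch the validator app; the guard skips this step when the script
# runs non-interactively (for example, via Rscript)
if (interactive()) {
  run_app()
}
```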
The One4All validation function checks your datasets against a rules file. Replace the four parameters defined below with your actual values or file paths; the 'data_names' should match the table names used in the rules sheet.
'files_data': A list of file paths for the datasets to be validated (either CSV or XLSX files).
'data_names': (Optional) A character vector of names for the datasets (e.g., methodology, samples, particles). If not provided, names will be extracted from the file paths.
'file_rules': A file path for the rules file, either in CSV or XLSX format.
'zip_data': A file path to a zip folder for validating unstructured data.
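A minimal sketch of a validation call. It assumes the validation function is named 'validate_data' (the name is not given in this section), that 'zip_data' may be left as NULL when there is no unstructured data, and that the file paths, which are hypothetical, point to your own datasets and rules file.

```r
library(One4All)

# Hypothetical paths: replace with your own datasets (CSV or XLSX)
datasets <- c("data/methodology.csv",
              "data/samples.csv",
              "data/particles.csv")

# Assumed function name: validate_data
result <- validate_data(
  files_data = datasets,
  # Optional; should match the table names used in the rules sheet
  data_names = c("methodology", "samples", "particles"),
  file_rules = "rules/rules.xlsx",  # hypothetical rules file (CSV or XLSX)
  zip_data   = NULL                 # assumed optional; give a zip path for unstructured data
)
```

Leaving out 'data_names' falls back to names extracted from the file paths, as described above, so the explicit vector is mainly useful when your file names differ from the table names in the rules sheet.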
One4All also provides a function that checks for malicious files. The argument 'files' is a character vector of file paths, which can point to zip folders or to individual files. If any of the provided files appears to have a malicious extension, it is flagged: the function returns 'TRUE' if a malicious file is found and 'FALSE' otherwise, so your script can stop before processing the data.
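A sketch of how the check could guard a workflow, assuming the function is named 'check_for_malicious_files' (the name is not given in this section) and returns a single logical value as described above. A temporary CSV file stands in for real data so the snippet runs as-is.

```r
library(One4All)

# Create a throwaway CSV so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, value = c(10, 20)), csv_path, row.names = FALSE)

# Paths may mix individual files and zip folders
files_to_check <- csv_path

# Assumed function name: check_for_malicious_files.
# Halt before any further processing if a suspicious file is flagged.
if (isTRUE(check_for_malicious_files(files_to_check))) {
  stop("Potentially malicious file detected; aborting.")
}
```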
A separate function reads the validation rules from a file or a data frame. Acceptable file formats are CSV and XLSX.
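A minimal sketch of reading a rules file, assuming the function is named 'read_rules' (the name is not given in this section); the file path is hypothetical.

```r
library(One4All)

# Hypothetical path to a rules file; CSV and XLSX are both accepted
rules <- read_rules("rules/rules.csv")

# Inspect the first few rules that were loaded
head(rules)
```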
The 'remote_download' function from the One4All package allows users to download shared data from MongoDB, CKAN, and/or Amazon S3. The data is retrieved based on the 'hashed_data' identifier, and the function assumes the data was stored using the same naming conventions as the 'remote_share' function.
An example call to 'remote_download' is shown below.
```r
# Placeholder values: replace each string with your own identifier and credentials
downloaded_data <- remote_download(
  hashed_data      = "example_hash",        # identifier of the shared data
  # CKAN
  ckan_url         = "https://example.com",
  ckan_key         = "your_ckan_key",
  ckan_package     = "your_ckan_package",
  # Amazon S3
  s3_key_id        = "your_s3_key_id",
  s3_secret_key    = "your_s3_secret_key",
  s3_region        = "your_s3_region",
  s3_bucket        = "your_s3_bucket",
  # MongoDB
  mongo_key        = "mongo_key",
  mongo_collection = "mongo_collection"
)
```
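A variant of the same 'remote_download' call that reads the credentials from environment variables with base R's Sys.getenv() instead of hard-coding them in a script. The AWS variable names follow common AWS conventions; all the other variable names are hypothetical and can be chosen freely. Note that Sys.getenv() returns an empty string for any variable that is not set.

```r
library(One4All)

# Hypothetical environment variable names; define them outside the script
# (for example in ~/.Renviron, which R reads at startup) so credentials
# are not saved alongside your code.
downloaded_data <- remote_download(
  hashed_data      = "example_hash",
  ckan_url         = Sys.getenv("CKAN_URL"),
  ckan_key         = Sys.getenv("CKAN_KEY"),
  ckan_package     = Sys.getenv("CKAN_PACKAGE"),
  s3_key_id        = Sys.getenv("AWS_ACCESS_KEY_ID"),
  s3_secret_key    = Sys.getenv("AWS_SECRET_ACCESS_KEY"),
  s3_region        = Sys.getenv("AWS_DEFAULT_REGION"),
  s3_bucket        = Sys.getenv("S3_BUCKET"),
  mongo_key        = Sys.getenv("MONGO_KEY"),
  mongo_collection = Sys.getenv("MONGO_COLLECTION")
)
```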