| Title: | Stabilising Variable Selection | 
| Version: | 1.0.6 | 
| Description: | A stable approach to variable selection through stability selection and the use of a permutation-based objective stability threshold. Lima et al. (2021) <doi:10.1038/s41598-020-79317-8>, Meinshausen and Buhlmann (2010) <doi:10.1111/j.1467-9868.2010.00740.x>. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.1.1 | 
| Config/testthat/edition: | 3 | 
| Depends: | R (≥ 3.0.0) | 
| Suggests: | rmarkdown, testthat (≥ 3.0.0), markdown | 
| Imports: | glmnet, dplyr, bigstep, rsample, tibble, purrr, tidyr, stringr, ggplot2, broom, caret, ncvreg, knitr, Hmisc, expss, lme4, matrixStats, recipes, lmerTest | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2023-05-17 09:39:37 UTC; svzrh2 | 
| Author: | Robert Hyde | 
| Maintainer: | Robert Hyde <robert.hyde4@nottingham.ac.uk> | 
| Repository: | CRAN | 
| Date/Publication: | 2023-05-17 11:00:05 UTC | 
stabiliser
Description
This package uses bootstrap resampling and an objective selection stability threshold to provide a robust method of selecting variables truly associated with an outcome.
Author(s)
Robert Hyde robert.hyde4@nottingham.ac.uk
Martin Green
Eliana Lima
boot_model
Description
Function to calculate stability of variables' association with an outcome for a given model over a number of bootstrap repeats
Arguments
| data | a dataframe containing an outcome variable to be permuted | 
| outcome | the outcome as a string (e.g. "y") | 
| boot_reps | the number of bootstrap samples | 
| model | the model to be used (e.g. model_mbic) | 
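The notion of stability over bootstrap repeats can be illustrated in base R. The sketch below is not the package's implementation: the helper names select_vars and boot_stability are hypothetical, and the selection rule (absolute correlation with the outcome above a cutoff) is only a stand-in for a model such as model_lasso or model_mbic. It shows stability as the percentage of bootstrap resamples in which a variable is selected.

```r
# Stand-in selector (assumption): keep predictors whose absolute correlation
# with the outcome exceeds `cutoff`. The package fits selection models instead.
select_vars <- function(data, outcome, cutoff = 0.5) {
  predictors <- setdiff(names(data), outcome)
  cors <- vapply(predictors,
                 function(v) cor(data[[v]], data[[outcome]]),
                 numeric(1))
  predictors[abs(cors) > cutoff]
}

# Stability = percentage of bootstrap resamples in which a variable is selected.
boot_stability <- function(data, outcome, boot_reps = 100, selector = select_vars) {
  predictors <- setdiff(names(data), outcome)
  counts <- setNames(numeric(length(predictors)), predictors)
  for (i in seq_len(boot_reps)) {
    rows <- sample(nrow(data), replace = TRUE)  # resample rows with replacement
    chosen <- selector(data[rows, , drop = FALSE], outcome)
    counts[chosen] <- counts[chosen] + 1
  }
  100 * counts / boot_reps
}

set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 2 * toy$x1 + rnorm(n)  # only x1 is truly associated with y
stab <- boot_stability(toy, "y", boot_reps = 50)
print(stab)
```

In this toy dataset x1 should be selected in nearly every resample, while x2 and x3 should rarely or never pass the cutoff.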
model_enet
Description
Function to model the elastic net selection process on a given dataframe
Arguments
| data | a dataframe containing an outcome variable to be permuted (usually coming from nested bootstrap data) | 
| outcome | the outcome as a string (e.g. "y") | 
| type | model type, either "linear" or "logistic" | 
model_lasso
Description
Function to model the lasso selection process on a given dataframe
Arguments
| data | a dataframe containing an outcome variable to be permuted (usually coming from nested bootstrap data) | 
| outcome | the outcome as a string (e.g. "y") | 
| type | model type, either "linear" or "logistic" | 
model_mbic
Description
Function to model the mBIC (modified Bayesian information criterion) selection process on a given dataframe
Arguments
| data | a dataframe containing an outcome variable to be permuted (usually coming from nested bootstrap data) | 
| outcome | the outcome as a string (e.g. "y") | 
| type | model type, either "linear" or "logistic" | 
model_mcp
Description
Function to model the MCP (minimax concave penalty) selection process on a given dataframe
Arguments
| data | a dataframe containing an outcome variable to be permuted (usually coming from nested bootstrap data) | 
| outcome | the outcome as a string (e.g. "y") | 
| type | model type, either "linear" or "logistic" | 
model_selector
Description
Determines which models to call.
perm_stab
Description
Main function to call both permutation and bootstrapping functions; to be looped over multiple models selected by the user.
permute
Description
Calculates the permutation threshold for the null model, where a specified model is run over multiple bootstrap resamples of multiple permuted versions of the dataset.
Arguments
| data | a dataframe containing an outcome variable to be permuted | 
| outcome | the outcome to be permuted as a string (e.g. "y") | 
| permutations | the number of times to be permuted per repeat | 
| perm_boot_reps | the number of times to repeat each set of permutations | 
| quantile | The quantile of null stabilities to use as a threshold. | 
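The permutation threshold idea can be sketched in base R under stated assumptions. The helper names (null_stability, permutation_threshold) and their arguments (n_permuted, n_boot, prob) are hypothetical and do not map one-to-one onto the arguments above. The selection rule is a simple correlation cutoff rather than a fitted model, and pooling the null stabilities of all variables before taking the quantile is an assumption of this sketch, not something the documentation states.

```r
# Hypothetical helper: percentage of bootstrap resamples in which each predictor
# passes a stand-in selection rule (|correlation with outcome| > cutoff).
# Assumes at least two predictors.
null_stability <- function(data, outcome, n_boot, cutoff = 0.2) {
  predictors <- setdiff(names(data), outcome)
  hits <- replicate(n_boot, {
    d <- data[sample(nrow(data), replace = TRUE), , drop = FALSE]
    abs(cor(d[predictors], d[[outcome]])[, 1]) > cutoff
  })
  100 * rowMeans(hits)
}

permutation_threshold <- function(data, outcome, n_permuted = 5,
                                  n_boot = 20, prob = 1) {
  null_stabs <- unlist(lapply(seq_len(n_permuted), function(i) {
    permuted <- data
    # Shuffling the outcome breaks any real association with the predictors
    permuted[[outcome]] <- sample(permuted[[outcome]])
    null_stability(permuted, outcome, n_boot = n_boot)
  }))
  # Threshold = chosen quantile of the stabilities observed under the null
  unname(stats::quantile(null_stabs, probs = prob))
}

set.seed(42)
toy <- data.frame(matrix(rnorm(100 * 10), nrow = 100))
toy$y <- rnorm(100)
threshold <- permutation_threshold(toy, "y", n_permuted = 5, n_boot = 20)
print(threshold)
```

A variable in the real (unpermuted) data would then be regarded as stable only if its bootstrap stability exceeds this threshold.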
rep_selector_boot
Description
Wrapper function to determine the number of bootstrap repeats
Usage
rep_selector_boot(data, boot_reps)
Arguments
| data | the dataset to analyse. | 
| boot_reps | the number of bootstrap samples | 
rep_selector_perm
Description
Wrapper function to determine the number of permutations
Usage
rep_selector_perm(data, permutations)
Arguments
| data | the dataset to analyse. | 
| permutations | the number of times to be permuted per repeat | 
selection_bias_inner
Description
A function to illustrate the risk of selection bias in conventional modelling approaches by simulating a dataset with no information and conducting conventional modelling with pre-filtration.
Arguments
| nrows | The number of rows to simulate. | 
| ncols | The number of columns to simulate. | 
| p_thresh | The p-value threshold to use in univariate pre-filtration. | 
Value
A list including a dataframe of results, a dataframe of the median number of variables selected and a plot illustrating false positive selection.
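The selection bias described above can be reproduced with a single run in base R. This is a sketch, not the function's implementation: the particular "conventional" model used here (a multivariable lm on the variables surviving univariate pre-filtration, with a 0.05 significance cutoff) is an assumption, and the object names are hypothetical.

```r
set.seed(123)
nrows <- 100
ncols <- 50
p_thresh <- 0.2

# Pure-noise data: no predictor is associated with the outcome
noise <- as.data.frame(matrix(rnorm(nrows * ncols), nrow = nrows))
noise$y <- rnorm(nrows)
predictors <- setdiff(names(noise), "y")

# Step 1: univariate pre-filtration on the p-value of each predictor
uni_p <- vapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "y"), data = noise)
  summary(fit)$coefficients[2, 4]
}, numeric(1))
kept <- predictors[uni_p < p_thresh]

# Step 2: multivariable model fitted only to the survivors
n_selected <- 0L
if (length(kept) > 0) {
  full <- lm(reformulate(kept, response = "y"), data = noise)
  coefs <- summary(full)$coefficients
  n_selected <- sum(coefs[-1, 4] < 0.05)  # drop the intercept row
}

cat("Kept after pre-filtration:", length(kept), "\n")
cat("'Significant' in final model:", n_selected, "\n")
```

Because the data contain no information, every variable counted in n_selected is a false positive.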
simulate_data
Description
Simulate a dataset. This can optionally include variables with a given association with the outcome.
Usage
simulate_data(nrows, ncols, n_true = 0, amplitude = 0)
Arguments
| nrows | The number of rows to simulate. | 
| ncols | The number of columns to simulate. | 
| n_true | The number of variables truly associated with the outcome. | 
| amplitude | The strength of association between true variables and the outcome. | 
Value
A simulated dataset
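To make the roles of n_true and amplitude concrete, the following base R sketch simulates a comparable dataset. The generating mechanism (standard normal predictors, outcome equal to amplitude times the sum of the true variables plus standard normal noise) is an assumption, since the documentation does not specify it. The function name simulate_sketch is hypothetical, the column naming is borrowed from the stabiliser_example dataset, and ncols here counts predictors only.

```r
simulate_sketch <- function(nrows, ncols, n_true = 0, amplitude = 0) {
  stopifnot(n_true <= ncols)
  x <- matrix(rnorm(nrows * ncols), nrow = nrows)
  colnames(x) <- c(sprintf("causal%d", seq_len(n_true)),
                   sprintf("junk%d", seq_len(ncols - n_true)))
  # Signal comes only from the first n_true columns (assumed mechanism)
  signal <- if (n_true > 0) {
    amplitude * rowSums(x[, seq_len(n_true), drop = FALSE])
  } else {
    0
  }
  data.frame(x, y = signal + rnorm(nrows))
}

set.seed(7)
sim <- simulate_sketch(nrows = 50, ncols = 20, n_true = 2, amplitude = 3)
print(dim(sim))
print(head(names(sim), 4))
```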
simulate_data_re
Description
Simulate a 500x500 dataset with 8 true fixed effects, 492 junk variables and a clustered outcome suitable for a 2 level random effects analysis. The strength of association between true variables and the outcome is governed by the error added at level 1 (defined by parameter sd_level_1) and level 2 (sd_level_2).
Arguments
| sd_level_1 | Standard deviation of level 1 variables | 
| sd_level_2 | Standard deviation of level 2 variables | 
Value
A simulated dataset with a clustered outcome suitable for random effects analysis
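A two-level structure of this kind can be sketched in base R as a random intercept per cluster plus observation-level error. The sketch is an assumption about the general form only: it uses a much smaller dataset than the 500x500 one described, and the function name simulate_re_sketch, its size arguments, and the column name level_2_id are hypothetical.

```r
simulate_re_sketch <- function(n_groups = 20, n_per_group = 10, n_true = 2,
                               n_junk = 5, sd_level_1 = 1, sd_level_2 = 1) {
  n <- n_groups * n_per_group
  group <- rep(seq_len(n_groups), each = n_per_group)
  x <- matrix(rnorm(n * (n_true + n_junk)), nrow = n)
  colnames(x) <- c(sprintf("causal%d", seq_len(n_true)),
                   sprintf("junk%d", seq_len(n_junk)))
  u <- rnorm(n_groups, sd = sd_level_2)  # level 2: one random intercept per group
  e <- rnorm(n, sd = sd_level_1)         # level 1: observation-level error
  y <- rowSums(x[, seq_len(n_true), drop = FALSE]) + u[group] + e
  data.frame(level_2_id = factor(group), x, y = y)
}

set.seed(11)
sim_re <- simulate_re_sketch(sd_level_1 = 2, sd_level_2 = 2)
print(dim(sim_re))
```

Larger values of sd_level_1 or sd_level_2 add more noise relative to the fixed effects, which weakens the apparent association between the true variables and the outcome.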
simulate_selection_bias
Description
A function to illustrate the risk of selection bias in conventional modelling approaches by simulating a dataset with no information and conducting conventional modelling with pre-filtration.
Arguments
| nrows | A vector of the number of rows to simulate (e.g., c(100, 200)). | 
| ncols | A vector of the number of columns to simulate (e.g., c(100, 200)). | 
| p_thresh | A vector of the p-value threshold to use in univariate pre-filtration (e.g., c(0.1, 0.2)). | 
Value
A list including a dataframe of results, a dataframe of the median number of variables selected and a plot illustrating false positive selection.
stab_plot
Description
Plot from stability object
Arguments
| stabiliser_outcome | Outcome from stabilise() or triangulate() function. | 
Value
A ggplot object.
stabilise
Description
Function to calculate stability of variables' association with an outcome for a given model over a number of bootstrap repeats
Arguments
| data | A dataframe containing an outcome variable to be permuted. | 
| outcome | The outcome as a string (e.g. "y"). | 
| boot_reps | The number of bootstrap samples. Default is "auto" which selects number based on dataframe size. | 
| permutations | The number of times to be permuted per repeat. Default is "auto" which selects number based on dataframe size. | 
| perm_boot_reps | The number of times to repeat each set of permutations. Default is 20. | 
| models | The models to select for stabilising. Default is elastic net (models = c("enet")), other available models include "lasso", "mbic", "mcp". | 
| type | The type of model, either "linear" or "logistic" | 
| quantile | The quantile of null stabilities to use as a threshold. | 
| normalise | Normalise numeric variables (TRUE/FALSE) | 
| dummy | Create dummy variables for factors/characters (TRUE/FALSE) | 
| impute | Impute missing data (TRUE/FALSE) | 
Value
A list for each model selected. Each list contains a dataframe of variable stabilities, a numeric permutation threshold, and a dataframe of coefficients for both bootstrap and permutation.
stabilise_re
Description
Function to calculate stability of variables' association with an outcome for a given model over a number of bootstrap repeats using clustered data.
Arguments
| data | A dataframe containing an outcome variable to be permuted. | 
| outcome | The outcome as a string (e.g. "y"). | 
| level_2_id | The variable name determining level 2 status as a string (e.g., "level_2_column_name"). | 
| n_top_filter | The number of variables to filter for final model (Default = 50). | 
| boot_reps | The number of bootstrap samples. Default is "auto" which selects number based on dataframe size. | 
| permutations | The number of times to be permuted per repeat. Default is "auto" which selects number based on dataframe size. | 
| perm_boot_reps | The number of times to repeat each set of permutations. Default is 20. | 
| normalise | Normalise numeric variables (TRUE/FALSE) | 
| dummy | Create dummy variables for factors/characters (TRUE/FALSE) | 
| impute | Impute missing data (TRUE/FALSE) | 
Value
A list containing a table of variable stabilities and a numeric permutation threshold.
stabiliser_example
Description
A simulated dataset
Usage
stabiliser_example
Format
A data frame with 50 rows and 100 variables.
The stabiliser_example dataset is a simulated example with the following properties:
1 simulated outcome variable: y
4 variables simulated to be associated with y: causal1, causal2...
95 variables simulated to have no association with y: junk1, junk2...
stabiliser_prep
Description
Prepares dataset using recipes framework
Arguments
| normalise | Normalise numeric variables (TRUE/FALSE) | 
| dummy | Create dummy variables for factors/characters (TRUE/FALSE) | 
| impute | Impute missing data (TRUE/FALSE) | 
triangulate
Description
Triangulate multiple models using a stability object
Arguments
| object | An object generated through the stabilise() function. | 
| quantile | The quantile of null stabilities to use as a threshold. | 
Value
A combined list of model results including a dataframe of stability results for variables and a numeric permutation threshold.
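The documented functions can be chained as stabilise(), then triangulate(), then stab_plot(). The sketch below uses only the function and argument names listed in this manual together with the bundled stabiliser_example dataset. It assumes that arguments not shown (for example quantile, normalise, dummy and impute) have usable defaults, which the documentation does not confirm. The wrapper name run_stabiliser_demo is hypothetical, and the code is wrapped in a function so that the resampling only runs when called.

```r
# Sketch of the documented workflow. Nothing is computed until the
# function is called, because bootstrapping and permutation can be slow.
run_stabiliser_demo <- function() {
  if (!requireNamespace("stabiliser", quietly = TRUE)) {
    stop("Package 'stabiliser' is not installed.")
  }
  library(stabiliser)

  # Stabilise two models on the bundled example data (outcome column "y")
  stable <- stabilise(
    data = stabiliser_example,
    outcome = "y",
    models = c("enet", "lasso"),
    type = "linear"
  )

  # Combine the per-model results, then plot the combined stabilities
  combined <- triangulate(object = stable)
  plot_obj <- stab_plot(stabiliser_outcome = combined)

  list(stable = stable, combined = combined, plot = plot_obj)
}

# Usage (not run automatically):
# res <- run_stabiliser_demo()
# str(res$combined, max.level = 2)
```

Inspecting the result with str() avoids relying on element names that this manual does not list.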