| Type: | Package | 
| Title: | Best Split Selection Modeling for Low-Dimensional Data | 
| Version: | 1.0.3 | 
| Date: | 2021-11-08 | 
| Author: | Anthony Christidis <anthony.christidis@stat.ubc.ca>, Stefan Van Aelst <stefan.vanaelst@kuleuven.be>, Ruben Zamar <ruben@stat.ubc.ca> | 
| Maintainer: | Anthony Christidis <anthony.christidis@stat.ubc.ca> | 
| Description: | Functions to generate or sample from all possible splits of features or variables into a number of specified groups. Also computes the best split selection estimator (for low-dimensional data) as defined in Christidis, Van Aelst and Zamar (2019) <doi:10.48550/arXiv.1812.05678>. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| Biarch: | true | 
| Imports: | multicool, glmnet, parallel, doParallel, foreach | 
| RoxygenNote: | 7.1.1 | 
| Suggests: | testthat, mvnfast | 
| NeedsCompilation: | no | 
| Packaged: | 2021-11-09 04:02:22 UTC; antho | 
| Repository: | CRAN | 
| Date/Publication: | 2021-11-09 06:00:02 UTC | 
Coefficients for cv.splitSelect object
Description
coef.cv.splitSelect returns the coefficients for a cv.splitSelect object.
Usage
## S3 method for class 'cv.splitSelect'
coef(object, optimal.only = TRUE, ...)
Arguments
| object | An object of class cv.splitSelect. | 
| optimal.only | A boolean variable (TRUE default) to indicate if only the coefficients of the optimal split are returned. |
| ... | Additional arguments for compatibility. | 
Value
A matrix with the coefficients of the cv.splitSelect object.
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
See Also
Examples
# Setting the parameters
p <- 4
n <- 30
n.test <- 5000
beta <- rep(5,4)
rho <- 0.1
r <- 0.9
SNR <- 3
# Creating the target matrix with "kernel" set to rho
target_cor <- function(r, p){
  Gamma <- diag(p)
  for(i in 1:(p-1)){
    for(j in (i+1):p){
      Gamma[i,j] <- Gamma[j,i] <- r^(abs(i-j))
    }
  }
  return(Gamma)
}
# AR Correlation Structure
Sigma.r <- target_cor(r, p)
Sigma.rho <- target_cor(rho, p)
sigma.epsilon <- as.numeric(sqrt((t(beta) %*% Sigma.rho %*% beta)/SNR))
# Simulate some data
x.train <- mvnfast::rmvn(n, mu=rep(0,p), sigma=Sigma.r)
y.train <- 1 + x.train %*% beta + rnorm(n=n, mean=0, sd=sigma.epsilon)
# Generating the coefficients for a fixed split
split.out <- cv.splitSelect(x.train, y.train, G=2, use.all=TRUE,
                            fix.partition=list(matrix(c(2,2), 
                                               ncol=2, byrow=TRUE)), 
                            fix.split=NULL,
                            intercept=TRUE, group.model="glmnet", alphas=0, nfolds=10)
coef(split.out)
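The double loop in target_cor fills entry (i, j) of the matrix with r^|i-j|. As a minimal sketch, the same matrix can be built in a single vectorized step with base R. The name target_cor_vec is hypothetical and used only for illustration; it is not part of the package.

```r
# Loop-based construction, as in the example above
target_cor <- function(r, p){
  Gamma <- diag(p)
  for(i in 1:(p-1)){
    for(j in (i+1):p){
      Gamma[i,j] <- Gamma[j,i] <- r^(abs(i-j))
    }
  }
  return(Gamma)
}

# Vectorized equivalent (hypothetical helper, not part of the package):
# outer() builds the matrix of index differences i - j
target_cor_vec <- function(r, p){
  r^abs(outer(seq_len(p), seq_len(p), "-"))
}

Sigma.loop <- target_cor(0.9, 4)
Sigma.vec <- target_cor_vec(0.9, 4)
# Both constructions should agree entry by entry
all.equal(Sigma.loop, Sigma.vec)
```

The vectorized form avoids the explicit loops and also works for p = 1, where the loop version would index out of range.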
Coefficients for splitSelect object
Description
coef.splitSelect returns the coefficients for a splitSelect object.
Usage
## S3 method for class 'splitSelect'
coef(object, ...)
Arguments
| object | An object of class splitSelect. | 
| ... | Additional arguments for compatibility. | 
Value
A matrix with the coefficients of the splitSelect object.
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
See Also
Examples
# Setting the parameters
p <- 4
n <- 30
n.test <- 5000
beta <- rep(5,4)
rho <- 0.1
r <- 0.9
SNR <- 3
# Creating the target matrix with "kernel" set to rho
target_cor <- function(r, p){
  Gamma <- diag(p)
  for(i in 1:(p-1)){
    for(j in (i+1):p){
      Gamma[i,j] <- Gamma[j,i] <- r^(abs(i-j))
    }
  }
  return(Gamma)
}
# AR Correlation Structure
Sigma.r <- target_cor(r, p)
Sigma.rho <- target_cor(rho, p)
sigma.epsilon <- as.numeric(sqrt((t(beta) %*% Sigma.rho %*% beta)/SNR))
# Simulate some data
x.train <- mvnfast::rmvn(n, mu=rep(0,p), sigma=Sigma.r)
y.train <- 1 + x.train %*% beta + rnorm(n=n, mean=0, sd=sigma.epsilon)
# Generating the coefficients for a fixed partition of the variables
split.out <- splitSelect(x.train, y.train, G=2, use.all=TRUE,
                         fix.partition=list(matrix(c(2,2), ncol=2, byrow=TRUE)), fix.split=NULL,
                         intercept=TRUE, group.model="glmnet", alphas=0)     
coef(split.out)
Split Selection Modeling for Low-Dimensional Data - Cross-Validation
Description
cv.splitSelect performs the best split selection algorithm with cross-validation.
Usage
cv.splitSelect(
  x,
  y,
  intercept = TRUE,
  G,
  use.all = TRUE,
  family = c("gaussian", "binomial")[1],
  group.model = c("glmnet", "LS", "Logistic")[1],
  alphas = 0,
  nsample = NULL,
  fix.partition = NULL,
  fix.split = NULL,
  nfolds = 10,
  parallel = FALSE,
  cores = getOption("mc.cores", 2L)
)
Arguments
| x | Design matrix. | 
| y | Response vector. | 
| intercept | Boolean variable to determine whether an intercept is included (default is TRUE). |
| G | Number of groups into which the variables are split. Can have more than one value. | 
| use.all | Boolean variable to determine if all variables must be used (default is TRUE). | 
| family | Description of the error distribution and link function to be used for the model. Must be one of "gaussian" or "binomial". | 
| group.model | Model used for the groups. Must be one of "glmnet", "LS" or "Logistic". |
| alphas | Elastic net mixing parameter. Should be between 0 (default) and 1. | 
| nsample | Number of sample splits for each value of G. If NULL, then all splits will be considered (unless there is overflow). | 
| fix.partition | Optional list with G elements indicating the partitions (in each row) to be considered for the splits. | 
| fix.split | Optional matrix with p columns indicating the groups (in each row) to be considered for the splits. | 
| nfolds | Number of folds for the cross-validation procedure. | 
| parallel | Boolean variable to determine whether the function is run in parallel. Default is FALSE. |
| cores | Number of cores used when the function is run in parallel. |
Value
An object of class cv.splitSelect.
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
See Also
coef.cv.splitSelect, predict.cv.splitSelect
Examples
# Setting the parameters
p <- 4
n <- 30
n.test <- 5000
beta <- rep(5,4)
rho <- 0.1
r <- 0.9
SNR <- 3
# Creating the target matrix with "kernel" set to rho
target_cor <- function(r, p){
  Gamma <- diag(p)
  for(i in 1:(p-1)){
    for(j in (i+1):p){
      Gamma[i,j] <- Gamma[j,i] <- r^(abs(i-j))
    }
  }
  return(Gamma)
}
# AR Correlation Structure
Sigma.r <- target_cor(r, p)
Sigma.rho <- target_cor(rho, p)
sigma.epsilon <- as.numeric(sqrt((t(beta) %*% Sigma.rho %*% beta)/SNR))
# Simulate some data
x.train <- mvnfast::rmvn(n, mu=rep(0,p), sigma=Sigma.r)
y.train <- 1 + x.train %*% beta + rnorm(n=n, mean=0, sd=sigma.epsilon)
# Generating the coefficients for a fixed partition of the variables
split.out <- cv.splitSelect(x.train, y.train, G=2, use.all=TRUE,
                            fix.partition=list(matrix(c(2,2), 
                                               ncol=2, byrow=TRUE)), 
                            fix.split=NULL,
                            intercept=TRUE, group.model="glmnet", alphas=0, nfolds=10)
Generate Splits Partitions Possibilities
Description
generate_partitions returns a matrix with the number of possible objects in each group using splits.
Usage
generate_partitions(p, G, use.all = TRUE)
Arguments
| p | Number of variables or objects to split. | 
| G | Number of groups into which the variables are split. | 
| use.all | Boolean variable to determine if all variables must be used (default is TRUE). | 
Value
A matrix or list with the number of possible objects in each group using splits.
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
Examples
# Generating the possible split partitions of 6 variables in 3 groups
# Using all the variables
split.3groups.all <- generate_partitions(6, 3)
split.3groups.all
# Without using all the variables
split.3groups <- generate_partitions(6, 3, use.all=FALSE)
split.3groups
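To make the notion of a partition concrete, the base R sketch below enumerates the ways to write p as a sum of G positive group sizes, ignoring the order of the groups. It assumes that a partition here lists how many objects fall into each group. The helper group_sizes is hypothetical; whether generate_partitions returns its rows in this form or order, and how it treats use.all=FALSE, is not confirmed by this sketch.

```r
# Hypothetical helper: all ways to write p as a sum of G positive integers,
# with each row listing the group sizes in non-decreasing order.
group_sizes <- function(p, G, min.size = 1){
  # Base case: a single group takes everything that is left
  if(G == 1){
    if(p >= min.size) return(matrix(p, nrow = 1))
    return(matrix(numeric(0), nrow = 0, ncol = 1))
  }
  out <- matrix(numeric(0), nrow = 0, ncol = G)
  # The smallest of the G remaining groups holds at most floor(p / G) objects
  for(s in seq_len(floor(p / G))){
    if(s < min.size) next
    rest <- group_sizes(p - s, G - 1, min.size = s)
    if(nrow(rest) > 0) out <- rbind(out, cbind(s, rest, deparse.level = 0))
  }
  out
}

# Group sizes for 6 objects in 3 non-empty groups
sizes.6.3 <- group_sizes(6, 3)
sizes.6.3
```

Under these assumptions the call yields three rows: (1, 1, 4), (1, 2, 3) and (2, 2, 2).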
Generate Splits Possibilities
Description
generate_splits returns a matrix with the different splits of the variables in each row.
Usage
generate_splits(p, G, use.all = TRUE, fix.partition = NULL, verbose = TRUE)
Arguments
| p | Number of variables or objects to split. | 
| G | Number of groups into which the variables are split. | 
| use.all | Boolean variable to determine if all variables must be used (default is TRUE). | 
| fix.partition | Optional matrix with G columns (or list if more than one value of G) indicating the partitions (in each row) to be considered for the splits. | 
| verbose | Boolean variable to determine if console output for cross-validation progress is printed (default is TRUE). | 
Value
A matrix with the different splits of the variables in the groups.
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
Examples
# Generating the possible splits of 6 variables in 3 groups
# Using all the variables
split.3groups.all <- generate_splits(6, 3)
split.3groups.all
# Without using all the variables
split.3groups <- generate_splits(6, 3, use.all=FALSE)
split.3groups
Compute Total Number of Possible Splits
Description
nsplit returns the total number of possible splits of variables into groups.
Usage
nsplit(p, G, use.all = TRUE, fix.partition = NULL)
Arguments
| p | Number of variables or objects to split. | 
| G | Number of groups into which the variables are split. | 
| use.all | Boolean variable to determine if all variables must be used (default is TRUE). | 
| fix.partition | Optional matrix with G columns (or list if more than one value of G) indicating the partitions (in each row) to be considered for the splits. | 
Value
A numeric vector with the total number of possible splits.
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
Examples
# Compute the total number of possible splits of 6 variables into 3 groups
# We use all the variables
out.n.splits.all <- nsplit(p=6, G=3, use.all=TRUE)
out.n.splits.all
# We don't enforce using all the variables
out.n.splits <- nsplit(p=6, G=3, use.all=FALSE)
out.n.splits
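As a rough illustration of what such a count involves, the sketch below counts the assignments of p labelled variables to groups of given sizes, assuming the groups themselves are unlabelled (so swapping two groups of equal size does not create a new split). The helper count_splits is hypothetical, and it is an assumption, not a documented fact, that nsplit uses this counting convention.

```r
# Hypothetical helper: number of ways to assign sum(sizes) labelled objects
# to unlabelled groups with the given sizes.
count_splits <- function(sizes){
  p <- sum(sizes)
  # Multinomial coefficient: counts assignments when groups are labelled
  labelled <- factorial(p) / prod(factorial(sizes))
  # Groups of equal size are interchangeable, so divide out their orderings
  labelled / prod(factorial(as.vector(table(sizes))))
}

n.222 <- count_splits(c(2, 2, 2))
n.123 <- count_splits(c(1, 2, 3))
n.114 <- count_splits(c(1, 1, 4))
# Total over all group-size patterns of 6 objects in 3 non-empty groups
n.total <- n.222 + n.123 + n.114
c(n.222, n.123, n.114, n.total)
```

With this convention the three patterns contribute 15, 60 and 15 splits, for a total of 90, which is the Stirling number of the second kind S(6, 3).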
Predictions for cv.splitSelect object
Description
predict.cv.splitSelect returns the predictions of a cv.splitSelect object for new data.
Usage
## S3 method for class 'cv.splitSelect'
predict(object, newx, optimal.only = TRUE, ...)
Arguments
| object | An object of class cv.splitSelect. | 
| newx | A matrix with the new data. | 
| optimal.only | A boolean variable (TRUE default) to indicate if only the predictions of the optimal split are returned. | 
| ... | Additional arguments for compatibility. | 
Value
A matrix with the predictions of the cv.splitSelect object.
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
See Also
Examples
# Setting the parameters
p <- 4
n <- 30
n.test <- 5000
beta <- rep(5,4)
rho <- 0.1
r <- 0.9
SNR <- 3
# Creating the target matrix with "kernel" set to rho
target_cor <- function(r, p){
  Gamma <- diag(p)
  for(i in 1:(p-1)){
    for(j in (i+1):p){
      Gamma[i,j] <- Gamma[j,i] <- r^(abs(i-j))
    }
  }
  return(Gamma)
}
# AR Correlation Structure
Sigma.r <- target_cor(r, p)
Sigma.rho <- target_cor(rho, p)
sigma.epsilon <- as.numeric(sqrt((t(beta) %*% Sigma.rho %*% beta)/SNR))
# Simulate some data
x.train <- mvnfast::rmvn(n, mu=rep(0,p), sigma=Sigma.r)
y.train <- 1 + x.train %*% beta + rnorm(n=n, mean=0, sd=sigma.epsilon)
x.test <- mvnfast::rmvn(n.test, mu=rep(0,p), sigma=Sigma.rho)
y.test <- 1 + x.test %*% beta + rnorm(n.test, sd=sigma.epsilon)
# Fitting the model for a fixed partition and predicting on the test data
split.out <- cv.splitSelect(x.train, y.train, G=2, use.all=TRUE,
                            fix.partition=list(matrix(c(2,2), 
                                               ncol=2, byrow=TRUE)), 
                            fix.split=NULL,
                            intercept=TRUE, group.model="glmnet", alphas=0)
predict(split.out, newx=x.test)
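The example simulates y.test but does not use it. One plausible use is to score the predictions by their mean squared prediction error. The sketch below shows that computation in base R only; the predictions come from an ordinary least squares fit used purely as a stand-in, not from a cv.splitSelect object, and the simplified data generation (independent standard normal predictors, unit noise) is an assumption made to keep the block self-contained.

```r
set.seed(123)
n <- 30
n.test <- 5000
p <- 4
beta <- rep(5, p)

# Simplified simulated data (independent predictors, unit noise)
x.train <- matrix(rnorm(n * p), nrow = n, ncol = p)
y.train <- drop(1 + x.train %*% beta) + rnorm(n)
x.test <- matrix(rnorm(n.test * p), nrow = n.test, ncol = p)
y.test <- drop(1 + x.test %*% beta) + rnorm(n.test)

# Stand-in predictions from ordinary least squares (NOT a splitSelect fit)
ls.coef <- coef(lm(y.train ~ x.train))
y.hat <- cbind(1, x.test) %*% ls.coef

# Mean squared prediction error on the test set
mspe <- mean((y.test - y.hat)^2)
mspe
```

With a fitted cv.splitSelect object, y.hat would presumably be replaced by the output of predict(split.out, newx=x.test).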
Predictions for splitSelect object
Description
predict.splitSelect returns the predictions of a splitSelect object for new data.
Usage
## S3 method for class 'splitSelect'
predict(object, newx, ...)
Arguments
| object | An object of class splitSelect. | 
| newx | A matrix with the new data. | 
| ... | Additional arguments for compatibility. | 
Value
A matrix with the predictions of the splitSelect object.
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
See Also
Examples
# Setting the parameters
p <- 4
n <- 30
n.test <- 5000
beta <- rep(5,4)
rho <- 0.1
r <- 0.9
SNR <- 3
# Creating the target matrix with "kernel" set to rho
target_cor <- function(r, p){
  Gamma <- diag(p)
  for(i in 1:(p-1)){
    for(j in (i+1):p){
      Gamma[i,j] <- Gamma[j,i] <- r^(abs(i-j))
    }
  }
  return(Gamma)
}
# AR Correlation Structure
Sigma.r <- target_cor(r, p)
Sigma.rho <- target_cor(rho, p)
sigma.epsilon <- as.numeric(sqrt((t(beta) %*% Sigma.rho %*% beta)/SNR))
# Simulate some data
x.train <- mvnfast::rmvn(n, mu=rep(0,p), sigma=Sigma.r)
y.train <- 1 + x.train %*% beta + rnorm(n=n, mean=0, sd=sigma.epsilon)
x.test <- mvnfast::rmvn(n.test, mu=rep(0,p), sigma=Sigma.rho)
y.test <- 1 + x.test %*% beta + rnorm(n.test, sd=sigma.epsilon)
# Fitting the model for a fixed partition and predicting on the test data
split.out <- splitSelect(x.train, y.train, G=2, use.all=TRUE,
                         fix.partition=list(matrix(c(2,2), ncol=2, byrow=TRUE)), fix.split=NULL,
                         intercept=TRUE, group.model="glmnet", alphas=0)
predict(split.out, newx=x.test)
Generate Samples of Splits Possibilities
Description
rsplit returns a matrix with random splits of the variables in groups.
Usage
rsplit(n, p, G, use.all = TRUE, fix.partition = NULL, verbose = TRUE)
Arguments
| n | Number of sample splits. | 
| p | Number of variables or objects to split. | 
| G | Number of groups into which the variables are split. | 
| use.all | Boolean variable to determine if all variables must be used (default is TRUE). | 
| fix.partition | Optional matrix with G columns indicating the partitions (in each row) to be considered for the splits. | 
| verbose | Boolean variable to determine if console output for cross-validation progress is printed (default is TRUE). | 
Value
A matrix or list with the random splits of the variables into groups.
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
Examples
# Generating sample splits of 6 variables in 3 groups
# Using all the variables
random.splits <- rsplit(100, 6, 3)
# Using fixed partitions
random.splits.fixed <- rsplit(100, 6, 3, fix.partition=matrix(c(2,2,2), nrow=1))
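For intuition, one simple way to draw a random split with fixed group sizes is to shuffle a vector of group labels. The sketch below does this in base R. The helper random_split is hypothetical, and the sampling scheme actually used by rsplit is not documented here, so this should be read as an illustration only.

```r
# Hypothetical helper: draw one random split for the given group sizes.
# Entry j of the result is the group label assigned to variable j.
random_split <- function(sizes){
  labels <- rep(seq_along(sizes), times = sizes)
  # Permute the labels; sample.int avoids the special case of sample()
  # when its first argument has length one
  labels[sample.int(length(labels))]
}

set.seed(1)
one.split <- random_split(c(2, 2, 2))
one.split
# Each group keeps its prescribed size
table(one.split)
```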
Best Split Selection Modeling for Low-Dimensional Data
Description
splitSelect performs the best split selection algorithm.
Usage
splitSelect(
  x,
  y,
  intercept = TRUE,
  G,
  use.all = TRUE,
  family = c("gaussian", "binomial")[1],
  group.model = c("glmnet", "LS", "Logistic")[1],
  lambdas = NULL,
  alphas = 0,
  nsample = NULL,
  fix.partition = NULL,
  fix.split = NULL,
  parallel = FALSE,
  cores = getOption("mc.cores", 2L),
  verbose = TRUE
)
Arguments
| x | Design matrix. | 
| y | Response vector. | 
| intercept | Boolean variable to determine whether an intercept is included (default is TRUE). |
| G | Number of groups into which the variables are split. Can have more than one value. | 
| use.all | Boolean variable to determine if all variables must be used (default is TRUE). | 
| family | Description of the error distribution and link function to be used for the model. Must be one of "gaussian" or "binomial". | 
| group.model | Model used for the groups. Must be one of "glmnet", "LS" or "Logistic". |
| lambdas | The shrinkage parameters for the "glmnet" regularization. If NULL (default), optimal values are chosen. |
| alphas | Elastic net mixing parameter. Should be between 0 (default) and 1. | 
| nsample | Number of sample splits for each value of G. If NULL, then all splits will be considered (unless there is overflow). | 
| fix.partition | Optional list with G elements indicating the partitions (in each row) to be considered for the splits. | 
| fix.split | Optional matrix with p columns indicating the groups (in each row) to be considered for the splits. | 
| parallel | Boolean variable to determine whether the function is run in parallel. Default is FALSE. |
| cores | Number of cores used when the function is run in parallel. |
| verbose | Boolean variable to determine if console output for cross-validation progress is printed (default is TRUE). | 
Value
An object of class splitSelect.
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
See Also
coef.splitSelect, predict.splitSelect
Examples
# Setting the parameters
p <- 4
n <- 30
n.test <- 5000
beta <- rep(5,4)
rho <- 0.1
r <- 0.9
SNR <- 3
# Creating the target matrix with "kernel" set to rho
target_cor <- function(r, p){
  Gamma <- diag(p)
  for(i in 1:(p-1)){
    for(j in (i+1):p){
      Gamma[i,j] <- Gamma[j,i] <- r^(abs(i-j))
    }
  }
  return(Gamma)
}
# AR Correlation Structure
Sigma.r <- target_cor(r, p)
Sigma.rho <- target_cor(rho, p)
sigma.epsilon <- as.numeric(sqrt((t(beta) %*% Sigma.rho %*% beta)/SNR))
# Simulate some data
x.train <- mvnfast::rmvn(n, mu=rep(0,p), sigma=Sigma.r)
y.train <- 1 + x.train %*% beta + rnorm(n=n, mean=0, sd=sigma.epsilon)
# Generating the coefficients for a fixed partition of the variables
split.out <- splitSelect(x.train, y.train, G=2, use.all=TRUE,
                         fix.partition=list(matrix(c(2,2), 
                                             ncol=2, byrow=TRUE)), 
                         fix.split=NULL,
                         intercept=TRUE, group.model="glmnet", alphas=0)
Split Selection for Regression - Coefficients Generation
Description
splitSelect_coef generates the coefficients for a particular split of variables into groups.
Usage
splitSelect_coef(
  x,
  y,
  variables.split,
  intercept = TRUE,
  family = c("gaussian", "binomial")[1],
  group.model = c("glmnet", "LS", "Logistic")[1],
  lambdas = NULL,
  alphas = 0
)
Arguments
| x | Design matrix. | 
| y | Response vector. | 
| variables.split | A vector with the split of the variables into groups as values. | 
| intercept | Boolean variable to determine whether an intercept is included (default is TRUE). |
| family | Description of the error distribution and link function to be used for the model. Must be one of "gaussian" or "binomial". | 
| group.model | Model used for the groups. Must be one of "glmnet", "LS" or "Logistic". |
| lambdas | The shrinkage parameters for the "glmnet" regularization. If NULL (default), optimal values are chosen. |
| alphas | Elastic net mixing parameter. Should be between 0 (default) and 1. | 
Value
A vector with the regression coefficients for the split.
Author(s)
Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca
Examples
# Setting the parameters
p <- 6
n <- 30
n.test <- 5000
group.beta <- -3
beta <- c(rep(1, 2), rep(group.beta, p-2))
rho <- 0.1
r <- 0.9
SNR <- 3
# Creating the target matrix with "kernel" set to rho
target_cor <- function(r, p){
  Gamma <- diag(p)
  for(i in 1:(p-1)){
    for(j in (i+1):p){
      Gamma[i,j] <- Gamma[j,i] <- r^(abs(i-j))
    }
  }
  return(Gamma)
}
# AR Correlation Structure
Sigma.r <- target_cor(r, p)
Sigma.rho <- target_cor(rho, p)
sigma.epsilon <- as.numeric(sqrt((t(beta) %*% Sigma.rho %*% beta)/SNR))
# Simulate some data
x.train <- mvnfast::rmvn(n, mu=rep(0,p), sigma=Sigma.r)
y.train <- 1 + x.train %*% beta + rnorm(n=n, mean=0, sd=sigma.epsilon)
x.test <- mvnfast::rmvn(n.test, mu=rep(0,p), sigma=Sigma.rho)
y.test <- 1 + x.test %*% beta + rnorm(n.test, sd=sigma.epsilon)
# Generating the coefficients for a fixed split
splitSelect_coef(x.train, y.train, variables.split=matrix(c(1,2,1,2,1,2), nrow=1))
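To read a split such as c(1,2,1,2,1,2), it can help to list which variables end up in each group. The base R sketch below does so, assuming that entry j of the split gives the group label of variable j. The design matrix is random and purely illustrative, and the per-group sub-matrices show only what a group model would see under this reading, not how splitSelect_coef combines the group fits.

```r
# Split from the example above: entry j is taken to be the group of variable j
variables.split <- c(1, 2, 1, 2, 1, 2)

# Indices of the variables in each group
groups <- split(seq_along(variables.split), variables.split)
groups

# Illustrative design matrix (random values, for demonstration only)
set.seed(1)
x <- matrix(rnorm(30 * 6), nrow = 30, ncol = 6)

# Columns of x belonging to each group under this reading
x.groups <- lapply(groups, function(idx) x[, idx, drop = FALSE])
sapply(x.groups, ncol)
```

Here group 1 contains variables 1, 3 and 5, and group 2 contains variables 2, 4 and 6.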