| Type: | Package | 
| Title: | Bi-Level Selection of Conditional Main Effects | 
| Version: | 0.1.2 | 
| Author: | Simon Mak | 
| Maintainer: | Simon Mak <sm769@duke.edu> | 
| Description: | Provides functions for implementing cmenet - a bi-level variable selection method for conditional main effects (see Mak and Wu (2018) <doi:10.1080/01621459.2018.1448828>). CMEs are reparametrized interaction effects which capture the conditional impact of a factor at a fixed level of another factor. Compared to traditional two-factor interactions, CMEs can quantify more interpretable interaction effects in many problems. The current implementation performs variable selection on only binary CMEs; we are working on an extension for the continuous setting. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| LazyData: | FALSE | 
| Imports: | Rcpp (≥ 0.12.4), MASS, glmnet, hierNet, sparsenet | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| RoxygenNote: | 6.1.1 | 
| NeedsCompilation: | yes | 
| Packaged: | 2022-05-27 01:00:52 UTC; simon | 
| Repository: | CRAN | 
| Date/Publication: | 2022-05-27 07:10:02 UTC | 
Bi-level selection of conditional main effects (fixed parameters)
Description
cmenet performs variable selection of conditional main effects (CMEs) via a bi-level penalization framework, given fixed penalty parameters.
Usage
cmenet(xme, xcme, y,
          lambda.sib=exp(seq(from=log(max.lambda),to=log(max.lambda*1e-6),length=20)),
          lambda.cou=exp(seq(from=log(max.lambda),to=log(max.lambda*1e-6),length=20)),
          max.lambda=lambda0.cme(cbind(xme,xcme),y),
          gamma=1/(0.5-tau)+0.001, tau=0.01,
          act.vec=rep(1,ncol(xme)+ncol(xcme)),
          beta0=rep(0,ncol(xme)+ncol(xcme)),
          it.max=250, lambda.flg=T)
Arguments
| xme | An  | 
| xcme | An  | 
| y | An  | 
| lambda.sib | Penalty vector for sibling CMEs. | 
| lambda.cou | Penalty vector for cousin CMEs. | 
| max.lambda | Maximum penalty value. | 
| gamma | Bridge parameter in MC+ penalty. | 
| tau | Coupling parameter for CMEs. | 
| act.vec | A ( | 
| beta0 | Initial regression coefficients. | 
| it.max | Number of optimization iterations. | 
| lambda.flg | Use the default option TRUE (unless within  | 
Value
| coefficients | Array of regression coefficients (over different  | 
| residuals | Array of regression residuals (over different  | 
| inter | Matrix of intercept estimates (over different  | 
References
Mak and Wu (2018). cmenet: a new method for bi-level variable selection of conditional main effects. Journal of the American Statistical Association, to appear.
Examples
## Not run: 
    library(MASS)
n <- 50 #number of observations
p <- 50 #number of main effects
## Simulate model matrix for MEs and CMEs
set.seed(1)
rho <- 0 #correlation
ones <- matrix(1,p,p)
covmtx <- rho*ones+(1-rho)*diag(p)
latmtx <- mvrnorm(n,p,mu=rep(0,p),Sigma=covmtx) #equicorrelated cov. matrix
memtx <- (latmtx>=0)-(latmtx<0) #simulate model matrix for MEs
model.mtx <- full.model.mtx(memtx)$model.mtx #generate model matrix for MEs and CMEs
## Set true model and generate response
num.act <- 2 # two siblings active
num.grp <- 4 # ... within four active groups
ind <- c()
for (ii in 1:num.grp){
  eff <- sample(seq(2*(p-1)),num.act)
  ind <- c(ind, p + eff + (ii-1)*(2*(p-1)))
}
colnames(model.mtx)[ind] # active CMEs
des.mtx <- model.mtx[,ind]
inter <- 12 #intercept
xbtrue <- inter + rowSums(des.mtx)
y <- xbtrue + rnorm(n,sd=1) #response
xme <- model.mtx[,1:p]
xcme <- model.mtx[,(p+1):ncol(model.mtx)]
## Run cmenet
cv.cme <- cv.cmenet(xme, xcme, y, var.names=colnames(model.mtx))
## End(Not run)
Bi-level selection of conditional main effects
Description
The main function in this package. cv.cmenet performs variable selection of conditional main effects (CMEs) via a bi-level penalization framework, with penalty parameters tuned via cross-validation.
Usage
cv.cmenet(xme, xcme, y,
                    nfolds = 10, var.names = NULL,
                    nlambda.sib=20, nlambda.cou=20, lambda.min.ratio=1e-6,
                    ngamma=10, max.gamma=150, ntau=20,
                    max.tau=0.01, tau.min.ratio=0.01,
                    it.max=250, it.max.cv=25, warm.str="lasso")
Arguments
| xme | An  | 
| xcme | An  | 
| y | An  | 
| nfolds | Number of folds for cross-validation. | 
| var.names | A ( | 
| nlambda.sib | Number of values for  | 
| nlambda.cou | Number of values for  | 
| lambda.min.ratio | Smallest value for  | 
| ngamma | Number of values for  | 
| max.gamma | Maximum value for  | 
| ntau | Number of values for  | 
| max.tau | Maximum value for  | 
| tau.min.ratio | Smallest value for  | 
| it.max | Number of optimization iterations. | 
| it.max.cv | Number of optimization iterations in cross-validation. | 
| warm.str | A string indicating which method should be used for warm-starting active variables. Current options include  | 
Value
| cme.fit | Fitted  | 
| params | Fitted penalty parameters ( | 
| select.names | Selected ME and CME variables. | 
| select.idx | Indices for  | 
References
Mak and Wu (2018). cmenet: a new method for bi-level variable selection of conditional main effects. Journal of the American Statistical Association, to appear.
Examples
  ## Not run: 
library(MASS)
n <- 50 #number of observations
p <- 50 #number of main effects
## Simulate model matrix for MEs and CMEs
set.seed(1)
rho <- 0 #correlation
ones <- matrix(1,p,p)
covmtx <- rho*ones+(1-rho)*diag(p)
latmtx <- mvrnorm(n,p,mu=rep(0,p),Sigma=covmtx) #equicorrelated cov. matrix
memtx <- (latmtx>=0)-(latmtx<0) #simulate model matrix for MEs
model.mtx <- full.model.mtx(memtx)$model.mtx #generate model matrix for MEs and CMEs
## Set true model and generate response
num.act <- 2 # two siblings active
num.grp <- 4 # ... within four active groups
ind <- c()
for (ii in 1:num.grp){
  eff <- sample(seq(2*(p-1)),num.act)
  ind <- c(ind, p + eff + (ii-1)*(2*(p-1)))
}
colnames(model.mtx)[ind] # active CMEs
des.mtx <- model.mtx[,ind]
inter <- 12 #intercept
xbtrue <- inter + rowSums(des.mtx)
y <- xbtrue + rnorm(n,sd=1) #response
xme <- model.mtx[,1:p]
xcme <- model.mtx[,(p+1):ncol(model.mtx)]
#---------------------------------------------------------------
# Selection of MEs and CMEs:
#---------------------------------------------------------------
## cmenet (parameters tuned via cross-validation)
cv.cme <- cv.cmenet(xme, xcme, y, var.names=colnames(model.mtx))
fit.cme <- cv.cme$cme.fit
sel.cme <- cv.cme$select.idx
colnames(model.mtx)[ind] #true model
colnames(model.mtx)[sel.cme] #selected effects from cmenet
colnames(model.mtx)[setdiff(sel.cme,ind)] #selected effects not in true model
colnames(model.mtx)[setdiff(ind,sel.cme)] #true effects not in selected model
## lasso
library(glmnet)
cv.las <- cv.glmnet(cbind(xme,xcme),y)
fit.las <- glmnet(cbind(xme,xcme),y)
sel.las <- which(fit.las$beta[,which(cv.las$lambda==cv.las$lambda.min)]!=0)
colnames(model.mtx)[ind] #true model
colnames(model.mtx)[sel.las] #selected effects from lasso
colnames(model.mtx)[setdiff(sel.las,ind)] #selected effects not in true model
colnames(model.mtx)[setdiff(ind,sel.las)] #true effects not in selected model
## sparsenet
library(sparsenet)
cv.sn <- cv.sparsenet(cbind(xme,xcme),y)
fit.sn <- sparsenet(cbind(xme,xcme),y)
sel.sn <- which(fit.sn$coefficients[[cv.sn$which.min[2]]]$beta[,cv.sn$which.min[1]]!=0)
colnames(model.mtx)[ind] #true model
colnames(model.mtx)[sel.sn] #selected effects from sparsenet
colnames(model.mtx)[setdiff(sel.sn,ind)] #selected effects not in true model
colnames(model.mtx)[setdiff(ind,sel.sn)] #true effects not in selected model
#---------------------------------------------------------------
## Comparison:
#---------------------------------------------------------------
## (a) Misspecifications
length(setdiff(sel.cme,ind)) + length(setdiff(ind,sel.cme)) # cmenet:    25
length(setdiff(sel.las,ind)) + length(setdiff(ind,sel.las)) # lasso:     29
length(setdiff(sel.sn,ind)) + length(setdiff(ind,sel.sn))   # sparsenet: 60
## (b) MSPE
set.seed(1000)
ntst <- 20
latmtx <- mvrnorm(ntst,p,mu=rep(0,p),Sigma=covmtx)
memtx <- (latmtx>=0)-(latmtx<0)
tst.mtx <- full.model.mtx(memtx)$model.mtx
xbtst <- inter + rowSums(tst.mtx[,ind])
ytst <- xbtst + rnorm(ntst,sd=1)
pred.cme <- predictcme(fit.cme,newx=tst.mtx)[,which(cv.cme$lambda.sib==cv.cme$params[1]),
            which(cv.cme$lambda.cou==cv.cme$params[2])]
pred.las <- predict(fit.las,newx=tst.mtx)[,which(cv.las$lambda==cv.las$lambda.min)]
pred.sn <- predict(fit.sn,newx=tst.mtx)[[which(fit.sn$gamma==cv.sn$parms.min[1])]][,
           which(fit.sn$lambda==cv.sn$parms.min[2])]
mean( (ytst-pred.cme)^2 ) # cmenet:    3.61
mean( (ytst-pred.las)^2 ) # lasso:     4.22
mean( (ytst-pred.sn)^2 )  # sparsenet: 4.00
  
## End(Not run)
Generate full model matrix for MEs and CMEs
Description
full.model.mtx returns the full model matrix for main effects (MEs) and conditional main effects (CMEs).
Usage
  full.model.mtx(xme)
Arguments
| xme | An  | 
Value
| model.mtx | An  | 
| cme.mtx | An  | 
Examples
library(MASS)
n <- 50 #number of observations
p <- 50 #number of main effects
## Simulate model matrix for MEs and CMEs
set.seed(1)
rho <- 0 #correlation
ones <- matrix(1,p,p)
covmtx <- rho*ones+(1-rho)*diag(p)
latmtx <- mvrnorm(n,p,mu=rep(0,p),Sigma=covmtx) #equicorrelated cov. matrix
memtx <- (latmtx>=0)-(latmtx<0) #simulate model matrix for MEs
model.mtx <- full.model.mtx(memtx)$model.mtx #generate model matrix for MEs and CMEs
Maize dataset
Description
A subset of the maize dataset from Buckler et al. (2009), with n = 150 observations (days to male flowering time) and p = 40 main effects (binary SNP markers).
Usage
data(maize)
References
Buckler et al. (2009). The genetic architecture of maize flowering time. Science 325, 714-718.
Examples
  ## Not run: 
library(cmenet)
library(hierNet)
## Load data
data(maize) #load in main effects (MEs) and response
xme <- as.matrix(maize[,1:(ncol(maize)-1)])
yy <- as.vector(maize[,ncol(maize)])
nn <- nrow(xme)
pp <- ncol(xme)
model.mtx <- full.model.mtx(xme)$model.mtx #full model matrix
xcme <- model.mtx[,(pp+1):ncol(model.mtx)] #model matrix for conditional main effects (CMEs)
#---------------------------------------------------------------
## Selection:
#---------------------------------------------------------------
## cmenet (new analysis: MEs and CMEs)
set.seed(1000)
cv.cme <- cv.cmenet(xme,xcme,yy,var.names=colnames(model.mtx)) #CV fit
cme.dat <- data.frame(y=yy,x=model.mtx[,cv.cme$select.idx])
cme.glm <- lm(y~.,data=cme.dat) #linear model on selected effects
cv.cme$select.names #selected effects
summary(cme.glm)$coefficients[,4] #p-values
## hierNet (traditional analysis: MEs and two-factor interactions)
set.seed(1000)
hnp <- hierNet.path(xme,yy) #hierNet path
cv.hn <- hierNet.cv(hnp,xme,yy) #CV fit
l.opt <- which(hnp$lamlist==cv.hn$lamhat)
me.sel <- (hnp$bp-hnp$bn)[,l.opt]
me.idx <- which(me.sel!=0) #selected main effects
int.sel <- hnp$th[,,l.opt]
int.idx <- which(int.sel!=0,arr.ind=T)
int.idx <- t(apply(int.idx,1,function(xx){sort(xx)}))
int.idx <- unique(int.idx) #selected interactions
model.mtx.hier <- xme[,me.idx] #model matrix on selected effects
for (ll in 1:nrow(int.idx)){
  model.mtx.hier <- cbind(model.mtx.hier, xme[,int.idx[ll,1]]*xme[,int.idx[ll,2]] )
}
int.nm <- sapply(1:nrow(int.idx),function(xx){
  paste0(colnames(xme)[int.idx[xx,1]],colnames(xme)[int.idx[xx,2]])
})
colnames(model.mtx.hier) <- c(colnames(xme)[me.idx],int.nm)
hn.dat <- data.frame(y=yy,x=model.mtx.hier)
hn.glm <- lm(y~.,data=hn.dat) #linear model on selected effects
colnames(model.mtx.hier) #selected effects
summary(hn.glm)$coefficients[,4] #p-values
#---------------------------------------------------------------
## Analysis of selected effects:
# (a) cmenet: more parsimonious gene-gene interaction model
#     - hierNet: 66 variables
#     - cmenet:  17 variables
# (b) cmenet: greater insight on the conditional structure of
#     selected MEs from traditional analysis (w/ lower p-values)
#     - hierNet: g38
#     - cmenet:  g11|g38+, g12|g38-, g14|g38+
#     Interpretation:
#     - hierNet: gene 38 is active
#     - cmenet: gene 38 activates genes 11 and 14, and inhibits gene 12
# (c) cmenet: selected CMEs are more interpretable than selected
#     interactions from traditional analysis (w/ lower p-values)
#     - hierNet: g1*g39, g27*g39
#     - cmenet:  g1|g39-, g27|g39-
#     Interpretation:
#     - hierNet: interactions exist b/w g1 & g39, and g27 & g39
#     - cmenet:  gene 39 inhibits gene 1 and gene 27
#---------------------------------------------------------------
#---------------------------------------------------------------
## Prediction:
#---------------------------------------------------------------
## cmenet (new analysis)
set.seed(1111)
test.prop <- 0.5 # 
ntrials <- 10 # no. of replications
mspe1 <- rep(NA,ntrials)
for (i in 1:ntrials){
  # sample testing and training data
  foldid = sample(rep(seq(1/test.prop), length=length(yy)))
  yy.tr <- yy[which(foldid!=1)] #training
  xme.tr <- xme[which(foldid!=1),]
  xcme.tr <- xcme[which(foldid!=1),]
  yy.ts <- yy[which(foldid==1)] #testing
  xme.ts <- xme[which(foldid==1),]
  xcme.ts <- xcme[which(foldid==1),]
  # fit cmenet
  cv.cme <- cv.cmenet(xme.tr,xcme.tr,yy.tr,var.names=colnames(model.mtx))
  obj <- cv.cme$cme.fit
  pred <- predictcme(obj,newx=cbind(xme.ts,xcme.ts))
  mspe1[i] <- mean( (yy.ts-pred[,which(cv.cme$lambda.sib==cv.cme$params[1]),
                                which(cv.cme$lambda.cou==cv.cme$params[2])])^2 )
}
mean(mspe1) #avg. mspe = 10.80
## hierNet (traditional analysis)
set.seed(1111)
test.prop <- 0.5 # 
ntrials <- 10 # no. of replications
mspe2 <- rep(NA,ntrials)
for (i in 1:ntrials){
  # sample testing and training data
  foldid = sample(rep(seq(1/test.prop), length=length(yy)))
  yy.tr <- yy[which(foldid!=1)]
  xme.tr <- xme[which(foldid!=1),]
  xcme.tr <- xcme[which(foldid!=1),]
  yy.ts <- yy[which(foldid==1)]
  xme.ts <- xme[which(foldid==1),]
  xcme.ts <- xcme[which(foldid==1),]
  # fit hierNet
  hnfit <- hierNet.path(xme.tr,yy.tr)
  cv.hn <- hierNet.cv(hnfit,xme.tr,yy.tr)
  l.opt <- which(hnfit$lamlist==cv.hn$lamhat)
  mspe2[i] <- mean( (yy.ts-predict(hnfit,newx=xme.ts)[,l.opt])^2 )
}
mean(mspe2) #avg. mspe = 11.31
#---------------------------------------------------------------
## Analysis of MSPE:
# - cmenet gives lower prediction error, which suggests
#   underlying gene-gene interactions may indeed be conditional
#---------------------------------------------------------------
  
## End(Not run)
Predict using a fitted cmenet object
Description
predictcme performs prediction at new ME settings newx, given fitted cmenet object.
Usage
  predictcme(fit.cme,newx)
Arguments
| fit.cme | Fitted object from  | 
| newx | An  | 
Examples
  ## Not run: 
library(MASS)
library(cmenet)
n <- 50 #number of observations
p <- 50 #number of main effects
## Simulate model matrix for MEs and CMEs
set.seed(1)
rho <- 0 #correlation
ones <- matrix(1,p,p)
covmtx <- rho*ones+(1-rho)*diag(p)
latmtx <- mvrnorm(n,p,mu=rep(0,p),Sigma=covmtx) #equicorrelated cov. matrix
memtx <- (latmtx>=0)-(latmtx<0) #simulate model matrix for MEs
model.mtx <- full.model.mtx(memtx)$model.mtx #generate model matrix for MEs and CMEs
## Set true model and generate response
num.act <- 2 # two siblings active
num.grp <- 4 # ... within four active groups
ind <- c()
for (ii in 1:num.grp){
  eff <- sample(seq(2*(p-1)),num.act)
  ind <- c(ind, p + eff + (ii-1)*(2*(p-1)))
}
colnames(model.mtx)[ind] # active CMEs
des.mtx <- model.mtx[,ind]
inter <- 12 #intercept
xbtrue <- inter + rowSums(des.mtx)
y <- xbtrue + rnorm(n,sd=1) #response
xme <- model.mtx[,1:p]
xcme <- model.mtx[,(p+1):ncol(model.mtx)]
## Run cv.cmenet
cv.cme <- cv.cmenet(xme, xcme, y, var.names=colnames(model.mtx))
fit.cme <- cv.cme$cme.fit
sel.cme <- cv.cme$select.idx
colnames(model.mtx)[ind] #true model
colnames(model.mtx)[sel.cme] #selected effects from cmenet
colnames(model.mtx)[setdiff(sel.cme,ind)] #selected effects not in true model
colnames(model.mtx)[setdiff(ind,sel.cme)] #true effects not in selected model
## Prediction
set.seed(1000)
ntst <- 20
latmtx <- mvrnorm(ntst,p,mu=rep(0,p),Sigma=covmtx)
memtx <- (latmtx>=0)-(latmtx<0)
tst.mtx <- full.model.mtx(memtx)$model.mtx
xbtst <- inter + rowSums(tst.mtx[,ind])
ytst <- xbtst + rnorm(ntst,sd=1)
pred.cme <- predictcme(fit.cme,newx=tst.mtx)[,which(cv.cme$lambda.sib==cv.cme$params[1]),
            which(cv.cme$lambda.cou==cv.cme$params[2])]
  
## End(Not run)