| Type: | Package | 
| Title: | Assumption-Lean and Data-Adaptive Post-Prediction Inference | 
| Version: | 1.0.0 | 
| Maintainer: | Jiacheng Miao <jiacheng.miao@wisc.edu> | 
| Description: | Implementation of assumption-lean and data-adaptive post-prediction inference (POPInf), for valid and efficient statistical inference based on data predicted by machine learning. See Miao, Miao, Wu, Zhao, and Lu (2023) <doi:10.48550/arXiv.2311.14220>. | 
| URL: | https://arxiv.org/abs/2311.14220, https://github.com/qlu-lab/POPInf | 
| Depends: | R (≥ 3.5.0) | 
| Imports: | randomForest, MASS | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.3 | 
| NeedsCompilation: | no | 
| Packaged: | 2024-02-19 18:38:56 UTC; jiacheng | 
| Author: | Jiacheng Miao | 
| Repository: | CRAN | 
| Date/Publication: | 2024-02-20 20:40:12 UTC | 
Calculation of the matrix A based on a single dataset
Description
A function for calculating the matrix A based on a single dataset
Usage
A(X, Y, quant = NA, theta, method)
Arguments
| X | Array or DataFrame containing covariates | 
| Y | Array or DataFrame of outcomes | 
| quant | quantile for quantile estimation | 
| theta | parameter theta | 
| method | indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". | 
Value
matrix A based on a single dataset
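As a rough illustration of what a matrix of this kind commonly looks like in M-estimation, the sketch below computes the sample average of x_i x_i^T, which is the derivative matrix of the OLS estimating equation up to sign. The helper name `A_ols_sketch` is hypothetical, and treating A as this derivative matrix is an assumption made here, not something confirmed by the package source.

```r
# Hypothetical helper (not part of POPInf): sample analogue of
# A = E[x x^T], the derivative matrix of the OLS estimating equation
# x * (y - x^T theta), up to sign. This is an assumed definition.
A_ols_sketch <- function(X) {
  X <- as.matrix(X)
  crossprod(X) / nrow(X)   # (1/n) * t(X) %*% X
}

set.seed(1)
X <- cbind(1, rnorm(200))      # intercept column plus one covariate
A_hat <- A_ols_sketch(X)
A_hat_inv <- solve(A_hat)      # inverse, as used by sandwich-type variances
```

For OLS this matrix depends only on the covariates, which is why the sketch takes neither `Y` nor `theta`; other methods would generally need both.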
Variance-covariance matrix of the estimation equation
Description
Sigma_cal function for variance-covariance matrix of the estimation equation
Usage
Sigma_cal(
  X_lab,
  X_unlab,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  w,
  theta,
  quant = NA,
  A_lab_inv,
  A_unlab_inv,
  method
)
Arguments
| X_lab | Array or DataFrame containing observed covariates in labeled data. | 
| X_unlab | Array or DataFrame containing observed or predicted covariates in unlabeled data. | 
| Y_lab | Array or DataFrame of observed outcomes in labeled data. | 
| Yhat_lab | Array or DataFrame of predicted outcomes in labeled data. | 
| Yhat_unlab | Array or DataFrame of predicted outcomes in unlabeled data. | 
| w | Weights vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). | 
| theta | parameter theta | 
| quant | quantile for quantile estimation | 
| A_lab_inv | Inverse of matrix A using labeled data | 
| A_unlab_inv | Inverse of matrix A using unlabeled data | 
| method | indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". | 
Value
variance-covariance matrix of the estimation equation
Initial estimation
Description
est_ini function for initial estimation
Usage
est_ini(X, Y, quant = NA, method)
Arguments
| X | Array or DataFrame containing covariates | 
| Y | Array or DataFrame of outcomes | 
| quant | quantile for quantile estimation | 
| method | indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". | 
Value
initial estimator
Hessians of the link function
Description
link_Hessian function for Hessians of the link function
Usage
link_Hessian(t, method)
Arguments
| t | Point(s) at which the Hessian of the link function is evaluated | 
| method | indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". | 
Value
Hessians of the link function
Gradient of the link function
Description
link_grad function for gradient of the link function
Usage
link_grad(t, method)
Arguments
| t | Point(s) at which the gradient of the link function is evaluated | 
| method | indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". | 
Value
gradient of the link function
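To make the gradient and Hessian entries concrete, here is a minimal sketch under the assumption that the "link function" is a GLM-style function b(t) with b(t) = t^2/2 for "ols", log(1 + exp(t)) for "logistic", and exp(t) for "poisson". The names `link_grad_sketch` and `link_hessian_sketch` are hypothetical stand-ins, not the package functions, and the package may define these quantities differently.

```r
# Hypothetical stand-ins (not the POPInf functions): first and second
# derivatives of an assumed GLM-style function b(t) for each method.
link_grad_sketch <- function(t, method) {
  switch(method,
    ols      = t,                 # b(t) = t^2 / 2
    logistic = plogis(t),         # b(t) = log(1 + exp(t))
    poisson  = exp(t),            # b(t) = exp(t)
    stop("unsupported method"))
}
link_hessian_sketch <- function(t, method) {
  switch(method,
    ols      = rep(1, length(t)),
    logistic = plogis(t) * (1 - plogis(t)),
    poisson  = exp(t),
    stop("unsupported method"))
}

# Central finite-difference check that the Hessian sketch is the
# derivative of the gradient sketch
t0 <- c(-1, 0, 0.5)
h <- 1e-6
fd <- (link_grad_sketch(t0 + h, "logistic") -
       link_grad_sketch(t0 - h, "logistic")) / (2 * h)
max_abs_err <- max(abs(fd - link_hessian_sketch(t0, "logistic")))
```

The finite-difference comparison is only a sanity check on the sketch itself; it says nothing about the values returned by `link_grad` or `link_Hessian`.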
Sample expectation of psi
Description
mean_psi function for sample expectation of psi
Usage
mean_psi(X, Y, theta, quant = NA, method)
Arguments
| X | Array or DataFrame containing covariates | 
| Y | Array or DataFrame of outcomes | 
| theta | parameter theta | 
| quant | quantile for quantile estimation | 
| method | indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". | 
Value
sample expectation of psi
Sample expectation of POP-Inf psi
Description
mean_psi_pop function for sample expectation of POP-Inf psi
Usage
mean_psi_pop(
  X_lab,
  X_unlab,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  w,
  theta,
  quant = NA,
  method
)
Arguments
| X_lab | Array or DataFrame containing observed covariates in labeled data. | 
| X_unlab | Array or DataFrame containing observed or predicted covariates in unlabeled data. | 
| Y_lab | Array or DataFrame of observed outcomes in labeled data. | 
| Yhat_lab | Array or DataFrame of predicted outcomes in labeled data. | 
| Yhat_unlab | Array or DataFrame of predicted outcomes in unlabeled data. | 
| w | Weights vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). | 
| theta | parameter theta | 
| quant | quantile for quantile estimation | 
| method | indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". | 
Value
sample expectation of POP-Inf psi
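The following sketch shows one way a POP-Inf style sample expectation could be formed for mean estimation: the labeled-data estimating equation plus a weighted difference between the unlabeled and labeled prediction-based equations. The function name `mean_psi_pop_sketch`, the choice psi(y, theta) = y - theta, and the exact form of the combination are assumptions for illustration, not the package's code.

```r
# Hypothetical sketch (not POPInf code): weighted combination of
# estimating equations for the mean, with psi(y, theta) = y - theta.
mean_psi_pop_sketch <- function(Y_lab, Yhat_lab, Yhat_unlab, w, theta) {
  psi <- function(y) y - theta
  mean(psi(Y_lab)) + w * (mean(psi(Yhat_unlab)) - mean(psi(Yhat_lab)))
}

Y_lab      <- c(1.0, 2.0, 3.0)
Yhat_lab   <- c(1.5, 2.5, 2.0)
Yhat_unlab <- c(2.0, 4.0, 3.0, 3.0)
w <- 0.5

# For the mean, the root of the combined equation has a closed form
theta_hat <- mean(Y_lab) + w * (mean(Yhat_unlab) - mean(Yhat_lab))
root_value <- mean_psi_pop_sketch(Y_lab, Yhat_lab, Yhat_unlab, w, theta_hat)
```

Setting `w = 0` in this sketch discards the predictions and recovers the labeled-data sample mean, which is one way to see the role of the weight.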
Gradient descent for obtaining estimator
Description
optim_est function: gradient descent for obtaining the estimator
Usage
optim_est(
  X_lab,
  X_unlab,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  w,
  theta,
  quant = NA,
  method,
  step_size = 0.1,
  max_iterations = 500,
  convergence_threshold = 1e-06
)
Arguments
| X_lab | Array or DataFrame containing observed covariates in labeled data. | 
| X_unlab | Array or DataFrame containing observed or predicted covariates in unlabeled data. | 
| Y_lab | Array or DataFrame of observed outcomes in labeled data. | 
| Yhat_lab | Array or DataFrame of predicted outcomes in labeled data. | 
| Yhat_unlab | Array or DataFrame of predicted outcomes in unlabeled data. | 
| w | Weights vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). | 
| theta | parameter theta | 
| quant | quantile for quantile estimation | 
| method | indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". | 
| step_size | step size for gradient descent | 
| max_iterations | maximum number of iterations for gradient descent | 
| convergence_threshold | convergence threshold for gradient descent | 
Value
estimator
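How the three tuning arguments interact can be seen in a generic gradient descent loop. The sketch below minimizes a simple squared-error loss whose minimizer is the sample mean; the function `gd_mean_sketch` is hypothetical, reuses the default values from the usage above purely for illustration, and is not the optimizer implemented in the package.

```r
# Hypothetical sketch (not the POPInf optimizer): plain gradient descent
# on the loss 0.5 * mean((Y - theta)^2), whose minimizer is mean(Y).
gd_mean_sketch <- function(Y, theta = 0, step_size = 0.1,
                           max_iterations = 500,
                           convergence_threshold = 1e-06) {
  for (i in seq_len(max_iterations)) {
    grad <- -(mean(Y) - theta)            # derivative of the loss in theta
    theta_new <- theta - step_size * grad
    # Stop once the update is smaller than the threshold
    if (abs(theta_new - theta) < convergence_threshold) {
      theta <- theta_new
      break
    }
    theta <- theta_new
  }
  theta
}

Y <- c(2, 4, 6, 8)
theta_gd <- gd_mean_sketch(Y)
```

A smaller `step_size` makes each update more cautious but needs more iterations, so `max_iterations` and `convergence_threshold` jointly decide when the loop stops.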
Gradient descent for obtaining the weight vector
Description
optim_weights function: gradient descent for obtaining the weight vector
Usage
optim_weights(
  j,
  X_lab,
  X_unlab,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  w,
  theta,
  quant = NA,
  method
)
Arguments
| j | j-th coordinate of weights vector | 
| X_lab | Array or DataFrame containing observed covariates in labeled data. | 
| X_unlab | Array or DataFrame containing observed or predicted covariates in unlabeled data. | 
| Y_lab | Array or DataFrame of observed outcomes in labeled data. | 
| Yhat_lab | Array or DataFrame of predicted outcomes in labeled data. | 
| Yhat_unlab | Array or DataFrame of predicted outcomes in unlabeled data. | 
| w | Weights vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). | 
| theta | parameter theta | 
| quant | quantile for quantile estimation | 
| method | indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". | 
Value
weights
POP-Inf M-Estimation
Description
pop_M function conducts post-prediction M-Estimation.
Usage
pop_M(
  X_lab = NA,
  X_unlab = NA,
  Y_lab,
  Yhat_lab,
  Yhat_unlab,
  alpha = 0.05,
  weights = NA,
  max_iterations = 100,
  convergence_threshold = 0.05,
  quant = NA,
  intercept = FALSE,
  focal_index = NA,
  method
)
Arguments
| X_lab | Array or DataFrame containing observed covariates in labeled data. | 
| X_unlab | Array or DataFrame containing observed or predicted covariates in unlabeled data. | 
| Y_lab | Array or DataFrame of observed outcomes in labeled data. | 
| Yhat_lab | Array or DataFrame of predicted outcomes in labeled data. | 
| Yhat_unlab | Array or DataFrame of predicted outcomes in unlabeled data. | 
| alpha | Specifies the confidence level as 1 - alpha for confidence intervals. | 
| weights | Weights vector for POP-Inf linear regression (d-dimensional, where d equals the number of covariates). | 
| max_iterations | Sets the maximum number of iterations for the optimization process to derive weights. | 
| convergence_threshold | Sets the convergence threshold for the optimization process to derive weights. | 
| quant | quantile for quantile estimation | 
| intercept | Boolean indicating whether the input covariate data already contains an intercept column (TRUE if it does). | 
| focal_index | Identifies the focal index for variance reduction. | 
| method | indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". | 
Value
A summary table presenting point estimates, standard errors, confidence intervals (1 - alpha), P-values, and weights.
Examples
library(POPInf)

# Simulate labeled and unlabeled data with the package's helper
data <- sim_data()
X_lab <- data$X_lab
X_unlab <- data$X_unlab
Y_lab <- data$Y_lab
Yhat_lab <- data$Yhat_lab
Yhat_unlab <- data$Yhat_unlab
# Mean estimation
pop_M(Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab,
      alpha = 0.05, method = "mean")
# Quantile estimation at the 0.75 quantile
pop_M(Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab,
      alpha = 0.05, quant = 0.75, method = "quantile")
# Linear regression (OLS) with covariates
pop_M(X_lab = X_lab, X_unlab = X_unlab,
      Y_lab = Y_lab, Yhat_lab = Yhat_lab, Yhat_unlab = Yhat_unlab,
      alpha = 0.05, method = "ols")
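To relate `alpha` to the quantities in the summary table, the sketch below builds a normal-approximation (Wald) interval and a two-sided P-value from a point estimate and its standard error. The function `wald_summary_sketch` and its column names are hypothetical, and the assumption that the table is built from Wald-type quantities is ours; the layout of the table returned by `pop_M` may differ.

```r
# Hypothetical sketch (not POPInf internals): normal-approximation
# confidence interval and two-sided p-value from an estimate and its
# standard error.
wald_summary_sketch <- function(estimate, std_error, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)   # critical value for a 1 - alpha interval
  z_stat <- estimate / std_error   # test of the null value 0
  data.frame(
    Estimate  = estimate,
    Std.Error = std_error,
    Lower.CI  = estimate - z_crit * std_error,
    Upper.CI  = estimate + z_crit * std_error,
    P.value   = 2 * pnorm(-abs(z_stat))
  )
}

res <- wald_summary_sketch(estimate = 1.2, std_error = 0.4, alpha = 0.05)
```

Lowering `alpha` widens the interval in this sketch, since the critical value `qnorm(1 - alpha / 2)` grows.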
Estimating equation
Description
psi function for the estimating equation
Usage
psi(X, Y, theta, quant = NA, method)
Arguments
| X | Array or DataFrame containing covariates | 
| Y | Array or DataFrame of outcomes | 
| theta | parameter theta | 
| quant | quantile for quantile estimation | 
| method | indicates the method to be used for M-estimation. Options include "mean", "quantile", "ols", "logistic", and "poisson". | 
Value
estimating equation
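As an illustration of what an estimating equation looks like for one of the listed methods, the sketch below evaluates the textbook OLS estimating function x_i * (y_i - x_i^T theta) for each observation and checks that its sample mean vanishes at the least-squares fit. The name `psi_ols_sketch` is hypothetical, and the sign and scaling conventions are assumptions that may not match the package.

```r
# Hypothetical sketch (not the POPInf psi): per-observation estimating
# function for OLS, x_i * (y_i - x_i^T theta), returned as an n x d matrix.
psi_ols_sketch <- function(X, Y, theta) {
  X <- as.matrix(X)
  resid <- as.vector(Y - X %*% theta)
  X * resid                        # row i is x_i times residual_i
}

set.seed(42)
X <- cbind(1, rnorm(100))
Y <- as.vector(X %*% c(0.5, 2) + rnorm(100))
theta_ols <- solve(crossprod(X), crossprod(X, Y))   # normal equations
psi_bar <- colMeans(psi_ols_sketch(X, Y, theta_ols))
```

Here `psi_bar` is zero up to floating-point error, which is the defining property of an M-estimator: it solves the sample estimating equation.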
Simulate the data for testing the functions
Description
sim_data function for simulating data to test the functions
Usage
sim_data(r = 0.9, binary = FALSE)
Arguments
| r | imputation correlation | 
| binary | simulate binary outcome or not | 
Value
simulated data