| Type: | Package | 
| Title: | Multisource Graph Synthesis with EHR Data | 
| Version: | 0.1.0 | 
| Description: | We develop Multi-source Graph Synthesis (MUGS), an algorithm designed to create embeddings for pediatric Electronic Health Record (EHR) codes by leveraging graphical information from three distinct sources: (1) pediatric EHR data, (2) EHR data from the general patient population, and (3) existing hierarchical medical ontology knowledge shared across different patient populations. See Li et al. (2024) <doi:10.1038/s41746-024-01320-4> for details. | 
| License: | GPL-3 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| LazyDataCompression: | xz | 
| RoxygenNote: | 7.3.2 | 
| URL: | https://github.com/celehs/MUGS, https://celehs.github.io/MUGS/, https://doi.org/10.1038/s41746-024-01320-4 | 
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) | 
| VignetteBuilder: | knitr | 
| Imports: | MASS, Matrix, fastDummies, doSNOW, dplyr, grplasso, foreach, glmnet, grpreg, inline, mvtnorm, pROC, parallel, RcppArmadillo, rsvd, methods | 
| Depends: | R (≥ 3.5.0) | 
| Config/testthat/edition: | 3 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-05-15 03:56:15 UTC; User1 | 
| Author: | Mengyan Li [cre, aut], Thomas Charlon [ctb] (ORCID: 0000-0001-7497-0470), Xiaoou Li [aut], Tianxi Cai [aut], PARSE LTD [aut], CELEHS Team [aut] | 
| Maintainer: | Mengyan Li <mengyanli@bentley.edu> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-05-19 13:40:09 UTC | 
Function Used To Estimate Code Effects
Description
This function estimates code effects using left and right embeddings from source and target sites.
Usage
CodeEff_Matrix(
  S.1,
  S.2,
  n1,
  n2,
  U.1,
  U.2,
  V.1,
  V.2,
  common_codes,
  zeta.int,
  lambda,
  p
)
Arguments
| S.1 | SPPMI from the source site. | 
| S.2 | SPPMI from the target site. | 
| n1 | The number of codes from the source site. | 
| n2 | The number of codes from the target site. | 
| U.1 | The left embeddings (left singular vectors times the square root of the singular values) from the source site. | 
| U.2 | The left embeddings (left singular vectors times the square root of the singular values) from the target site. | 
| V.1 | The right embeddings (right singular vectors times the square root of the singular values) from the source site. | 
| V.2 | The right embeddings (right singular vectors times the square root of the singular values) from the target site. | 
| common_codes | The list of overlapping codes. | 
| zeta.int | The initial estimator for the code effects. | 
| lambda | The tuning parameter controlling the intensity of penalization on the code effects. | 
| p | The length of an embedding. | 
Value
A list with the following elements:
| zeta | The estimated code effects. | 
| dif_F | The Frobenius norm difference between the updated and initial estimators. | 
| V.1.new | Updated right embeddings for the source site. | 
| V.2.new | Updated right embeddings for the target site. | 
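The `dif_F` element can be illustrated with a minimal sketch, assuming it is the ordinary Frobenius norm of the difference between two estimate matrices. The matrices `zeta_int` and `zeta_new` below are hypothetical toy values, not package output.

```r
# Hypothetical estimates: rows are codes, columns are embedding dimensions.
zeta_int <- matrix(0, nrow = 3, ncol = 2)
zeta_new <- matrix(c(3, 0, 0, 4, 0, 0), nrow = 3, ncol = 2)

# Frobenius norm of the change: square root of the sum of squared entries.
dif_F <- norm(zeta_new - zeta_int, type = "F")
dif_F
```

A small value indicates that the update barely moved the estimator, which is the kind of quantity a convergence tolerance is typically compared against.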
Function Used To Estimate Code-Site Effects in Parallel
Description
This function estimates code-site effects in parallel using left and right embeddings from the source and target sites.
Usage
CodeSiteEff_l2_par(
  S.1,
  S.2,
  n1,
  n2,
  U.1,
  U.2,
  V.1,
  V.2,
  delta.int,
  lambda.delta,
  p,
  common_codes,
  n.common,
  n.core
)
Arguments
| S.1 | SPPMI from the source site | 
| S.2 | SPPMI from the target site | 
| n1 | the number of codes from the source site | 
| n2 | the number of codes from the target site | 
| U.1 | the left embeddings (left singular vectors times the square root of the singular values) from the source site | 
| U.2 | the left embeddings (left singular vectors times the square root of the singular values) from the target site | 
| V.1 | the right embeddings (right singular vectors times the square root of the singular values) from the source site | 
| V.2 | the right embeddings (right singular vectors times the square root of the singular values) from the target site | 
| delta.int | the initial estimator for the code-site effect | 
| lambda.delta | the tuning parameter controlling the intensity of penalization on the code-site effects | 
| p | the length of an embedding | 
| common_codes | the list of overlapping codes | 
| n.common | the number of overlapping codes | 
| n.core | the number of cores used for parallel computation | 
Value
The output for the estimation of code-site effects
Function Used To Generate Input Data (Simulations Only)
Description
Generates SPPMIs, dummy matrices based on prior group structures, and code-code pairs for tuning and evaluation. This function is used only for simulations.
Usage
DataGen_rare_group(
  seed = NULL,
  p,
  n1,
  n2,
  n.common,
  n.group,
  sigma.eps.1,
  sigma.eps.2,
  ratio.delta,
  network.k,
  rho.beta,
  rho.U0,
  rho.delta,
  sigma.rare,
  n.rare,
  group.size
)
Arguments
| seed | the random seed, used for reproducibility | 
| p | the length of an embedding | 
| n1 | the number of codes in site 1 | 
| n2 | the number of codes in site 2 | 
| n.common | the number of overlapping codes | 
| n.group | the number of groups | 
| sigma.eps.1 | the sd of error in site 1 | 
| sigma.eps.2 | the sd of error in site 2 | 
| ratio.delta | the proportion of codes in each site that have site-specific effects applied to them | 
| network.k | the number of distinct blocks within each site for which unique inter-code correlations are modeled | 
| rho.beta | AR parameter for the group effects covariance matrix | 
| rho.U0 | AR parameter for the code effects covariance matrix | 
| rho.delta | AR parameter for the code-site effects covariance matrix | 
| sigma.rare | the sd of error for rare codes (usually larger than sigma.eps.1 and sigma.eps.2) | 
| n.rare | the number of rare codes | 
| group.size | the size of each group | 
Value
Returns input data, SPPMIs, dummy matrices based on prior group structures and code-code pairs for tuning and evaluation
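The `rho.beta`, `rho.U0`, and `rho.delta` arguments are described as AR parameters for covariance matrices. A minimal sketch of what such a matrix can look like, assuming a first-order autoregressive structure with entries `rho^|i - j|` (the package documentation does not state the exact form), is shown below. The helper `ar1_cov` is hypothetical and is not part of the package.

```r
# Hypothetical helper: AR(1) covariance matrix with entries rho^|i - j|.
ar1_cov <- function(p, rho) {
  rho^abs(outer(seq_len(p), seq_len(p), "-"))
}

set.seed(1)
Sigma <- ar1_cov(p = 5, rho = 0.5)

# Draw 3 Gaussian vectors with covariance Sigma via the Cholesky factor:
# chol() returns upper-triangular R with t(R) %*% R == Sigma, so each row
# of Z %*% R has covariance Sigma.
Z <- matrix(rnorm(3 * 5), nrow = 3)
draws <- Z %*% chol(Sigma)
dim(draws)
```

Larger values of `rho` make neighboring embedding coordinates more strongly correlated; `rho = 0` gives the identity matrix.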
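Function Used To Estimate Group Effects in Parallel
Description
This function estimates group effects in parallel using left and right embeddings and the group-structure dummy matrices from the source and target sites.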
Usage
GroupEff_par(
  S.MGB,
  S.BCH,
  n.MGB,
  n.BCH,
  U.MGB,
  U.BCH,
  V.MGB,
  V.BCH,
  X.MGB.group,
  X.BCH.group,
  n.group,
  name.list,
  beta.int,
  lambda = 0,
  p,
  n.core
)
Arguments
| S.MGB | SPPMI from the source site | 
| S.BCH | SPPMI from the target site | 
| n.MGB | the number of codes from the source site | 
| n.BCH | the number of codes from the target site | 
| U.MGB | the left embeddings (left singular vectors times the square root of the singular values) from the source site | 
| U.BCH | the left embeddings (left singular vectors times the square root of the singular values) from the target site | 
| V.MGB | the right embeddings (right singular vectors times the square root of the singular values) from the source site | 
| V.BCH | the right embeddings (right singular vectors times the square root of the singular values) from the target site | 
| X.MGB.group | the dummy matrix based on prior group structures at the source site | 
| X.BCH.group | the dummy matrix based on prior group structures at the target site | 
| n.group | the number of groups | 
| name.list | the full list of code names from the source site and the target site with repeated names of overlapping codes | 
| beta.int | the initial estimator for the group effects | 
| lambda | the tuning parameter controlling the intensity of penalization on the group effects; set to 0 by default | 
| p | the length of an embedding | 
| n.core | the number of cores used for parallel computation | 
Value
The output of estimating group effects in parallel
Main function for the MUGS algorithm
Description
Main function for the MUGS algorithm, with optional parameter tuning and evaluation.
Usage
MUGS(
  TUNE = FALSE,
  Eva = TRUE,
  Lambda = c(10),
  Lambda.delta = c(1000),
  n.core = 4,
  tol = 1,
  seed = NULL,
  S.1 = NULL,
  S.2 = NULL,
  X.group.source = NULL,
  X.group.target = NULL,
  pairs.rel.CV = NULL,
  pairs.rel.EV = NULL,
  p = 100,
  n.group = 400,
  outdir = NULL
)
Arguments
| TUNE | Logical value indicating whether the function should tune parameters (TRUE) or use predefined parameters (FALSE). | 
| Eva | Logical value indicating whether to perform evaluation (TRUE) or skip it (FALSE). | 
| Lambda | The candidate values for the tuning parameter controlling the intensity of penalization on the code effects. | 
| Lambda.delta | The candidate values for the tuning parameter controlling the intensity of penalization on the code-site effects. | 
| n.core | Integer specifying the number of cores to use for parallel processing. | 
| tol | Numeric value representing the tolerance level for convergence in the algorithm. | 
| seed | Integer used to set the seed for random number generation, ensuring reproducibility. Set to NULL to disable. | 
| S.1 | The SPPMI matrix from site 1. | 
| S.2 | The SPPMI matrix from site 2. | 
| X.group.source | The dummy matrix representing the group structure of codes at site 1. | 
| X.group.target | The dummy matrix representing the group structure of codes at site 2. | 
| pairs.rel.CV | Code-code pairs used for tuning via cross-validation. | 
| pairs.rel.EV | Code-code pairs used for evaluation. | 
| p | Integer indicating the length of embeddings. | 
| n.group | The number of groups. | 
| outdir | Optional directory to write output files. Defaults to a temporary directory. | 
Value
A list or saved files containing the embedding matrices, similarity matrices, and site-heterogeneous code analysis.
S.1 Dataset
Description
A matrix containing SPPMI data from the source site. This dataset is used as input for analysis in the package.
Usage
S.1
Format
A matrix with 2000 rows and 10 columns:
| Row Names | Unique identifiers for each row. | 
| Columns | Numeric values representing SPPMI data. | 
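SPPMI stands for shifted positive pointwise mutual information. The following is a minimal sketch of the standard definition applied to a code co-occurrence matrix; how the package datasets were actually constructed is not stated here, so the function `sppmi`, its shift argument `k`, and the toy counts are all assumptions for illustration.

```r
# Hypothetical helper: SPPMI from a symmetric co-occurrence count matrix.
# PMI_ij = log( C_ij * sum(C) / (rowSum_i * colSum_j) ),
# SPPMI_ij = max(PMI_ij - log(k), 0).
sppmi <- function(cooc, k = 1) {
  total <- sum(cooc)
  expected <- outer(rowSums(cooc), colSums(cooc)) / total
  pmi <- log(cooc / expected)
  out <- pmax(pmi - log(k), 0)
  out[!is.finite(out)] <- 0  # guard against 0/0 for codes that never occur
  out
}

# Toy counts for two codes that only co-occur with themselves.
cooc <- matrix(c(10, 0, 0, 10), nrow = 2,
               dimnames = list(c("code_a", "code_b"), c("code_a", "code_b")))
S_toy <- sppmi(cooc)
S_toy
```

Zero counts yield a PMI of negative infinity, which the positive part maps to 0, so the result is non-negative and usually sparse.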
S.2 Dataset
Description
A matrix containing SPPMI data from the target site. This dataset is used as input for analysis in the package.
Usage
S.2
Format
A matrix with 2000 rows and 10 columns:
| Row Names | Unique identifiers for each row. | 
| Columns | Numeric values representing SPPMI data. | 
U.1 Dataset
Description
A matrix containing left embeddings from the source site. These embeddings are used for embedding-based computations.
Usage
U.1
Format
A matrix with 2000 rows and 10 columns:
| Row Names | Unique identifiers for each row. | 
| Columns | Numeric values representing embeddings. | 
U.2 Dataset
Description
A matrix containing left embeddings from the target site. These embeddings are used for embedding-based computations.
Usage
U.2
Format
A matrix with 2000 rows and 10 columns:
| Row Names | Unique identifiers for each row. | 
| Columns | Numeric values representing embeddings. | 
X.group.source Dataset
Description
A matrix containing group structures at the source site. It represents binary group membership of entities at the source.
Usage
X.group.source
Format
A matrix with 2000 rows and 50 columns:
| Rows | Entities at the source site. | 
| Columns | Binary values (0 or 1) indicating group membership. | 
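A binary group-membership matrix of this kind can be built from a vector of group labels. The sketch below uses base R's `model.matrix`; the code names and group labels are hypothetical, and the package itself may construct its dummy matrices differently.

```r
# Hypothetical codes and their prior group labels.
codes <- c("code_a", "code_b", "code_c", "code_d")
groups <- factor(c("g1", "g1", "g2", "g3"))

# One column per group, one row per code; entry is 1 if the code is in the group.
X_group <- model.matrix(~ groups - 1)
rownames(X_group) <- codes
colnames(X_group) <- levels(groups)
X_group
```

Each row contains exactly one 1 because every code in this toy example belongs to a single group.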
X.group.target Dataset
Description
A matrix containing group structures at the target site. It represents binary group membership of entities at the target.
Usage
X.group.target
Format
A matrix with 2000 rows and 50 columns:
| Rows | Entities at the target site. | 
| Columns | Binary values (0 or 1) indicating group membership. | 
Download and Load Example Data from Zenodo
Description
Download and Load Example Data from Zenodo
Usage
download_example_data(file, destdir = tempdir())
Arguments
| file | Name of the .Rdata file to download (e.g., "S.1.Rdata"). | 
| destdir | Directory to store the downloaded data. Defaults to a temporary directory. | 
Value
A list containing the loaded dataset.
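To show what "a list containing the loaded dataset" can look like, here is a minimal local sketch that loads an `.Rdata` file into a named list. It does not contact Zenodo and is not the package's implementation; the helper `load_rdata_as_list` and the file `toy.Rdata` are hypothetical.

```r
# Hypothetical helper: load every object stored in an .Rdata file into a list.
load_rdata_as_list <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # returns the names of the loaded objects
  mget(loaded, envir = env)
}

# Write a toy object to a temporary file, then read it back.
tmp <- file.path(tempdir(), "toy.Rdata")
toy <- matrix(1:4, nrow = 2)
save(toy, file = tmp)

res <- load_rdata_as_list(tmp)
names(res)
```

Loading into a fresh environment avoids overwriting objects of the same name in the caller's workspace.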
Function Used For Tuning And Evaluation
Description
Function Used For Tuning And Evaluation
Usage
evaluation.sim(pairs.rel, U, seed = NULL)
Arguments
| pairs.rel | the known code-code pairs | 
| U | the code embedding matrix | 
| seed | Optional integer for reproducibility of sampling. | 
Value
The output of tuning and evaluation
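The documentation does not describe how the known pairs are scored against the embeddings. One common approach, shown here purely as an assumption, is to compute the cosine similarity between the embedding rows of each pair. The helper `cosine_pairs` is hypothetical; the `row` and `col` column names follow the pair datasets documented in this manual.

```r
# Hypothetical helper: cosine similarity for each (row, col) pair of codes.
cosine_pairs <- function(U, pairs) {
  Un <- U / sqrt(rowSums(U^2))  # scale each embedding row to unit length
  rowSums(Un[pairs$row, , drop = FALSE] * Un[pairs$col, , drop = FALSE])
}

# Toy embeddings: codes 1 and 2 point the same way, code 3 is orthogonal.
U_toy <- rbind(c(1, 0), c(2, 0), c(0, 3))
pairs_toy <- data.frame(row = c(1, 1), col = c(2, 3))

sims <- cosine_pairs(U_toy, pairs_toy)
sims
```

Similarities for known related pairs could then be compared with those of randomly sampled pairs, which is one plausible use of the `seed` argument.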
Function For Getting Embedding From SVD
Description
Function For Getting Embedding From SVD
Usage
get_embed(mysvd, d = 2000, normalize = TRUE)
Arguments
| mysvd | the SVD result, augmented with a 'names' element | 
| d | the dimension of the final embedding | 
| normalize | whether to normalize the output embeddings to have l2 norm equal to 1 | 
Value
The embedding from SVD
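Elsewhere in this manual, embeddings are described as singular vectors times the square root of the singular values. A minimal sketch of that construction with base R's `svd` follows, including optional scaling of each row to unit l2 norm. The helper `embed_from_svd` is hypothetical and is not the package's `get_embed`; in particular it ignores the 'names' element.

```r
# Hypothetical helper: rank-d embedding from an SVD result.
embed_from_svd <- function(sv, d, normalize = TRUE) {
  idx <- seq_len(d)
  # Left singular vectors scaled by the square root of the singular values.
  emb <- sv$u[, idx, drop = FALSE] %*% diag(sqrt(sv$d[idx]), nrow = d)
  if (normalize) {
    emb <- emb / sqrt(rowSums(emb^2))  # each row now has l2 norm 1
  }
  emb
}

set.seed(1)
A <- crossprod(matrix(rnorm(30), nrow = 6, ncol = 5))  # 5 x 5 symmetric matrix
sv <- svd(A)
E <- embed_from_svd(sv, d = 2)
dim(E)
```

With `normalize = FALSE` and `d` equal to the full rank, `E %*% t(E)` reproduces a symmetric positive semi-definite input, which is why the square root of the singular values is used.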
pairs.rel.CV Dataset
Description
A data frame containing cross-validation pairs for relative comparisons.
Usage
pairs.rel.CV
Format
A data frame with multiple columns:
| col | Integer representing the column index of a pair. | 
| row | Integer representing the row index of a pair. | 
| type | Character string indicating the type of data (e.g., "train", "test"). | 
pairs.rel.EV Dataset
Description
A data frame containing evaluation pairs for relative comparisons.
Usage
pairs.rel.EV
Format
A data frame with multiple columns:
| col | Integer representing the column index of a pair. | 
| row | Integer representing the row index of a pair. | 
| type | Character string indicating the type of data (e.g., "validation"). | 