
LDlink is an interactive and powerful suite of web-based tools for querying germline variants in human population groups of interest to generate interactive tables and plots. All population genotype data originates from Phase 3 (Version 5) of the 1000 Genomes Project and variant RS (reference SNP) numbers are indexed based on dbSNP build 155.
LDlinkR is an R package for querying LDlink web-based applications and downloading their results directly from the R console (internet access required). It is designed for researchers who want to run batch queries, and it provides efficient, user-friendly functions to programmatically interrogate pairwise linkage disequilibrium across large lists of genetic variants.
Please see the online LDlink documentation for more information about understanding linkage disequilibrium (LD) and additional details about how LDlink calculates patterns of LD across a variety of ancestral human populations.
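To install the released version from CRAN:

```r
install.packages("LDlinkR")
```

To install the development version from GitHub, first install the remotes package:

```r
install.packages("remotes")
remotes::install_github("CBIIT/LDlinkR")
```

LDlinkR depends on the following packages: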
Following installation, attach the LDlinkR package with:
```r
library(LDlinkR)
```

In order to access the LDlink API via LDlinkR, you need a personal access token. This is a common convention followed by many APIs and plays the same role as the more familiar HTTPS username/password or SSH keys.
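Rather than typing the token into every call, you can keep it out of your scripts. The sketch below is one way to do this with base R's `Sys.getenv()`, assuming you store the token in an environment variable. The variable name `LDLINK_TOKEN` and the helper `get_ldlink_token()` are hypothetical choices for this example; they are not part of LDlinkR.

```r
# Hypothetical helper (not part of LDlinkR): read the LDlink personal access
# token from an environment variable instead of hard-coding it in scripts.
# The variable name LDLINK_TOKEN is an assumption; you could set it in
# ~/.Renviron with a line such as LDLINK_TOKEN=YourTokenHere123 and restart R.
get_ldlink_token <- function(var = "LDLINK_TOKEN") {
  token <- Sys.getenv(var)  # returns "" when the variable is not set
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set; ",
         "store your LDlink personal access token in it.", call. = FALSE)
  }
  token
}
```

The returned value can then be passed as the `token` argument, for example `LDproxy(snp = "rs456", pop = "YRI", r2d = "r2", token = get_ldlink_token())`.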
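You will need to obtain a personal access token from LDlink and supply it through the `token` argument of each function call. For example:

```r
LDhap(snps = c("rs3", "rs4", "rs148890987"),
      pop = "YRI",
      token = "YourTokenHere123",
      genome_build = "grch38")
```

| Function | Description |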
|---|---|
| LDexpress | Determine if a list of genomic variants is associated with gene expression in tissues of interest. | 
| LDhap | Calculates population-specific haplotype frequencies of all haplotypes observed for a list of query variants. |
| LDmatrix | Generates a data frame of pairwise linkage disequilibrium statistics. | 
| LDpair | Investigates potentially correlated alleles for a pair of variants. | 
| LDpop | Investigates allele frequencies and linkage disequilibrium patterns across 1000 Genomes Project populations. | 
| LDproxy | Explore proxy and putative functional variants for a single query variant. | 
| LDproxy_batch | Query LDproxy using a list of query variants. |
| LDtrait | Search the GWAS Catalog (data updated nightly) to determine if a list of variants (or variants in LD with those variants) have been previously associated with a trait or disease. | 
| SNPchip | Find commercial genotyping chip arrays for variants of interest. | 
| SNPclip | Prune a list of variants by linkage disequilibrium. | 

| Utility Function | Description |
|---|---|
| list_chips | Provides a data frame listing the names and abbreviation codes for available commercial SNP Chip Arrays from Illumina and Affymetrix. | 
| list_pop | Provides a data frame listing the available reference populations from the 1000 Genomes Project. | 
| list_gtex_tissues | Provides a data frame listing the GTEx full names, LDexpress full names (without spaces) and acceptable abbreviation codes of the 54 non-diseased tissue sites collected for the GTEx Portal and used as input for the LDexpress function. |

In this basic example, the LDproxy function is used to explore proxy and putative functional variants for a single query variant. Usage of the other functions is similar.
```r
my_proxies <- LDproxy(snp = "rs456",
                      pop = "YRI",
                      r2d = "r2",
                      token = "YourTokenHere123",
                      genome_build = "grch38"
                     )
```

This example uses a single reference SNP ID (rsID) for the query variant, a population of interest (YRI = Yoruba in Ibadan, Nigeria), “r2” so that the output is based on estimated R², and genome build GRCh38 (hg38). The output is stored in the variable `my_proxies`. Note: Replace “YourTokenHere123” with your personal access token (see the “Personal Access Token” section above).
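Because LDproxy returns an ordinary data frame, base R subsetting can be used to keep only strong, nearby proxies. The sketch below is an illustration, not an LDlinkR function: `filter_proxies()` is a hypothetical helper, it assumes the `R2` and `Distance` (base pairs from the query variant) columns shown in the `head(my_proxies)` output, and the 0.8 R² cutoff and 3,000 bp window are arbitrary example values. To stay self-contained, it rebuilds a few columns of the six rows printed for `my_proxies`; with live results you would pass `my_proxies` directly.

```r
# Hypothetical helper: keep proxies whose R2 is at or above a cutoff and that
# lie within max_dist base pairs (either side) of the query variant.
filter_proxies <- function(proxies, min_r2 = 0.8, max_dist = 3000) {
  keep <- proxies$R2 >= min_r2 & abs(proxies$Distance) <= max_dist
  proxies[keep, , drop = FALSE]
}

# A few columns copied from the head(my_proxies) output in this example.
example_proxies <- data.frame(
  RS_Number = c("rs58333091", "rs60614713", "rs59826225",
                "rs123", "rs10341080", "rs56794736"),
  Distance  = c(0, 7, 2214, 4027, -2716, -3442),
  R2        = c(1.0000, 1.0000, 1.0000, 1.0000, 0.9434, 0.9434),
  stringsAsFactors = FALSE
)

strong <- filter_proxies(example_proxies)
strong$RS_Number
# -> "rs58333091" "rs60614713" "rs59826225" "rs10341080"
```

Here rs123 and rs56794736 are dropped only because they lie more than 3,000 bp from the query variant; all six rows pass the R² cutoff.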
The output can be viewed with the `head` function from R's utils package, which returns the first rows of `my_proxies`.
```r
head(my_proxies)
##    RS_Number         Coord Alleles    MAF Distance Dprime     R2 Correlated_Alleles
## 1 rs58333091 chr7:24922800   (G/C) 0.1963        0      1 1.0000            G=G,C=C
## 2 rs60614713 chr7:24922807   (T/C) 0.1963        7      1 1.0000            G=T,C=C
## 3 rs59826225 chr7:24925014   (G/T) 0.1963     2214      1 1.0000            G=G,C=T
## 4      rs123 chr7:24926827   (C/A) 0.1963     4027      1 1.0000            G=C,C=A
## 5 rs10341080 chr7:24920084   (C/T) 0.2056    -2716      1 0.9434            G=C,C=T
## 6 rs56794736 chr7:24919358   (C/T) 0.2056    -3442      1 0.9434            G=C,C=T
##   RegulomeDB Function
## 1          4     <NA>
## 2         2b     <NA>
## 3          4     <NA>
## 4         1f     <NA>
## 5         3a     <NA>
## 6          7     <NA>
```

This example demonstrates how to use the LDexpress function to determine whether a genomic variant (or list of variants) is associated with gene expression in tissues of interest. Usage of the other functions is similar.
```r
my_output <- LDexpress(snps = "rs4",
                       pop = c("YRI", "CEU"),
                       tissue = c("ADI_SUB", "ADI_VIS_OME"),
                       token = "YourTokenHere123"
                      )
```

For the function arguments, this example uses a single rsID for the query variant, multiple populations (YRI = Yoruba in Ibadan, Nigeria and CEU = Utah Residents from North and West Europe), and multiple tissue types specified with acceptable abbreviations for the available tissues (ADI_SUB = Adipose - Subcutaneous and ADI_VIS_OME = Adipose - Visceral (Omentum)). The output is stored in the variable `my_output`. Note: Replace “YourTokenHere123” with your personal access token (see the “Personal Access Token” section above).
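The LDexpress output can contain several rows for the same gene and tissue (one per correlated variant, as in the rows shown for `my_output`). A minimal sketch using base R's `aggregate()` reports the smallest P value per gene and tissue. The helper `min_p_by_gene_tissue()` is hypothetical, it assumes the `Gene_Symbol`, `Tissue` and `P_value` column names shown in the `head(my_output)` output, and it converts `P_value` with `as.numeric()` as a precaution in case the column is stored as text. To stay self-contained, it rebuilds those columns from the six printed rows.

```r
# Hypothetical helper: smallest eQTL P value for each gene/tissue pair.
min_p_by_gene_tissue <- function(expr_results) {
  # Precaution: coerce in case P values arrive as character strings.
  expr_results$P_value <- as.numeric(expr_results$P_value)
  aggregate(P_value ~ Gene_Symbol + Tissue, data = expr_results, FUN = min)
}

# Columns copied from the six rows printed by head(my_output).
example_output <- data.frame(
  RS_ID       = c("rs10637519", "rs10637519", "rs473641",
                  "rs473641", "rs671746", "rs671746"),
  Gene_Symbol = rep("RP1-257C22.2", 6),
  Tissue      = rep(c("Adipose - Subcutaneous",
                      "Adipose - Visceral (Omentum)"), 3),
  P_value     = c("2.2578e-07", "1.0227e-05", "2.2578e-07",
                  "1.0227e-05", "1.93289e-07", "1.0227e-05"),
  stringsAsFactors = FALSE
)

min_p_by_gene_tissue(example_output)
```

For these six rows the result has one row per tissue: 1.93289e-07 for Adipose - Subcutaneous and 1.0227e-05 for Adipose - Visceral (Omentum).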
To view the output, use the `head` function from R's utils package, which returns the first rows of `my_output`.
```r
head(my_output)
##   Query      RS_ID       Position                R2                D'
## 1   rs4 rs10637519 chr13:32430479 0.174249321651574 0.965976331360947
## 2   rs4 rs10637519 chr13:32430479 0.174249321651574 0.965976331360947
## 3   rs4   rs473641 chr13:32431244 0.174249321651574 0.965976331360947
## 4   rs4   rs473641 chr13:32431244 0.174249321651574 0.965976331360947
## 5   rs4   rs671746 chr13:32431263 0.174249321651574 0.965976331360947
## 6   rs4   rs671746 chr13:32431263 0.174249321651574 0.965976331360947
##    Gene_Symbol        Gencode_ID                       Tissue
## 1 RP1-257C22.2 ENSG00000279314.1       Adipose - Subcutaneous
## 2 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
## 3 RP1-257C22.2 ENSG00000279314.1       Adipose - Subcutaneous
## 4 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
## 5 RP1-257C22.2 ENSG00000279314.1       Adipose - Subcutaneous
## 6 RP1-257C22.2 ENSG00000279314.1 Adipose - Visceral (Omentum)
##   Non_effect_Allele_Freq Effect_Allele_Freq Effect_Size     P_value
## 1                G=0.565          GTC=0.435    0.225642  2.2578e-07
## 2                G=0.565          GTC=0.435    0.207161  1.0227e-05
## 3                A=0.565            G=0.435    0.225642  2.2578e-07
## 4                A=0.565            G=0.435    0.207161  1.0227e-05
## 5                C=0.565            T=0.435    0.226558 1.93289e-07
## 6                C=0.565            T=0.435    0.207161  1.0227e-05
```

The following example demonstrates the utility function list_pop, which returns a listing of the available reference populations from the 1000 Genomes Project together with the population codes and super population codes used by LDlinkR functions. Usage of the other utility functions is similar.
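Because list_pop returns a data frame, the population codes that belong to one super population can be selected with ordinary subsetting and then passed to the `pop` argument of the query functions. The helper `pops_in_super()` below is hypothetical; it assumes the `pop_code` and `super_pop_code` columns shown in the list_pop output, and it drops the row whose population code equals the super population code itself (for example, AFR), since that row denotes the combined group. The sample data are a few rows copied from that output.

```r
# Hypothetical helper: population codes within one 1000 Genomes super population,
# excluding the combined super-population row (e.g. "AFR" itself).
pops_in_super <- function(pops, super) {
  rows <- pops$super_pop_code == super & pops$pop_code != super
  as.character(pops$pop_code[rows])  # as.character() in case codes are factors
}

# A few rows copied from the list_pop() output in this example.
example_pops <- data.frame(
  pop_code       = c("AFR", "YRI", "LWK", "EUR", "CEU", "TSI"),
  super_pop_code = c("AFR", "AFR", "AFR", "EUR", "EUR", "EUR"),
  stringsAsFactors = FALSE
)

pops_in_super(example_pops, "EUR")
# -> "CEU" "TSI"
```

With the package loaded, the same helper could be applied to the full listing, for example `pops_in_super(list_pop(), "AFR")`.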
```r
list_pop()
##    pop_code super_pop_code                                  pop_name
## 1       ALL            ALL                           ALL POPULATIONS
## 2       AFR            AFR                                   AFRICAN
## 3       YRI            AFR                  Yoruba in Ibadan, Nigera
## 4       LWK            AFR                    Luhya in Webuye, Kenya
## 5       GWD            AFR                 Gambian in Western Gambia
## 6       MSL            AFR                     Mende in Sierra Leone
## 7       ESN            AFR                            Esan in Nigera
## 8       ASW            AFR   Americans of African Ancestry in SW USA
## 9       ACB            AFR           African Carribbeans in Barbados
## 10      AMR            AMR                         AD MIXED AMERICAN
## 11      MXL            AMR    Mexican Ancestry from Los Angeles, USA
## 12      PUR            AMR            Puerto Ricans from Puerto Rico
## 13      CLM            AMR        Colombians from Medellin, Colombia
## 14      PEL            AMR                 Peruvians from Lima, Peru
## 15      EAS            EAS                                EAST ASIAN
## 16      CHB            EAS              Han Chinese in Bejing, China
## 17      JPT            EAS                  Japanese in Tokyo, Japan
## 18      CHS            EAS                      Southern Han Chinese
## 19      CDX            EAS       Chinese Dai in Xishuangbanna, China
## 20      KHV            EAS         Kinh in Ho Chi Minh City, Vietnam
## 21      EUR            EUR                                  EUROPEAN
## 22      CEU            EUR Utah Residents from North and West Europe
## 23      TSI            EUR                         Toscani in Italia
## 24      FIN            EUR                        Finnish in Finland
## 25      GBR            EUR           British in England and Scotland
## 26      IBS            EUR               Iberian population in Spain
## 27      SAS            SAS                               SOUTH ASIAN
## 28      GIH            SAS  Gujarati Indian from Houston, Texas, USA
## 29      PJL            SAS             Punjabi from Lahore, Pakistan
## 30      BEB            SAS                   Bengali from Bangladesh
## 31      STU            SAS              Sri Lankan Tamil from the UK
## 32      ITU            SAS                 Indian Telugu from the UK
```

More detailed examples demonstrating the usage of each function can be found in the package vignette:

```r
browseVignettes("LDlinkR")
```

Timothy A. Myers, Stephen J. Chanock and Mitchell J. Machiela