| Title: | A Collection of Proteome Panels and Meta-Data | 
| Version: | 0.5 | 
| Date: | 2025-3-5 | 
| Description: | It aggregates protein panel data and metadata for protein quantitative trait locus (pQTL) analysis using 'pQTLtools' (https://jinghuazhao.github.io/pQTLtools/). The package includes data from affinity-based panels such as 'Olink' (https://olink.com/) and 'SomaScan' (https://somalogic.com/), as well as mass spectrometry-based panels from 'CellCarta' (https://cellcarta.com/) and 'Seer' (https://seer.bio/). The metadata encompasses updated annotations and publication details. | 
| License: | MIT + file LICENSE | 
| URL: | https://jinghuazhao.github.io/pQTLdata/ | 
| Depends: | R (≥ 3.5.0) | 
| Imports: | knitr, Rdpack | 
| RdMacros: | Rdpack | 
| Suggests: | dplyr, grid, EnsDb.Hsapiens.v75, ensembldb, IRanges, org.Hs.eg.db, S4Vectors, VennDiagram | 
| VignetteBuilder: | knitr | 
| LazyData: | Yes | 
| LazyLoad: | Yes | 
| LazyDataCompression: | xz | 
| NeedsCompilation: | no | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| Packaged: | 2025-03-05 16:14:38 UTC; jhz22 | 
| Author: | Jing Hua Zhao | 
| Maintainer: | Jing Hua Zhao <jinghuazhao@hotmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-03-07 11:30:02 UTC | 
A summary of datasets
Description
It aggregates protein panel data and metadata for protein quantitative trait locus (pQTL) analysis using 'pQTLtools' (https://jinghuazhao.github.io/pQTLtools/). The package includes data from affinity-based panels such as 'Olink' (https://olink.com/) and 'SomaScan' (https://somalogic.com/), as well as mass spectrometry-based panels from 'CellCarta' (https://cellcarta.com/) and 'Seer' (https://seer.bio/). The metadata encompasses updated annotations and publication details.
Details
Available data are listed in the following table.
| Objects | Description | 
| Datasets | |
| caprion | Caprion panel | 
| inf1 | Olink/INF panel | 
| Olink_Explore_1536 | Olink/NGS 1472 panels | 
| Olink_Explore_3072 | Olink/Explore 3072 panels | 
| Olink_Explore_HT | Olink/Explore HT panels | 
| Olink_Target_96 | Olink/Target 96 panels | 
| Olink_qPCR | Olink/qPCR panels | 
| SomaScan160410 | SomaScan panel | 
| SomaScanV4.1 | SomaScan v4.1 panel | 
| SomaScan11k | SomaScan 11k panel | 
| scallop_inf1 | SCALLOP/INF meta-analysis results | 
| seer1980 | ST1 from Suhre et al. (2024) bioRxiv | 
| swath_ms | SWATH-MS panel | 
| Installations | |
| EndNote/ | Proteogenomics references | 
| Olink/ | Olink-COVID analysis by MGH | 
Columns shared by several datasets are described as follows.
- chr Chromosome. 
- start Start position. 
- end End position. 
- gene Gene name. 
- UniProt UniProt ID. 
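As a minimal sketch of how these shared columns might be used, the following builds a small toy data frame with the same column names and selects the entries overlapping a genomic interval. The rows are made up for illustration and are not taken from the package, and the helper name `in_region` is hypothetical.

```r
# Toy annotation table using the generic column names described above.
# All values are illustrative placeholders, not data from pQTLdata.
toy <- data.frame(
  chr = c("1", "1", "2"),
  start = c(1000, 50000, 2000),
  end = c(2000, 60000, 3000),
  gene = c("GENE_A", "GENE_B", "GENE_C"),
  UniProt = c("ID_A", "ID_B", "ID_C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose [start, end] interval overlaps
# the query interval [from, to] on the given chromosome.
in_region <- function(d, chrom, from, to) {
  d[d$chr == chrom & d$start <= to & d$end >= from, , drop = FALSE]
}

hits <- in_region(toy, "1", 1500, 55000)
hits$gene
```

The same helper should work on any dataset that carries `chr`, `start` and `end` columns, provided the chromosome is compared in the type in which it is stored.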
Usage
Vignettes on package usage:
- An Overview of pQTLdata: vignette("pQTLdata").
Author(s)
Jing Hua Zhao in collaboration with other colleagues.
See Also
Useful links:
- https://jinghuazhao.github.io/pQTLdata/
Examples
# Olink-SomaScan panel overlap
# UniProt ids on each panel, excluding P23560 and missing values
p <- list(setdiff(inf1$uniprot,"P23560"),
          setdiff(SomaScan160410$UniProt[!is.na(SomaScan160410$UniProt)],"P23560"))
cnames <- c("INF1","SomaScan")
# Venn diagram of the two id sets, drawn on a new grid page
os <- VennDiagram::venn.diagram(x = p, category.names=cnames, filename=NULL,
                                disable.logging = TRUE,height=8,width=8,units="in")
grid::grid.newpage()
grid::grid.draw(os)
# Proteins present on both panels
m <- merge(inf1,SomaScan160410,by.x="uniprot",by.y="UniProt")
u <- setdiff(with(m,unique(uniprot)),"P23560")
# Olink/INF1 annotations for the shared proteins
o <- subset(inf1,uniprot %in% u)
dim(o)
# SomaScan annotations for the shared proteins, with duplicate rows removed
vars <- c("UniProt","chr","start","end","extGene","Target","TargetFullName")
s <- subset(SomaScan160410[vars], UniProt %in% u)
dim(s)
us <- s[!duplicated(s),]
dim(us)
us
Olink/Explore 1536 panel
Description
Information based on pilot studies
Usage
Olink_Explore_1536
Format
A data frame with 1,472 rows and 3 variables:
- UniProt UniProt id
- Assay Experimental assay
- Panel Olink panel
Details
Curated from R.
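A panel table with these three columns lends itself to simple summaries. The sketch below uses a toy stand-in with the documented column names (the ids and panel labels are placeholders, not real entries) to count assays per panel and to list UniProt ids that appear more than once.

```r
# Toy stand-in with the three documented columns of Olink_Explore_1536.
# Ids and panel labels are placeholders for illustration only.
toy_panel <- data.frame(
  UniProt = c("ID_1", "ID_2", "ID_1", "ID_3"),
  Assay = c("A1", "A2", "A1", "A3"),
  Panel = c("Panel_X", "Panel_X", "Panel_Y", "Panel_Y"),
  stringsAsFactors = FALSE
)

# Number of assays on each panel
per_panel <- table(toy_panel$Panel)
per_panel

# UniProt ids that occur on more than one row
dup_ids <- unique(toy_panel$UniProt[duplicated(toy_panel$UniProt)])
dup_ids
```

With the package loaded, replacing `toy_panel` by `Olink_Explore_1536` should give the corresponding summaries for the real panel.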
Olink/Explore 3072 panels
Description
Information on all Explore 3072 panels
Usage
Olink_Explore_3072
Format
A data frame with 2,945 rows and 4 variables:
- UniProt.ID UniProt id
- Protein.name Protein name
- Gene.name Gene name
- Explore.384.panel Explore 384 panel
Details
Curated from Excel.
Olink/Explore HT panels
Description
Information on all Explore HT panels
Usage
Olink_Explore_HT
Format
A data frame with 5,416 rows and 4 variables:
- Olink.ID Olink id
- UniProt.ID UniProt id
- Protein.name Protein name
- Gene.name Gene name
Details
Curated from Excel.
Olink/Target 96 panels
Description
Information on all Target 96 panels. Individual panels are also available from the companion xlsx in the Olink/ directory.
Usage
Olink_Target_96
Format
A data frame with 1,116 rows and 3 variables:
- UniProt UniProt id
- Protein Protein
- Panel Panel
Details
Curated from Excel.
Olink/qPCR panels
Description
Information on all qPCR panels
Usage
Olink_qPCR
Format
A data frame with 1,112 rows and 7 variables:
- UniProt UniProt id
- Panel Panels
- Target Protein
- gene HGNC symbol
- chr Chromosome
- start start
- end end
Details
Curated from Excel.
SomaScan 11k
Description
This is the latest SomaScan panel.
Usage
SomaScan11k
Format
A data frame with 10,776 rows and 5 variables:
- Sequence.ID Sequence ID
- Full.Name Full name
- Target.Name Target name
- UniProt.ID UniProt ID
- Entrez.Gene.Name Entrez gene name
Details
Curated from the SomaLogic website.
Source
https://somalogic.com/somascan-11k-assay/
SomaScan panel
Description
This is based on the panel used in Sun et al. (2018).
Usage
SomaScan160410
Format
A data frame with 5,178 rows and 10 variables:
- SOMAMER_ID Somamer id
- UniProt UniProt id
- Target Protein target
- TargetFullName Protein target full name
- chr chromosome (1-22,X,Y)
- start start
- end end
- entGene entrez gene
- ensGene ENSEMBL gene
- extGene external gene
Details
The panel is from the INTERVAL study.
References
Sun BB, Maranville JC, Peters JE, Stacey D, Staley JR, Blackshaw J, Burgess S, Jiang T, Paige E, Surendran P, Oliver-Williams C, Kamat MA, Prins BP, Wilcox SK, Zimmerman ES, Chi A, Bansal N, Spain SL, Wood AM, Morrell NW, Bradley JR, Janjic N, Roberts DJ, Ouwehand WH, Todd JA, Soranzo N, Suhre K, Paul DS, Fox CS, Plenge RM, Danesh J, Runz H, Butterworth AS (2018). “Genomic atlas of the human plasma proteome.” Nature, 558(7708), 73-79. ISSN 1476-4687 (Electronic) 0028-0836 (Linking), doi:10.1038/s41586-018-0175-2.
SomaScan v4.1
Description
This is the 7k panel
Usage
SomaScanV4.1
Format
A data frame with 7,288 rows and 6 variables:
- # A serial number
- SeqID SeqID
- Human.Target.or.Analyte Human target/analyte
- UniProt.ID UniProt id
- GeneID HGNC symbol
- Type "Protein"
Details
Obtained directly from SomaLogic.
Caprion panel
Description
Information based on Caprion pilot studies
Usage
caprion
Format
A data frame with 987 rows and 12 variables:
- Gene HGNC symbols simplified in four instances
- Gene.orig HGNC symbol
- Protein Protein name as in UniProt
- Accession UniProt id
- Protein.Description Detailed information on protein
- GO.Cellular.Component GO cellular component
- GO.Function GO function
- GO.Process GO process
- ensGenes Ensembl genes
- chrom chromosome
- chr chromosome
- starts start positions
- ends end positions
- start minimum start
- end maximum end
Details
See the Caprion repository for an example of its use.
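The Format section describes `start` as the minimum start and `end` as the maximum end, alongside `starts` and `ends` holding multiple positions. The sketch below shows how such summary columns could be derived. It assumes, purely for illustration, that multiple positions are stored as a semicolon-separated string; the actual encoding in `caprion` may differ. The toy values and the helper name `summarise_pos` are hypothetical.

```r
# Toy rows with several start/end positions per gene.
# ASSUMPTION: positions are stored as ";"-separated strings; this is an
# illustrative encoding and may not match the caprion dataset.
toy <- data.frame(
  Gene = c("GENE_A", "GENE_B"),
  starts = c("100;250", "900"),
  ends = c("400;300", "1200"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each delimited string into numbers and
# reduce them with a summary function such as min or max.
summarise_pos <- function(x, f, sep = ";") {
  vapply(strsplit(x, sep, fixed = TRUE),
         function(p) f(as.numeric(p)), numeric(1))
}

toy$start <- summarise_pos(toy$starts, min)  # minimum start
toy$end <- summarise_pos(toy$ends, max)      # maximum end
toy[c("Gene", "start", "end")]
```

Taking the minimum start and the maximum end gives a single interval that covers every listed position for a gene.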
Olink/INF1 panel
Description
The panel is based on SCALLOP-INF (Zhao et al., 2023).
Usage
inf1
Format
A data frame with 92 rows and 9 variables:
- uniprot UniProt id
- prot Protein
- target Protein target name
- target.short Protein target short name
- gene HGNC symbol
- chr chromosome (1-13,16-17,19-22)
- start start
- end end
- chromosome updated chromosomes
- start38 start position under build 38
- end38 end position under build 38
- ensGene Ensembl gene name
- ensembl_gene_id ENSEMBL gene
- alt_name recent name from www.uniprot.org
Details
Assembled for SCALLOP-INF.
References
Zhao JH, Stacey D, Eriksson N, Macdonald-Dunlop E, Hedman ÅK, Kalnapenkis A, Enroth S, Cozzetto D, Digby-Bell J, Marten J, Folkersen L, Herder C, Jonsson L, Bergen SE, Gieger C, Needham EJ, Surendran P, Team EBR, Paul DS, Polasek O, Thorand B, Grallert H, Roden M, Võsa U, Esko T, Hayward C, Johansson Å, Gyllensten U, Powell N, Hansson O, Mattsson-Carlgren N, Joshi PK, Danesh J, Padyukov L, Klareskog L, Landén M, Wilson JF, Siegbahn A, Wallentin L, Mälarstig A, Butterworth AS, Peters JE (2023). “Genetics of circulating inflammatory proteins identifies drivers of immune-mediated disease risk and therapeutic targets.” Nature Immunology, 24(9), 1540-1551. doi:10.1038/s41590-023-01588-w.
Supplementary table 3
Description
Supplementary information for Zhao et al. (2023).
Usage
scallop_inf1
Format
A data frame with 180 rows and 19 variables:
- UniProt UniProt ID
- Protein Protein name
- Protein_gene_symbol Gene symbol
- Chromosome Chromosome
- Position Position
- cistrans cis/trans
- rsid Reference SNP (rs) ID
- Effect_allele Effect allele
- Other_allele Reference allele
- EAF Effect allele frequency
- b Effect size estimate (b)
- SE Standard error
- log10P log10(P)
- Direction Direction field in METAL output
- HetISq Heterogeneity I^2
- HetChiSq Heterogeneity chi-square
- HetDf Heterogeneity degrees of freedom
- logHetP Heterogeneity log10(P)
- N Sample size
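Two of these columns are related by simple arithmetic, which the sketch below illustrates on toy values (not taken from `scallop_inf1`). It assumes that `log10P` holds log10(P) as documented, so that P = 10^log10P; if a table stores -log10(P) instead, the sign must be flipped. The I^2 formula used is the standard one, 100 × (Q - df) / Q truncated at zero, where Q is the heterogeneity chi-square.

```r
# Toy meta-analysis rows using the documented column names.
# All values are illustrative placeholders.
toy <- data.frame(
  rsid = c("rs_a", "rs_b"),
  log10P = c(-8, -12.5),
  HetChiSq = c(20, 5),
  HetDf = c(10, 10),
  stringsAsFactors = FALSE
)

# ASSUMPTION: log10P is log10(P), i.e. negative for P < 1.
toy$P <- 10^toy$log10P

# I^2 (%) from the heterogeneity chi-square Q and its degrees of freedom,
# truncated at zero when Q is smaller than df.
toy$I2 <- pmax(0, 100 * (toy$HetChiSq - toy$HetDf) / toy$HetChiSq)

toy[c("rsid", "P", "I2")]
```

Keeping P on the log10 scale avoids numerical underflow for very strong associations, which is presumably why the table stores `log10P` and `logHetP` rather than raw P-values.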
References
Zhao JH, Stacey D, Eriksson N, Macdonald-Dunlop E, Hedman ÅK, Kalnapenkis A, Enroth S, Cozzetto D, Digby-Bell J, Marten J, Folkersen L, Herder C, Jonsson L, Bergen SE, Gieger C, Needham EJ, Surendran P, Team EBR, Paul DS, Polasek O, Thorand B, Grallert H, Roden M, Võsa U, Esko T, Hayward C, Johansson Å, Gyllensten U, Powell N, Hansson O, Mattsson-Carlgren N, Joshi PK, Danesh J, Padyukov L, Klareskog L, Landén M, Wilson JF, Siegbahn A, Wallentin L, Mälarstig A, Butterworth AS, Peters JE (2023). “Genetics of circulating inflammatory proteins identifies drivers of immune-mediated disease risk and therapeutic targets.” Nature Immunology, 24(9), 1540-1551. doi:10.1038/s41590-023-01588-w.
Seer 1980 panel
Description
ST1 from Suhre et al. (2024).
Usage
seer1980
Format
A data frame with 1,980 rows:
- PID.NP PID.NP
- protein_ids Protein ids
- protein_names Protein names
- mapped.UniProtID Mapped UniProt ID
- mapped_gene_id Mapped gene id
- gene_name Gene name
- description Description
- chr Chromosome
- start Start position
- end End position
Details
As above.
References
Suhre K, Chen Q, Halama A, Mendez K, Dahlin A, Stephan N, Thareja G, Sarwath H, Guturu H, Dwaraka VB, Batzoglou S, Schmidt F, Lasky-Su JA (2024). “A genome-wide association study of mass spectrometry proteomics using the Seer Proteograph platform.” BioRxiv. doi:10.1101/2024.05.27.596028.
SWATH-MS panel
Description
Curated during INTERVAL pilot study.
Usage
swath_ms
Format
A data frame with 684 rows and 5 variables:
- Accession UniProt id
- accList List of UniProt ids
- uniprotName Protein
- ensGene ENSEMBL gene
- geneName HGNC symbol
Details
As above.