Accurate and Scalable Construction of Polygenic Scores in Large Biobank Data Sets

Citation Data: The American Journal of Human Genetics, ISSN: 0002-9297, Vol: 106, Issue: 5, Page: 679-693

Publication Year: 2020

Metrics Details

Citations
90
- Citation Indexes
  90
Captures
108
- Readers
  108
Mentions
2
- News Mentions
  2

Article Description

Accurate construction of polygenic scores (PGS) can enable early diagnosis of diseases and facilitate the development of personalized medicine. Accurate PGS construction requires prediction models that are both adaptive to different genetic architectures and scalable to biobank-scale datasets with millions of individuals and tens of millions of genetic variants. Here, we develop such a method, called the Deterministic Bayesian Sparse Linear Mixed Model (DBSLMM). DBSLMM relies on a flexible modeling assumption on the effect size distribution to achieve robust and accurate prediction performance across a range of genetic architectures. DBSLMM also relies on a simple deterministic search algorithm to yield an approximate analytic estimation solution using summary statistics only. The deterministic search algorithm, when paired with further algebraic innovations, results in substantial computational savings. With simulations, we show that DBSLMM achieves scalable and accurate prediction performance across a range of realistic genetic architectures. We then apply DBSLMM to analyze 25 traits in UK Biobank. For these traits, compared to existing approaches, DBSLMM achieves an average accuracy gain of 2.03%–101.09% in internal cross-validations. In external validations on two separate datasets, including one from BioBank Japan, DBSLMM achieves an average accuracy gain of 14.74%–522.74%. In these real data applications, DBSLMM is 1.03–28.11 times faster and uses only 7.4%–24.8% of the physical memory used by other multiple regression-based PGS methods. Overall, DBSLMM represents an accurate and scalable method for constructing PGS in biobank-scale datasets.
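
For readers unfamiliar with the term, a polygenic score is commonly computed as a weighted sum of an individual's allele dosages, with one weight (effect size) per variant. The following is a minimal sketch of that generic scoring step only; it is not DBSLMM, which concerns how the effect sizes are estimated. The function names `polygenic_scores` and `r_squared`, the toy data, and the use of squared correlation as the accuracy measure are all assumptions made for illustration.

```python
def polygenic_scores(genotypes, effect_sizes):
    """Return one score per individual: sum over variants of dosage * effect size."""
    scores = []
    for dosages in genotypes:
        if len(dosages) != len(effect_sizes):
            raise ValueError("each individual needs one dosage per variant")
        scores.append(sum(d * b for d, b in zip(dosages, effect_sizes)))
    return scores


def r_squared(predicted, observed):
    """Squared Pearson correlation between scores and phenotypes
    (an assumed accuracy measure, not taken from the abstract)."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)


# Hypothetical toy data: 4 individuals, 3 variants, dosages coded 0/1/2.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0]]
# Hypothetical per-variant effect sizes, e.g. as produced by some PGS method.
effect_sizes = [0.5, -0.25, 0.125]

scores = polygenic_scores(genotypes, effect_sizes)
```

At biobank scale the same sum would normally be computed with matrix operations or dedicated tools rather than Python loops; the loop form is used here only to keep each step visible.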

Bibliographic Details

DOI: 10.1016/j.ajhg.2020.03.013

PMID: 32330416

URL ID: http://www.sciencedirect.com/science/article/pii/S0002929720301099; http://dx.doi.org/10.1016/j.ajhg.2020.03.013; http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85084139757&origin=inward; http://www.ncbi.nlm.nih.gov/pubmed/32330416; https://linkinghub.elsevier.com/retrieve/pii/S0002929720301099; https://dx.doi.org/10.1016/j.ajhg.2020.03.013

AUTHOR(S)

Yang, Sheng; Zhou, Xiang

PUBLISHER(S)

Elsevier BV

TAG(S)

Biochemistry, Genetics and Molecular Biology; Medicine
