Dispensing with unnecessary assumptions in population genetics analysis

Citation Data: bioRxiv, ISSN: 2692-8205

Publication Year: 2022

Article Description

Parametric assumptions in population genetics analysis - including linearity, sources of population stratification and additivity of variance as part of Gaussian noise - are often made, yet their (approximate) validity depends on the variants and traits of interest, as well as on the genetic ancestry and population dependence structure of the sample cohort. We present a unified statistical workflow, called TarGene, for targeted estimation of effect sizes, as well as two-point and higher-order epistatic interactions, of genomic variants on polygenic traits, which dispenses with these unnecessary assumptions. Our approach is founded on Targeted Learning, a framework for estimation that integrates mathematical statistics, machine learning and causal inference. TarGene maximises power whilst simultaneously maximising control over false discoveries by: (i) guaranteeing an optimal bias-variance trade-off, (ii) taking into account potential covariate non-linearities, sources of population stratification and dependence structure, and (iii) detecting genetic non-linearities. The necessity of this model-independent approach is demonstrated via extensive simulations. We validate the effectiveness of our method by reproducing previously verified effect sizes on UK Biobank data, whilst simultaneously discovering non-linear effect sizes of additional allelic copies on trait or disease, in a PheWAS involving 781 traits. Specifically, we demonstrate that genetic non-linearity at the FTO locus is significant for 54 traits in this study. We further find three pairs of epistatic loci associated with skin color that have previously been reported to be associated with hair color. Finally, we illustrate how TarGene can be used to investigate higher-order interactions using three variants linked to the vitamin D receptor complex.
TarGene provides a platform for comparative analyses across biobanks, or for the integration of multiple biobanks and heterogeneous populations, to simultaneously increase power and control type I errors, whilst taking into account population stratification and complex dependence structures.
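
To make the notion of a two-point epistatic interaction concrete, the following is a minimal Python sketch, not TarGene itself and not a Targeted Learning estimator. It assumes two binary variant indicators with no confounding and no population stratification, and computes a naive plug-in interaction contrast on the additive scale from the four cell means. The function names `interaction_contrast` and `simulate`, and all simulated effect values, are hypothetical and chosen for illustration only.

```python
import numpy as np


def interaction_contrast(y, v1, v2):
    """Naive plug-in two-point interaction on the additive scale.

    Returns mean(Y | 1,1) - mean(Y | 1,0) - mean(Y | 0,1) + mean(Y | 0,0).
    Assumption: v1 and v2 are binary and unconfounded; this is a plain
    difference of cell means, not a targeted (TMLE) estimate.
    """
    y = np.asarray(y, dtype=float)
    v1 = np.asarray(v1).astype(bool)
    v2 = np.asarray(v2).astype(bool)

    def cell_mean(a, b):
        # Mean outcome among samples with the genotype combination (a, b).
        mask = (v1 == a) & (v2 == b)
        if not mask.any():
            raise ValueError(f"no samples with v1={int(a)}, v2={int(b)}")
        return y[mask].mean()

    return (
        cell_mean(True, True)
        - cell_mean(True, False)
        - cell_mean(False, True)
        + cell_mean(False, False)
    )


def simulate(n=20000, interaction=0.5, seed=0):
    """Simulate a trait with two main effects and one interaction term.

    The main effects (0.3, 0.2) and unit Gaussian noise are arbitrary
    illustrative choices, not values from the preprint.
    """
    rng = np.random.default_rng(seed)
    v1 = rng.integers(0, 2, size=n)
    v2 = rng.integers(0, 2, size=n)
    noise = rng.normal(0.0, 1.0, size=n)
    y = 0.3 * v1 + 0.2 * v2 + interaction * v1 * v2 + noise
    return y, v1, v2


if __name__ == "__main__":
    y, v1, v2 = simulate()
    print(f"estimated interaction: {interaction_contrast(y, v1, v2):.3f}")
```

In the simulated data the contrast recovers the interaction coefficient because the variants are independent of each other and of the noise. With real cohorts that independence generally fails, which is the setting the abstract's approach is designed for.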

Bibliographic Details

DOI: 10.1101/2022.09.12.507656

URL ID: http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85162292828&origin=inward; http://dx.doi.org/10.1101/2022.09.12.507656; https://dx.doi.org/10.1101/2022.09.12.507656; https://www.biorxiv.org/content/10.1101/2022.09.12.507656v2

AUTHOR(S)

Olivier Labayle Pabet; Kelsey Tetley-Campbell; Chris P. Ponting; Sjoerd Viktor Beentjes; Ava Khamseh; Mark J. van der Laan

PUBLISHER(S)

Cold Spring Harbor Laboratory

TAG(S)

Biochemistry, Genetics and Molecular Biology; Agricultural and Biological Sciences; Immunology and Microbiology; Neuroscience; Pharmacology, Toxicology and Pharmaceutics
