HAPNEST: efficient, large-scale generation and evaluation of synthetic datasets for genotypes and phenotypes

Citation Data: Bioinformatics, ISSN: 1367-4811, Vol: 39, Issue: 9

Publication Year: 2023

Citations: 8
Usage: 0
Captures: 22
Mentions: 1
Social Media: 0

Article Description

Motivation: Existing methods for simulating synthetic genotype and phenotype datasets have limited scalability, constraining their usability for large-scale analyses. Moreover, a systematic approach for evaluating synthetic data quality and a benchmark synthetic dataset for developing and evaluating methods for polygenic risk scores are lacking.

Results: We present HAPNEST, a novel approach for efficiently generating diverse individual-level genotypic and phenotypic data. In comparison to alternative methods, HAPNEST runs faster and shows a lower degree of relatedness with reference panels, while generating datasets that preserve key statistical properties of real data. These desirable synthetic data properties enabled us to generate 6.8 million common variants and nine phenotypes with varying degrees of heritability and polygenicity across 1 million individuals. We demonstrate how HAPNEST can facilitate biobank-scale analyses through the comparison of seven methods for generating polygenic risk scores across multiple ancestry groups and different genetic architectures.

Bibliographic Details

DOI: 10.1093/bioinformatics/btad535

PMID: 37647640

URL ID: http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85170582092&origin=inward; http://dx.doi.org/10.1093/bioinformatics/btad535; http://www.ncbi.nlm.nih.gov/pubmed/37647640; https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btad535/7255913; https://dx.doi.org/10.1093/bioinformatics/btad535; https://academic.oup.com/bioinformatics/article/39/9/btad535/7255913

AUTHOR(S)

Sophie Wharrie; Zhiyu Yang; Vishnu Raj; Remo Monti; Rahul Gupta; Ying Wang; Alicia Martin; Luke J O’Connor; Samuel Kaski; Pekka Marttinen; Pier Francesco Palamara; Christoph Lippert; Andrea Ganna; Russell Schwartz

PUBLISHER(S)

Oxford University Press (OUP)

TAG(S)

Mathematics; Biochemistry, Genetics and Molecular Biology; Computer Science
