General statistical framework for disease risk prediction by genetic variants, gene expression and image
Page: 1-155
2015
Thesis / Dissertation Description
Faster and more economical next-generation sequencing technologies will generate unprecedentedly massive, high-dimensional data on genomic and epigenomic variation. Medical records will include information on sequenced genomes in the near future. Efficiently extracting biomarkers for risk prediction and treatment selection from millions or tens of millions of genomic variants poses a significant challenge. Traditional paradigms for identifying variants of clinical validity involve testing each variant for association. However, even genetic variants with statistically significant associations may or may not be useful for diagnosis or for predicting the response of disease to treatment. An alternative to association studies for finding genetic variants of predictive utility is to systematically search for variants that contain sufficient information to predict phenotype. To achieve this, we introduce the concepts of sufficient dimension reduction (SDR) and the coordinate hypothesis, which project the original high-dimensional data onto very low-dimensional spaces while preserving all information on response phenotypes. We then formulate the discovery of clinically significant genetic variants as a sparse SDR problem and develop algorithms that can select significant genetic variants from millions of predictors with the aid of a split-and-conquer approach. The sparse SDR problem is in turn formulated as a sparse optimal scoring problem, but with penalties that can remove row vectors from the basis matrix. To speed up computation, we solve the sparse optimal scoring problem with the alternating direction method of multipliers (ADMM), which can easily be implemented in parallel. To illustrate the proposed method, we applied it to genome-wide association study (GWAS) datasets on rheumatoid arthritis from the North American Rheumatoid Arthritis Consortium (NARAC) and on psoriasis from the Genetic Association Information Network (GAIN).
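The row-removing penalty and the ADMM solver mentioned above can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes the scored responses Y are held fixed and solves only the resulting row-sparse regression, minimizing (1/2n)||Y - XB||^2 plus lam times the sum of the row norms of B. The function names, the 1/n scaling, and the values of `lam` and `rho` are all assumptions made for illustration.

```python
import numpy as np


def row_soft_threshold(A, t):
    """Shrink each row of A toward zero by t in Euclidean norm.

    Rows whose norm is at most t become exactly zero, which is what
    removes a predictor (a row of the basis matrix) from the model.
    """
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return scale * A


def admm_row_sparse(X, Y, lam, rho=1.0, n_iter=500):
    """ADMM for min_B (1/2n)||Y - XB||_F^2 + lam * sum_j ||B[j, :]||_2.

    Hypothetical helper: the splitting B = Z and the scaled dual U follow
    the standard ADMM recipe; parameter defaults are illustrative only.
    """
    n, p = X.shape
    k = Y.shape[1]
    Z = np.zeros((p, k))
    U = np.zeros((p, k))
    # The B-update solves the same linear system every iteration,
    # so the system matrix is factored once.
    chol = np.linalg.cholesky(X.T @ X / n + rho * np.eye(p))
    XtY = X.T @ Y / n
    for _ in range(n_iter):
        rhs = XtY + rho * (Z - U)
        B = np.linalg.solve(chol.T, np.linalg.solve(chol, rhs))  # ridge-like step
        Z = row_soft_threshold(B + U, lam / rho)                 # row-sparsity step
        U = U + B - Z                                            # dual update
    return Z


def simulate(n=200, p=20, k=2, n_active=3, seed=0):
    """Toy data in which only the first n_active predictors matter."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    B_true = np.zeros((p, k))
    B_true[:n_active] = 2.0
    Y = X @ B_true + 0.1 * rng.standard_normal((n, k))
    return X, Y
```

Calling `admm_row_sparse(*simulate(), lam=0.1)` returns a coefficient matrix whose nonzero rows mark the selected predictors. In a split-and-conquer setting, one would presumably run such a solver on separate blocks of predictors, but that step is not sketched here.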
During the past decade, RNA sequencing (RNA-Seq), which uses deep-sequencing technologies, has become a popular platform for gene expression profiling in whole-genome studies. RNA-Seq data pose the same dimensionality challenge as genome-wide association studies, with more than 10 million columns of reads per sample. We convert the RNA-Seq reads to functional principal component analysis (FPCA) scores to reduce the dimension to 10,000-50,000 columns and then use the sparse SDR to search for significant genomic variants for disease. We applied our method to kidney renal clear-cell carcinoma (KIRC) RNA-Seq data from The Cancer Genome Atlas (TCGA) project.
Analysis of histologic images is a powerful new approach used to reveal variability among individuals and mechanisms of disease development. A histology image is usually large, containing about 10^9 pixels. To reduce the dimension and computational complexity, we extended one-dimensional FPCA to two-dimensional FPCA to extract the significant component factors. We applied this algorithm to KIRC histology image data from TCGA.
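As a rough illustration of how FPCA scores compress a long profile into a few numbers, the sketch below computes a discretized FPCA through the singular value decomposition. It assumes curves sampled on a common, uniformly spaced grid and no smoothing; the function name and these simplifications are assumptions and do not reproduce the thesis's implementation.

```python
import numpy as np


def fpca_scores(curves, grid, n_components):
    """Discretized FPCA of curves sampled on a uniform grid.

    curves: array of shape (n_samples, n_grid_points)
    grid:   array of shape (n_grid_points,), assumed uniformly spaced
    Returns (scores, eigenfunctions, mean_curve).
    """
    dt = grid[1] - grid[0]
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Right singular vectors of the centered data are the discretized
    # principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Rescale so each eigenfunction has unit L2 norm: sum(phi**2) * dt == 1.
    eigenfunctions = vt[:n_components] / np.sqrt(dt)
    # Score = Riemann-sum approximation of the integral of
    # (curve - mean) * eigenfunction over the grid.
    scores = centered @ eigenfunctions.T * dt
    return scores, eigenfunctions, mean_curve
```

Each sample is then represented by `n_components` scores instead of one value per grid point, and `mean_curve + scores @ eigenfunctions` gives the corresponding low-rank reconstruction. One simple discretized analogue for images is to flatten each image into a vector before the same decomposition, although that is not necessarily the two-dimensional construction used in the thesis.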