A method for learning a sparse classifier in the presence of missing data for high-dimensional biological datasets
Bioinformatics, ISSN: 1460-2059, Vol: 33, Issue: 18, Page: 2897-2905
2017
- 10 Citations
- 51 Captures
Metrics Details
- Citations: 10
- Citation Indexes: 10
- CrossRef: 10
- Captures: 51
- Readers: 51
Article Description
Motivation: This work addresses two common issues in building classification models for biological or medical studies: learning a sparse model, in which only a subset of a large number of possible predictors is used, and training in the presence of missing data. The focus is on supervised generative binary classification models, specifically linear discriminant analysis (LDA). The model parameters are determined using an expectation-maximization (EM) algorithm that both handles missing data and introduces priors to promote sparsity. The proposed algorithm, expectation-maximization sparse discriminant analysis (EM-SDA), produces a sparse LDA model for datasets with and without missing data.

Results: EM-SDA is tested via simulations and case studies. In the simulations, EM-SDA is compared with nearest shrunken centroids (NSC) and sparse discriminant analysis (SDA) with k-nearest-neighbor imputation, for varying mechanisms and amounts of missing data. In three case studies using published biomedical data, the results are compared with NSC and SDA models under four different types of imputation, all of which are common approaches in the field. EM-SDA is more accurate and sparser than the competing methods, both with and without missing data, in most of the experiments. Furthermore, the EM-SDA results are largely consistent between the missing-data and full-data cases. The biological relevance of the resulting models, quantified via a literature search, is also presented.

Availability and implementation: A Matlab implementation, published under the GNU GPL v.3 license, is available at http://web.mit.edu/braatzgroup/links.html.
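The full EM-SDA algorithm is specified in the paper itself. As a loose, hypothetical illustration only (not the authors' method), the sketch below combines the two ingredients the abstract describes: an EM-style loop that imputes missing entries while fitting a binary LDA model, and a soft-thresholding step standing in for a sparsity-promoting prior. The toy data, the simplified class-mean E-step, and all function and variable names are assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian classes in 10 dimensions; only the
# first two dimensions actually separate the classes.
n, p = 200, 10
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, p))
X[y == 1, :2] += 3.0

# Knock out 10% of entries completely at random (MCAR).
mask = rng.random(X.shape) < 0.10
X_miss = np.where(mask, np.nan, X)

def em_sparse_lda(X, y, mask, n_iter=10, thresh=0.2):
    """EM-style sparse LDA sketch.

    E-step (drastically simplified): replace each missing entry with
    its current class mean, instead of the full conditional expectation.
    M-step: refit class means and the pooled covariance, then
    soft-threshold the discriminant vector to encourage sparsity.
    """
    n, p = X.shape
    Xf = np.where(mask, 0.0, X)  # initial fill for missing entries
    for _ in range(n_iter):
        mu = np.array([Xf[y == k].mean(axis=0) for k in (0, 1)])
        for k in (0, 1):  # simplified E-step: class-mean imputation
            rows = y == k
            Xf[rows] = np.where(mask[rows], mu[k], Xf[rows])
        resid = Xf - mu[y]  # M-step: pooled within-class covariance
        S = resid.T @ resid / (n - 2)
        w = np.linalg.solve(S + 1e-6 * np.eye(p), mu[1] - mu[0])
        # Sparsity-prior stand-in: soft-threshold small coefficients.
        w = np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)
    b = -0.5 * w @ (mu[0] + mu[1])
    return w, b, Xf

w, b, Xf = em_sparse_lda(X_miss, y, mask)
acc = np.mean(((Xf @ w + b) > 0) == y)
```

With the noise dimensions carrying no class signal, the thresholding step zeroes most of their coefficients, so the fitted discriminant is sparse while still separating the two classes on the imputed data.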
Bibliographic Details
- http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85028747007&origin=inward
- http://dx.doi.org/10.1093/bioinformatics/btx224
- http://www.ncbi.nlm.nih.gov/pubmed/28431087
- https://academic.oup.com/bioinformatics/article/33/18/2897/3738496
Oxford University Press (OUP)