Exploration of Feature Selection Techniques in Machine Learning Models on HPTLC Images for Rule Extraction

Publication Year: 2023

Thesis / Dissertation Description

Research in biology often relies on machine learning models that the researcher ultimately cannot interpret. It would be helpful if researchers could leverage the same computing power while gaining specific insight into the models' decision-making, and with it a deeper understanding of their domain. This paper seeks to select features and derive rules for a machine learning classification problem in biochemistry: classifying five species of Glycyrrhiza (licorice) from High-Performance Thin Layer Chromatography (HPTLC) images. The images were taken using HPTLC methods under varying conditions to provide eight unique views of each species. Each view contains 24 samples with varying counts of the individual species. Several techniques are applied for feature selection and rule extraction. The first two are based on methods recently pioneered and presented as “Binary Encoding of Random Forests” and “Rule Extraction using Sparse Encoding” (Liu 2012). In addition, an independently developed technique called “Interval Extraction and Consolidation,” conceived in response to the particular nature of the dataset, was applied. Used in concert with standard machine learning models, these techniques could narrow the feature space from around one thousand candidates to only ten. These ten most critical features were then used to derive a set of rules for classifying the five species of licorice. For feature selection, Binary Encoding of Random Forests performed similarly to, if not much better than, standard model parameter optimization in reducing the feature space in almost all cases. Additionally, Interval Extraction and Consolidation excelled at further simplifying the reduced feature space, often by another factor of five to ten. The selected features were then used for relatively simple rule extraction using decision trees, allowing for a more interpretable model.
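
The general idea of selecting a few features and then reading rules from them can be illustrated with a minimal sketch. This is not the thesis's method: it swaps in single-threshold splits (decision stumps) scored by Gini impurity in place of the random forest encodings and decision trees described above, and it runs on synthetic data. All names (`make_toy_data`, `best_split`, `select_features`, `describe_rule`), the two class labels, and the feature layout are hypothetical and chosen only for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (impurity_decrease, threshold) for the best single cut on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini(labels)
    best = (0.0, None)
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between two identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - child
        if gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best


def select_features(X, y, k):
    """Rank features by their best single-split gain; keep the top k as (index, gain, threshold)."""
    scored = []
    for j in range(len(X[0])):
        gain, threshold = best_split([row[j] for row in X], y)
        if threshold is not None:
            scored.append((j, gain, threshold))
    scored.sort(key=lambda item: -item[1])
    return scored[:k]


def majority(labels):
    """Most frequent label (ties broken alphabetically)."""
    return max(sorted(set(labels)), key=labels.count)


def describe_rule(j, threshold, X, y):
    """Turn one threshold on feature j into a readable if/else rule."""
    left = [label for row, label in zip(X, y) if row[j] <= threshold]
    right = [label for row, label in zip(X, y) if row[j] > threshold]
    return f"if feature_{j} <= {threshold:.2f} then {majority(left)} else {majority(right)}"


def make_toy_data(n_samples=24, n_features=20, informative=3):
    """Synthetic stand-in data: one informative feature, the rest deterministic noise."""
    X, y = [], []
    half = n_samples // 2
    for i in range(n_samples):
        label = "A" if i < half else "B"
        row = [((i * 7 + j * 13) % 11) / 10 for j in range(n_features)]
        offset = 0.1 if label == "A" else 0.6
        row[informative] = offset + 0.02 * (i % half)
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected = select_features(X, y, k=3)
rules = [describe_rule(j, t, X, y) for j, _, t in selected]
for rule in rules:
    print(rule)
```

In this toy setup the informative feature separates the two classes perfectly, so it ranks first and yields a clean rule, while the noise features produce weaker, less reliable rules. A real pipeline would use many more candidate features, five classes, and tree ensembles rather than single cuts.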

Bibliographic Details

REPOSITORY URL: https://egrove.olemiss.edu/hon_thesis/2841

URL ID: https://egrove.olemiss.edu/hon_thesis/2841; https://egrove.olemiss.edu/cgi/viewcontent.cgi?article=3857&context=hon_thesis

AUTHOR(S)

Bozidar-Brannan Kovachev
