
Feature Selection Investigation in Machine Learning Docking Scoring Functions

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN: 1611-3349, Vol: 13954 LNBI, Page: 58-69
2023
  • Citations: 1
  • Usage: 0
  • Captures: 4
  • Mentions: 0
  • Social Media: 0

Conference Paper Description

The in silico evaluation of interactions between small molecules (ligands) and receptors (proteins) is of great importance, especially in drug design. It is one of the principal computational methodologies that can be incorporated into the process of proposing new drugs, with the aim of reducing the high financial cost and time involved. In this context, molecular docking is a computer simulation procedure used to predict the best conformation and orientation of a ligand in the binding site of a target protein. Docking algorithms evaluate the interactions in the protein-ligand complex using scoring functions (SFs). SFs computationally quantify the binding affinity of the complex and can be divided into four categories according to the methodology applied in their development: physics-based, empirical, knowledge-based and machine learning. Machine learning (ML) scoring functions are trained on features obtained from known protein-ligand complexes and their experimental affinities, and they rely heavily on the set of features used to train them. Thus, in this work, we use PCA, ANOVA and Random Forest to investigate how these feature selection methods impact the performance of three ML scoring functions trained with Support Vector Machines, Elastic Net Regularization and Neural Networks. The results show that Neural Networks can benefit greatly from feature selection performed by Random Forest, but not from ANOVA or PCA. We conclude that feature selection can improve regression results and that, in this study, Neural Networks combined with Random Forest is the best option.
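As a rough illustration of the kind of comparison the description outlines, the sketch below crosses three feature selection or reduction steps (PCA, a univariate F-test, Random Forest importances) with three regressors (SVR, Elastic Net, a small neural network) using scikit-learn. It is not the authors' pipeline. The synthetic data stands in for real protein-ligand features and affinities, `K` and all hyperparameters are hypothetical, and `f_regression` is assumed here as the regression counterpart of the ANOVA F-test.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel, SelectKBest, f_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for protein-ligand features and affinities (not real data).
X, y = make_regression(n_samples=200, n_features=30, n_informative=8,
                       noise=10.0, random_state=0)
y = (y - y.mean()) / y.std()  # standardize the target for the neural network

K = 10  # hypothetical number of features (or components) to keep

# Feature selection / reduction steps; None means "use all features".
selectors = {
    "none": None,
    "PCA": PCA(n_components=K, random_state=0),
    # Assumption: univariate F-test for regression as the ANOVA-style filter.
    "ANOVA": SelectKBest(score_func=f_regression, k=K),
    # Keep the K features with the highest Random Forest importances.
    "RandomForest": SelectFromModel(
        RandomForestRegressor(n_estimators=50, random_state=0),
        max_features=K, threshold=-np.inf),
}

# Regressors standing in for the three ML scoring functions.
models = {
    "SVM": SVR(),
    "ElasticNet": ElasticNet(alpha=0.1),
    "NeuralNetwork": MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000,
                                  random_state=0),
}

results = {}
for sel_name, selector in selectors.items():
    for model_name, model in models.items():
        steps = [("scale", StandardScaler())]
        if selector is not None:
            steps.append(("select", selector))
        steps.append(("model", model))
        # cross_val_score clones the pipeline, so selection is refit per fold.
        scores = cross_val_score(Pipeline(steps), X, y, cv=3, scoring="r2")
        results[(sel_name, model_name)] = scores.mean()

for (sel_name, model_name), r2 in sorted(results.items()):
    print(f"{sel_name:>12} + {model_name:<13} mean R^2 = {r2:.3f}")
```

Placing the selector inside the pipeline means it is refit on each training fold, which avoids leaking information from the held-out fold into the feature selection step. Scores on this synthetic data say nothing about the paper's findings.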

