Hybrid genetic algorithm-decision tree approach for rate constant prediction using structures of reactants and solvent for Diels-Alder reaction

Citation data:

Computers & Chemical Engineering, ISSN: 0098-1354, Vol: 106, Page: 690-698

Publication Year:
2017
DOI:
10.1016/j.compchemeng.2017.02.022
Author(s):
Shounak Datta; Vikrant A. Dev; Mario R. Eden
Publisher(s):
Elsevier BV
Tags:
Chemical Engineering; Computer Science
Article description:
In recent years, Computer-Aided Molecular Design (CAMD) has been used extensively to define and design reactions that realize their maximal potential. In all of these contributions, either the structures of the reactants/products or the structure of the solvent has been treated as fixed. It is therefore essential to develop a QSPR model that captures the influence of both the reactant structures and the solvent on the reaction rate. Since the structures of reactants and products are related, such QSPR models will serve as a prerequisite for the simultaneous CAMD of reactants, products and solvents. They will also provide a useful tool for predicting rate constants without relying on experiments. To develop such a QSPR, we investigated the Diels-Alder reaction with different sets of reactants and solvents. Connectivity indices were used to represent the structures of the members of each set. Principal Component Analysis (PCA) was applied to identify the principal components (PCs) corresponding to the structures of the reactants and solvent of each set. Linear models expressed in terms of the PCs were then generated using a Decision Tree (DT) algorithm such that the R² value was maximized. These models formed the initial population on which a genetic algorithm (GA) performed operations such as crossover and mutation to obtain the model(s) with the best rate constant prediction. Thus, the novelty of our approach is that, after feature extraction using PCA, a DT algorithm generates an ensemble of linear models, which the GA then transforms into the best-fitting model. Our approach required far fewer generations to reach the model with the highest external R² (R²ext) than the case in which the DT did not initialize the population of models.
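The abstract says only that connectivity indices encoded the molecular structures. It does not say which indices were used. As one standard example, the first-order (Randić) connectivity index of a hydrogen-suppressed molecular graph is χ = Σ over bonds (δi·δj)^(-1/2), where δ is the number of heavy-atom neighbours of each atom. The sketch below computes it from a bond list. The function name `randic_index` and the example molecules are illustrative assumptions, not taken from the paper. This simple index ignores bond multiplicity, so 1,3-butadiene scores the same as butane. Valence connectivity indices are the usual way to distinguish them.

```python
import math


def randic_index(edges):
    """First-order (Randic) connectivity index of a hydrogen-suppressed graph.

    edges: list of (i, j) pairs of heavy-atom indices joined by a bond.
    Bond order is ignored, so double bonds count the same as single bonds.
    """
    # Vertex degree = number of heavy-atom neighbours of each atom.
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    # Sum 1/sqrt(deg_i * deg_j) over every bond.
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in edges)


# Hypothetical inputs: two common Diels-Alder dienes as carbon skeletons.
butadiene = [(0, 1), (1, 2), (2, 3)]                          # C=C-C=C chain
cyclopentadiene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]   # five-membered ring

print(round(randic_index(butadiene), 4))        # 1.9142
print(round(randic_index(cyclopentadiene), 4))  # 2.5
```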
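The PCA-then-GA stage described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' implementation. Several details are assumptions:

- Each candidate linear model is represented as a binary mask over the PCs, fitted by least squares.
- Fitness is R² on a held-out set, loosely mirroring the R²ext criterion.
- The GA uses truncation selection with elitism, one-point crossover, and bit-flip mutation.

The DT step that seeds the population in the paper is not reproduced. It is represented only by the `init_pop` argument, filled here with random masks. The names `pca_scores`, `external_r2` and `evolve`, the parameter values, and all data are hypothetical and synthetic.

```python
import numpy as np


def pca_scores(X, n_components):
    """Project mean-centred descriptors onto their leading principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def external_r2(Z_tr, y_tr, Z_ext, y_ext, mask):
    """Fit y = b0 + sum(b_k * PC_k) on the training set using the PCs where
    mask is True, then return R^2 on the held-out (external) set."""
    A_tr = np.column_stack([np.ones(len(y_tr)), Z_tr[:, mask]])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    A_ext = np.column_stack([np.ones(len(y_ext)), Z_ext[:, mask]])
    resid = y_ext - A_ext @ coef
    centred = y_ext - y_ext.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def evolve(Z_tr, y_tr, Z_ext, y_ext, init_pop, n_gen=40, p_mut=0.05, seed=0):
    """Evolve binary PC-selection masks; fitness is external R^2.
    Assumes a population of at least 2 masks over at least 2 PCs."""
    rng = np.random.default_rng(seed)
    pop = np.array(init_pop, dtype=bool)
    n_pop, n_feat = pop.shape

    def score(m):
        return external_r2(Z_tr, y_tr, Z_ext, y_ext, m)

    for _ in range(n_gen):
        fitness = np.array([score(m) for m in pop])
        # Truncation selection: the better half survives unchanged (elitism).
        parents = pop[np.argsort(fitness)[::-1][: n_pop // 2]]
        children = []
        while len(parents) + len(children) < n_pop:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feat)               # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_feat) < p_mut         # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    fitness = np.array([score(m) for m in pop])
    return pop[np.argmax(fitness)], fitness.max()


# Synthetic demo: X stands in for a connectivity-index descriptor matrix.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
Z = pca_scores(X, n_components=6)   # PCA on the full set, for brevity only
y = 1.5 * Z[:, 0] - 0.8 * Z[:, 2] + rng.normal(scale=0.1, size=60)  # synthetic "log k"
Z_tr, Z_ext, y_tr, y_ext = Z[:45], Z[45:], y[:45], y[45:]
init = rng.random((10, 6)) < 0.5    # placeholder for DT-generated seed models
best_mask, best_r2 = evolve(Z_tr, y_tr, Z_ext, y_ext, init)
print(best_mask, round(best_r2, 3))
```

Because the surviving half is copied forward unchanged, the best fitness never decreases between generations. Seeding `init_pop` with good models, as the DT does in the paper, gives the search a head start. That is the effect the abstract reports as needing fewer generations.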