Time-series clustering approach for training data selection of a data-driven predictive model: Application to an industrial bio 2,3-butanediol distillation process
Computers & Chemical Engineering, ISSN: 0098-1354, Vol: 161, Page: 107758
2022
- 14 Citations
- 24 Captures
Article Description
In this study, we propose a time-series clustering approach that selects optimal training data for the development of predictive models. The optimal number of clusters was set based on the variation of within-cluster sums of squares. A predictive model was developed for each candidate set of selection ratios, where a selection ratio is the fraction of training data drawn from a given cluster. Based on the results, a regression model was developed to predict the performance of the predictive model. The search space was applied to the regression model, and the optimal training data ratios were selected to satisfy the objective function and constraints. The effectiveness of the method is demonstrated on a commercial bio 2,3-butanediol distillation process. As a result, the amount of data used for model training was reduced by 49.20% compared to the base case without clustering. The coefficient of determination (R²) showed the same level of performance, and the root-mean-square error was improved by up to 14.07%.
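The first two steps of the description (choosing a cluster count from the within-cluster sum of squares, then drawing training data from each cluster at a given ratio) can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm, distance measure, or windowing the authors used, so everything here is an assumption for illustration: plain k-means on fixed-length windows, synthetic data, and the hypothetical helper names `kmeans` and `select_by_ratio`. The regression model and the search over ratios are not shown.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's method).

    Returns (labels, wcss), where wcss is the within-cluster sum of squares.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every window to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    wcss = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, wcss

def select_by_ratio(labels, ratios, seed=0):
    """From cluster c, keep a random fraction ratios[c] of its members.

    Returns the sorted indices of the selected training samples.
    """
    rng = random.Random(seed)
    chosen = []
    for c, ratio in enumerate(ratios):
        members = [i for i, l in enumerate(labels) if l == c]
        chosen.extend(rng.sample(members, round(ratio * len(members))))
    return sorted(chosen)

# Synthetic stand-in for process data: 60 windows of length 4 drawn around
# three operating levels (purely illustrative, not the plant data).
data_rng = random.Random(1)
windows = [tuple(level + data_rng.gauss(0, 0.1) for _ in range(4))
           for level in (0.0, 5.0, 10.0) for _ in range(20)]

# WCSS for several cluster counts; the count is chosen where the drop flattens.
wcss_by_k = {k: kmeans(windows, k)[1] for k in range(1, 6)}
for k, w in wcss_by_k.items():
    print(f"k={k}: WCSS={w:.2f}")

# One candidate set of selection ratios, one ratio per cluster.
labels, _ = kmeans(windows, 3)
train_idx = select_by_ratio(labels, ratios=[0.5, 0.5, 0.5])
print(f"selected {len(train_idx)} of {len(windows)} windows for training")
```

In the workflow the abstract describes, each candidate ratio vector would be used to train a predictive model, and the resulting scores would feed the regression model that is then searched for the best ratios under the stated objective and constraints.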
Bibliographic Details
http://www.sciencedirect.com/science/article/pii/S0098135422000990; http://dx.doi.org/10.1016/j.compchemeng.2022.107758; http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85126548415&origin=inward; https://linkinghub.elsevier.com/retrieve/pii/S0098135422000990; https://dx.doi.org/10.1016/j.compchemeng.2022.107758
Elsevier BV