Machine Learning Model Towards Evaluating Data Gathering Methods in Manufacturing and Mechanical Engineering
Author(s)
- Mahyar Amini, Koosha Sharifani, Ali Rahmani
Abstract
Supervised Machine Learning (ML) models require extensive training data to properly approximate the behavior of complex mechanical processes and systems. Real-world experiments and adequate simulations are expensive, time-consuming or incident-related, which makes the efficient acquisition of sample data a compelling necessity. In mechanical engineering and manufacturing, data is usually collected via established Design of Experiments (DOE) methods. At the same time, Active Learning (AL) is gaining importance in the research community and promises to reduce the amount of data required, yet it is rarely used in industry. In this paper, we compare the most common data sampling methods with AL, with the aim of achieving better predictive results with fewer samples on regression tasks. We propose a novel evaluation framework that allows various sampling methods to be compared in a controlled and unbiased manner, regardless of their different requirements. Using three exemplary use cases (UCs), we evaluate when AL or DOE methods should be used for data generation by examining the sample efficiency, stability and predictive accuracy of the resulting ML models. This paper provides practical guidance to both engineers and data scientists who require highly efficient data collection for later use in ML.
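To make the contrast between a one-shot DOE design and sequential AL concrete, the following minimal sketch compares the two on a toy one-dimensional regression task under an equal sample budget. It is an illustration only and not the evaluation framework proposed in the paper: the `target` function, the evenly spaced one-shot design, the k-nearest-neighbor committee used as an uncertainty proxy, and all names and parameter values are assumptions chosen for brevity.

```python
import math
import random


def target(x):
    # Hypothetical stand-in for an expensive experiment or simulation.
    return math.sin(6.0 * x) + 0.5 * x


def knn_predict(x, xs, ys, k):
    # Mean label of the (at most) k labeled points closest to x.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in nearest) / len(nearest)


def disagreement(x, xs, ys, ks=(1, 2, 3)):
    # Variance of the committee predictions, used as an uncertainty proxy.
    preds = [knn_predict(x, xs, ys, k) for k in ks]
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)


def one_shot_design(n):
    # DOE-style design fixed up front: n evenly spaced levels in [0, 1].
    return [i / (n - 1) for i in range(n)]


def active_learning_design(pool, n, n_init=3):
    # Seed with a few evenly spaced pool points, then repeatedly label
    # the unlabeled pool point on which the committee disagrees most.
    step = (len(pool) - 1) // (n_init - 1)
    xs = [pool[i * step] for i in range(n_init)]
    ys = [target(x) for x in xs]
    while len(xs) < n:
        candidates = [x for x in pool if x not in xs]
        x_new = max(candidates, key=lambda x: disagreement(x, xs, ys))
        xs.append(x_new)
        ys.append(target(x_new))  # one new "experiment" per iteration
    return xs


def rmse(xs, test_points, k=2):
    # Predictive accuracy of a k-NN model trained on the sampled points.
    ys = [target(x) for x in xs]
    errs = [(knn_predict(x, xs, ys, k) - target(x)) ** 2 for x in test_points]
    return math.sqrt(sum(errs) / len(errs))


budget = 12  # assumed number of affordable experiments
pool = [i / 100 for i in range(101)]  # candidate settings for AL
rng = random.Random(0)
test_points = [rng.random() for _ in range(200)]

doe_x = one_shot_design(budget)
al_x = active_learning_design(pool, budget)
doe_rmse = rmse(doe_x, test_points)
al_rmse = rmse(al_x, test_points)

print(f"one-shot design RMSE: {doe_rmse:.4f}")
print(f"active learning RMSE: {al_rmse:.4f}")
```

Which design gives the lower error in such a sketch depends on the target function, the model and the budget, which is the kind of question the comparison in this paper addresses. Stability could be probed in the same way by repeating the run with different seed sets or noise realizations and looking at the spread of the resulting errors.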