Proposing Enhanced Feature Engineering and a Selection Model for Machine Learning Processes
Authors
Uddin, Muhammad Fahim
Lee, Jeongkyu
Rizvi, Syed S.
Hamada, Samir E.
Issue Date
2018-04-20
Type
Article
Language
en_US
Keywords
Machine learning, Enhanced feature engineering, Parallel processing, Feature optimization, Overfitting, Underfitting, Optimum fitting
Abstract
Machine Learning (ML) requires a certain number of features (i.e., attributes) to train a model. One of the main challenges is determining the right number and type of features from a given dataset's attributes. It is not uncommon for an ML process to use all of a dataset's available features without computing the predictive value of each. Such an approach makes the process vulnerable to overfitting, predictive errors, bias, and poor generalization. Each feature in the dataset has a unique, redundant, or irrelevant predictive value. The key to better accuracy and fitting in ML is to identify the optimum set (i.e., grouping) of the right features, with the finest matching of each feature's value. This paper proposes a novel approach to enhance the Feature Engineering and Selection (eFES) optimization process in ML. eFES is built on a unique scheme that regulates error bounds and parallelizes the addition and removal of features during training. eFES also introduces local gain (LG) and global gain (GG) functions, using 3D visualization techniques, to assist the feature grouping function (FGF). FGF scores and optimizes the participating features so that the ML process can decide which features to accept or reject, improving the model's generalization. To support the proposed model, this paper presents mathematical models, illustrations, algorithms, and experimental results. Various datasets are used to validate the model-building process in Python, C#, and R. The results show that eFES is promising compared with the traditional feature selection process.
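The abstract does not define LG, GG, or FGF, so the following is not eFES. It is a minimal sketch of the conventional idea the abstract builds on: in each round, score every single-feature addition and every single-feature removal concurrently, then accept only the move that improves a quality score. The names `stepwise_select`, `toy_score`, `GAINS`, and `min_gain` are hypothetical, and the toy scores are invented for illustration. In practice, `score` would train a model on the subset and return a validation metric.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, FrozenSet, Iterable

# A scoring function maps a feature subset to a quality score (higher is better),
# e.g. validation accuracy of a model trained on that subset (assumption).
Score = Callable[[FrozenSet[str]], float]


def stepwise_select(
    features: Iterable[str],
    score: Score,
    min_gain: float = 1e-9,
    max_workers: int = 4,
) -> FrozenSet[str]:
    """Greedy bidirectional (stepwise) feature selection, not the eFES algorithm.

    Each round scores every single-feature addition and every single-feature
    removal concurrently, then accepts the move with the highest score.
    Selection stops when no move beats the current score by more than min_gain.
    """
    all_features = frozenset(features)
    selected: FrozenSet[str] = frozenset()
    current = score(selected)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            # Candidate subsets: add one unused feature, or drop one selected feature.
            candidates = [selected | {f} for f in all_features - selected]
            candidates += [selected - {f} for f in selected]
            if not candidates:
                break
            # Evaluate all add/remove moves concurrently.
            scores = list(pool.map(score, candidates))
            best_score, best_subset = max(zip(scores, candidates), key=lambda p: p[0])
            if best_score - current <= min_gain:
                break  # no move improves the score enough, so all are rejected
            selected, current = best_subset, best_score
    return selected


# Hypothetical toy scores: x1 and x2 are informative, x3 is redundant once x1
# is selected, and "noise" is irrelevant. A small penalty favors smaller sets.
GAINS = {"x1": 0.30, "x2": 0.20, "x3": 0.25, "noise": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    s = sum(GAINS[f] for f in subset)
    if "x1" in subset and "x3" in subset:
        s -= GAINS["x3"]  # x3 adds nothing beyond x1
    return s - 0.01 * len(subset)


print(sorted(stepwise_select(GAINS, toy_score)))  # ['x1', 'x2']
```

Here the redundant feature (x3) and the irrelevant one (noise) are rejected because adding either lowers the score. Threads are used for simplicity. If the scoring function is CPU-bound pure Python, swapping in `ProcessPoolExecutor` would give true parallelism.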
Citation
Uddin, M.F.; Lee, J.; Rizvi, S.; Hamada, S. Proposing Enhanced Feature Engineering and a Selection Model for Machine Learning Processes. Applied Sciences. 2018, 8, 646.
Publisher
MDPI
