Proposing Enhanced Feature Engineering and a Selection Model for Machine Learning Processes

Authors

Uddin, Muhammad Fahim
Lee, Jeongkyu
Rizvi, Syed S.
Hamada, Samir E.

Issue Date

2018-04-20

Type

Article

Language

en_US

Keywords

Machine learning, Enhanced feature engineering, Parallel processing, Feature optimization, Overfitting, Underfitting, Optimum fitting

Abstract

Machine Learning (ML) requires a certain number of features (i.e., attributes) to train a model. One of the main challenges is to determine the right number and type of such features from the given dataset’s attributes. It is not uncommon for an ML process to use all available features of a dataset without computing the predictive value of each. Such an approach makes the process vulnerable to overfitting, predictive errors, bias, and poor generalization. Each feature in the dataset has a unique, redundant, or irrelevant predictive value. The key to better accuracy and fitting in ML is therefore to identify the optimum set (i.e., grouping) of the right features with the best match of feature values. This paper proposes a novel approach to enhance the Feature Engineering and Selection (eFES) optimization process in ML. eFES is built on a unique scheme that regulates error bounds and parallelizes the addition and removal of features during training. eFES also introduces local gain (LG) and global gain (GG) functions, using 3D visualization techniques, to assist the feature grouping function (FGF). FGF scores and optimizes each participating feature, so that the ML process can evolve toward deciding which features to accept or reject for improved generalization of the model. To support the proposed model, this paper presents mathematical models, illustrations, algorithms, and experimental results. Miscellaneous datasets are used to validate the model-building process in Python, C#, and R. The results show that eFES is promising compared with the traditional feature selection process.
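
The distinction the abstract draws between unique, redundant, and irrelevant features can be illustrated with a small sketch. This is not the eFES algorithm: the abstract does not define the LG, GG, or FGF functions, so the example below swaps in a plain correlation filter. The function names, the thresholds `min_relevance` and `max_redundancy`, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, min_relevance=0.3, max_redundancy=0.9):
    """Greedy correlation filter (a stand-in, not eFES).

    A feature is accepted if its absolute correlation with the target is at
    least `min_relevance` (otherwise it is treated as irrelevant) and it is not
    correlated above `max_redundancy` with an already accepted feature
    (otherwise it is treated as redundant). Both thresholds are hypothetical.
    """
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    selected = []
    # Visit features from the highest to the lowest relevance score.
    for name in sorted(scores, key=scores.get, reverse=True):
        if scores[name] < min_relevance:
            continue  # irrelevant: little predictive value on its own
        if any(abs(pearson(columns[name], columns[kept])) > max_redundancy
               for kept in selected):
            continue  # redundant: duplicates an accepted feature
        selected.append(name)
    return selected, scores


# Synthetic data: two informative features, one near-duplicate, one pure noise.
rng = random.Random(0)
n = 200
signal = [rng.gauss(0, 1) for _ in range(n)]
other = [rng.gauss(0, 1) for _ in range(n)]
columns = {
    "signal": signal,
    "signal_copy": [2 * s + 0.01 * rng.gauss(0, 1) for s in signal],
    "other": other,
    "noise": [rng.gauss(0, 1) for _ in range(n)],
}
target = [s + o + 0.1 * rng.gauss(0, 1) for s, o in zip(signal, other)]

selected, scores = select_features(columns, target)
print("selected:", selected)
print("scores:", {name: round(score, 3) for name, score in scores.items()})
```

In this sketch the noise column is rejected as irrelevant, and only one of `signal` and `signal_copy` survives because the two are almost perfectly correlated. A filter of this kind scores each feature independently of the model, which is the sort of traditional baseline the abstract contrasts with a scheme that adds and removes features during training.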

Citation

Uddin, M.F.; Lee, J.; Rizvi, S.; Hamada, S. Proposing Enhanced Feature Engineering and a Selection Model for Machine Learning Processes. Applied Sciences. 2018, 8, 646.

Publisher

MDPI