| Comparison of Sampling Techniques for Imbalanced Data Classification | |
|---|---|
| DOI | |
| Creator | Karn Nasritha |
| Title | Comparison of Sampling Techniques for Imbalanced Data Classification |
| Contributor | Kittisak Kerdprasop, Nittaya Kerdprasop |
| Publisher | Faculty of Informatics, Mahasarakham University |
| Publication Year | 2018 (2561 BE) |
| Journal Title | Journal of Applied Informatics and Technology |
| Journal Vol. | 1 |
| Journal No. | 1 |
| Page no. | 20-37 |
| Keyword | Imbalance Data, SMOTE, Resample, Classification, Ensemble Technique |
| URL Website | https://ph01.tci-thaijo.org/index.php/jait/article/view/90569 |
| Website title | Journal of Applied Informatics and Technology |
| ISSN | 2586-8136 |
| Abstract | Imbalanced data is a problem in machine learning classification because it lowers classification performance. Random sampling techniques are used in several ways to address the poor performance caused by class imbalance. This research compares sampling techniques for imbalanced data classification. The research was conducted on three data sets, using the synthetic minority over-sampling technique (SMOTE), under-sampling, and resample techniques for imbalanced data preprocessing. Decision tree, CART, random forest, support vector machine, and artificial neural network algorithms were combined with AdaBoost and bagging ensembles to create classification models. Ten-fold cross-validation was used to evaluate the models, and performance was measured with precision, recall, and F-measure. The results showed that resample techniques handled the imbalanced data better than SMOTE. In addition, the random forest model, the AdaBoost ensemble with random forest, and the bagging ensemble with random forest performed well for data classification in this research. |
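
The abstract reports precision, recall, and F-measure. As a rough illustration of why these measures suit imbalanced data, the sketch below computes them for the minority class in plain Python. The function name `precision_recall_f1` and the toy labels are hypothetical and are not taken from the article; the article's own data sets and tooling are not described in this record.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F-measure) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels: 95 majority-class (0) and 5 minority-class (1) instances.
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts the majority class.
always_majority = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
scores = precision_recall_f1(y_true, always_majority)

print("accuracy:", accuracy)
print("minority precision, recall, F-measure:", scores)
```

On this toy input the accuracy looks high even though no minority instance is found, which is the kind of failure that the per-class measures expose.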