| Adjusting the Imbalanced Data with 5 Classification Methods | |
|---|---|
| DOI | |
| Creator | Saichon Sinsomboonthong |
| Title | Adjusting the Imbalanced Data with 5 Classification Methods |
| Contributor | Achara Phaeobang |
| Publisher | Thammasat University |
| Publication Year | 2563 BE (2020 CE) |
| Journal Title | Thai Journal of Science and Technology |
| Journal Vol. | 9 |
| Journal No. | 4 |
| Page no. | 418-435 |
| Keyword | imbalanced data, k-nearest neighbor, artificial neural network, support vector machine, rule-based, stochastic gradient descent |
| URL Website | https://www.tci-thaijo.org/ |
| Website title | THAIJO |
| ISSN | 2286-7333 |
| Abstract | We compared four methods for adjusting imbalanced data (oversampling, the synthetic minority oversampling technique (SMOTE), undersampling, and hybrid methods) using five classification methods: k-nearest neighbor, artificial neural network, support vector machine, rule-based, and stochastic gradient descent. The evaluation metrics were accuracy, sensitivity, specificity, mean square error, and mean absolute error. The data sets were chemotherapy for stage B/C colon cancer, monoclonal gammopathy, and treatment of migraine headaches. Each data set was divided into three parts in the ratio 70:20:10: training data (70 percent) used to build the model, validation data (20 percent) used to evaluate the model's error, and testing data (10 percent) used to test the model. The splits were made with random seeds 10, 20, 30, 40, and 50 in the WEKA program. For all three data sets, the best method was the rule-based classifier applied to imbalanced data adjusted with the synthetic minority oversampling technique. |
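
The procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' WEKA workflow: the function names `split_70_20_10`, `random_oversample`, and `evaluate`, the toy data, and the choice to oversample only the training part are all assumptions made for illustration. Plain random oversampling stands in for the four adjustment methods, and only three of the five metrics are computed.

```python
import random
from collections import Counter


def split_70_20_10(rows, seed):
    """Shuffle a copy of rows with the given seed, then cut it 70/20/10."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_valid = len(shuffled) * 20 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder, roughly 10 percent
    return train, valid, test


def random_oversample(rows, seed):
    """Duplicate random rows of smaller classes until each matches the largest."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity; assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy data: (feature, label) rows, 90 majority (0) and 10 minority (1).
data = [(i, 0) for i in range(90)] + [(i, 1) for i in range(90, 100)]

# Repeat the split for the five seeds named in the abstract.
for seed in (10, 20, 30, 40, 50):
    train, valid, test = split_70_20_10(data, seed)
    balanced = random_oversample(train, seed)  # assumption: resample training part only
    print(seed, len(train), len(valid), len(test),
          Counter(label for _, label in balanced))
```

Resampling only the training part keeps duplicated minority rows out of the validation and testing parts; the abstract does not state where in the pipeline the adjustment was applied, so this ordering is a design choice of the sketch.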