| Efficiency Comparison in Missing Value Replacement with Four Classification Methods | |
|---|---|
| DOI | |
| Creator | Saichon Sinsomboonthong |
| Title | Efficiency Comparison in Missing Value Replacement with Four Classification Methods |
| Publisher | Thammasat University |
| Publication Year | 2563 BE (2020 CE) |
| Journal Title | Thai Journal of Science and Technology |
| Journal Vol. | 9 |
| Journal No. | 5 |
| Page no. | 587-599 |
| Keyword | decision tree, artificial neural network, naive Bayes, binary logistic regression |
| URL Website | https://www.tci-thaijo.org/ |
| Website title | THAIJO |
| ISSN | 2286-7333 |
| Abstract | We compared the efficiency of five missing value replacement methods (series mean, mean of nearby points, median of nearby points, linear interpolation, and linear trend at point) under four classification methods: decision tree, artificial neural network, naive Bayes, and binary logistic regression. The evaluation metrics were accuracy, mean square error, and mean absolute error. Three data sets were used: heart disease, students' performance in exams, and Black Friday. Each data set was divided into three parts in the ratio 70 : 20 : 10. Part 1 (70 percent) was the training data used to build the model, part 2 (20 percent) was the validation data used to evaluate the model's error, and part 3 (10 percent) was the testing data used to test the model. The procedure was run in the WEKA program with random seeds of 10, 20, 30, 40, and 50. For the heart disease data set, the best classification method was the decision tree with missing values replaced by the mean of nearby points. For the students' performance in exams data set, the best classification method was binary logistic regression with missing values replaced by linear interpolation. For the Black Friday data set, the best classification method was naive Bayes with missing values replaced by the median of nearby points. |
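
The five replacement methods named in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation (the study was run in WEKA); the function names, the `span` parameter, and the handling of edge cases are all assumptions made for illustration, based on the usual meaning of each method's name. Missing values are represented as `None`.

```python
from statistics import mean, median


def _neighbors(values, i, span):
    """Up to `span` non-missing values on each side of position i (assumed rule)."""
    before = [v for v in values[:i] if v is not None][-span:]
    after = [v for v in values[i + 1:] if v is not None][:span]
    return before + after


def series_mean(values):
    """Replace every missing value with the mean of all observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def nearby(values, span=2, stat=mean):
    """Replace each missing value with the mean (or median) of nearby points."""
    out = []
    for i, v in enumerate(values):
        if v is None:
            near = _neighbors(values, i, span)
            out.append(stat(near) if near else None)
        else:
            out.append(v)
    return out


def linear_interpolation(values):
    """Fill gaps on a straight line between the observed values around them.

    Leading and trailing missing values are left as None (assumption).
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out


def linear_trend(values):
    """Fit a least-squares line of value against position; fill with its prediction."""
    pts = [(i, v) for i, v in enumerate(values) if v is not None]
    xbar = mean(x for x, _ in pts)
    ybar = mean(y for _, y in pts)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pts)
    sxx = sum((x - xbar) ** 2 for x, _ in pts)
    slope = sxy / sxx
    return [ybar + slope * (i - xbar) if v is None else v
            for i, v in enumerate(values)]
```

Calling `nearby(values, stat=median)` gives the median of nearby points variant. On a perfectly linear series such as `[1.0, 2.0, None, 4.0, 5.0]` all five methods fill the gap with `3.0`; they diverge on skewed or non-linear data, which is where a comparison like the one in the abstract becomes informative.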