Digital Object Identifier

	10.14456/tjst.2020.62 Efficiency Comparison in Replace Missing Value Using Regression Imputation, Multiple Imputation and Expectation Maximization for Classification in Data Mining
DOI	10.14456/tjst.2020.62
Creator	Theeridsara Ngernwilai
Title	Efficiency Comparison in Replace Missing Value Using Regression Imputation, Multiple Imputation and Expectation Maximization for Classification in Data Mining
Contributor	Doungkaew Hunthong, Saichon Sinsomboonthong
Publisher	Thammasat University
Publication Year	2020 (2563 BE)
Journal Title	Thai Journal of Science and Technology
Journal Vol.	9
Journal No.	5
Page no.	575-588
Keyword	missing value, regression imputation, multiple imputation, expectation maximization, K-nearest neighbor, decision tree, artificial neural network, support vector machine
URL Website	https://www.tci-thaijo.org/
Website title	THAIJO
ISSN	2286-7333
Abstract	The objective of this research was to compare the efficiency of three methods for replacing missing values (regression imputation, multiple imputation, and expectation maximization) under four classification methods (K-nearest neighbor, decision tree, artificial neural network, and support vector machine) on six datasets containing missing values. The datasets were: liver disease data from Andhra Pradesh, India, and biopsy data on breast cancer patients, which had the lowest proportion of missing values; monoclonal gammopathy data and data on credit cards issued and not issued by a bank, which had a moderate proportion; and single-family loan-level data and cardiovascular disease data from Framingham, Massachusetts, which had the highest proportion. Using the SPSS software program, the efficiency of each classification method was measured by its accuracy, mean squared error, and mean absolute error. Each dataset was divided into three parts in the ratio 70 : 20 : 10: training data (70 percent) used to build the model, validation data (20 percent) used to evaluate the model's error, and testing data (10 percent) used to test the model, with random seeds of 10, 20, 30, 40, and 50 in the WEKA program. For the liver disease dataset, the best classifier was the support vector machine under all three replacement methods (regression imputation, multiple imputation, and expectation maximization). For the breast cancer biopsy dataset, the best classifier was the support vector machine with regression imputation or expectation maximization. For the monoclonal gammopathy dataset, the best classifier was the artificial neural network with multiple imputation. For the credit card dataset, the best classifier was the support vector machine with expectation maximization. For the single-family loan-level dataset, the best classifier was the decision tree with multiple imputation. For the Framingham cardiovascular disease dataset, the best classifier was the support vector machine under all three replacement methods.
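
The procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the study itself used SPSS and WEKA, and every function name below (`split_70_20_10`, `regression_impute`, and the three metric helpers) is hypothetical. The sketch assumes a single numeric predictor for the regression imputation step and shows only the deterministic form of that method; multiple imputation and expectation maximization are not covered.

```python
import random


def split_70_20_10(rows, seed):
    """Shuffle with a fixed seed, then cut into 70% train, 20% validation, 10% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_valid = len(shuffled) * 20 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


def regression_impute(xs, ys):
    """Replace None entries in ys with predictions from a least-squares line.

    The line is fitted only on the complete (x, y) pairs (assumed setup:
    xs has no missing values and at least two distinct complete x values).
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# One split per random seed named in the abstract (toy data of 100 row ids).
splits = {seed: split_70_20_10(range(100), seed) for seed in (10, 20, 30, 40, 50)}
```

In a workflow like this, a missing-value replacement method would be applied first, a classifier would be trained on the 70 percent part for each seed, and the three metrics would then be computed on the held-out parts.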


Bibliography


APA

Theeridsara Ngernwilai et al. (2020) Efficiency Comparison in Replace Missing Value Using Regression Imputation, Multiple Imputation and Expectation Maximization for Classification in Data Mining. Thai Journal of Science and Technology, 9(5), 575-588. 10.14456/tjst.2020.62

Chicago

Theeridsara Ngernwilai et al. "Efficiency Comparison in Replace Missing Value Using Regression Imputation, Multiple Imputation and Expectation Maximization for Classification in Data Mining". Thai Journal of Science and Technology 9 (2020): 575-588. 10.14456/tjst.2020.62

MLA

Theeridsara Ngernwilai et al. Efficiency Comparison in Replace Missing Value Using Regression Imputation, Multiple Imputation and Expectation Maximization for Classification in Data Mining. Thammasat University: n.p. 2020. 10.14456/tjst.2020.62
