| | Leveraging PyThaiNLP for Sentiment Analysis of Thai Online Text: A Comparative Study of Logistic Regression and Support Vector Machine |
---|---|
DOI | |
Creator | Sunisa Duangtham |
Title | Leveraging PyThaiNLP for Sentiment Analysis of Thai Online Text: A Comparative Study of Logistic Regression and Support Vector Machine |
Contributor | Setthaphong Lertritrungrot, Nattavadee Hongboonmee, Wansuree Massagram |
Publisher | Faculty of Informatics, Mahasarakham University |
Publication Year | 2025 (B.E. 2568) |
Journal Title | Journal of Applied Informatics and Technology |
Journal Vol. | 7 |
Journal No. | 2 |
Page no. | 268-282 |
Keyword | PyThaiNLP, Sentiment Analysis, Thai Online Text |
URL Website | https://ph01.tci-thaijo.org/index.php/jait |
Website title | Journal of Applied Informatics and Technology |
ISSN | 3088-1803 |
Abstract | This study compares the performance of sentiment analysis models for Thai online text built with the existing PyThaiNLP library. Text was extracted from online sources to create a dataset and manually labeled as positive, neutral, or negative. Preprocessing consisted of removing punctuation marks, tokenizing, removing non-Thai characters, and creating a bag-of-words representation. The data was then divided into training and testing sets to build models with three algorithms: logistic regression, logistic regression with stochastic gradient descent (SGD), and support vector machine (SVM). In the comparison, the logistic regression model performed best, achieving an accuracy of 80.73% with a 90:10 train-test split, the newmm word tokenizer, and the augmented dictionary. Its accuracy was 81.10% for positive sentiment, 80.16% for neutral sentiment, and 80.97% for negative sentiment. |
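
The preprocessing steps named in the abstract (removing punctuation and non-Thai characters, tokenizing, building a bag of words, and splitting 90:10) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the sample sentences, the helper names (`preprocess`, `build_vocabulary`, `bag_of_words`, `train_test_split`), the random seed, and the order of the cleaning steps are all assumptions. It uses PyThaiNLP's `word_tokenize` with the `newmm` engine when the library is installed, and otherwise falls back to whitespace splitting purely so the sketch still runs.

```python
import random
import re
from collections import Counter

try:
    # "newmm" is the tokenizer engine named in the abstract.
    from pythainlp.tokenize import word_tokenize

    def tokenize(text):
        return word_tokenize(text, engine="newmm")
except ImportError:
    # Illustration-only fallback: whitespace splitting is NOT a Thai word segmenter.
    def tokenize(text):
        return text.split()

# Anything outside the Thai Unicode block (U+0E00-U+0E7F) and whitespace.
# This also matches ASCII punctuation, Latin letters, and Arabic digits.
NON_THAI = re.compile(r"[^\u0E00-\u0E7F\s]")


def preprocess(text):
    """Replace non-Thai characters with spaces, tokenize, drop blank tokens."""
    cleaned = NON_THAI.sub(" ", text)
    return [tok for tok in tokenize(cleaned) if tok.strip()]


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def bag_of_words(tokens, vocab):
    """Count vector over the vocabulary; out-of-vocabulary tokens are ignored."""
    vector = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:
            vector[vocab[tok]] = count
    return vector


def train_test_split(rows, test_ratio=0.1, seed=42):
    """Shuffle and split; test_ratio=0.1 gives the 90:10 split from the abstract."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]


# Hypothetical, hand-written examples (not taken from the study's dataset).
samples = [
    ("อาหาร อร่อย มาก!!!", "positive"),
    ("บริการ แย่ มาก :(", "negative"),
    ("ร้าน เปิด ทุก วัน", "neutral"),
    ("ชอบ ร้าน นี้ มาก", "positive"),
    ("อาหาร ไม่ อร่อย เลย", "negative"),
]

train, test = train_test_split(samples, test_ratio=0.1)
train_tokens = [preprocess(text) for text, _ in train]
vocab = build_vocabulary(train_tokens)  # vocabulary comes from training data only
X_train = [bag_of_words(toks, vocab) for toks in train_tokens]
y_train = [label for _, label in train]
X_test = [bag_of_words(preprocess(text), vocab) for text, _ in test]
y_test = [label for _, label in test]
print(len(X_train), len(X_test), len(vocab))
```

The resulting count vectors could then be passed to any classifier. The abstract does not say which implementations were used; scikit-learn's `LogisticRegression`, `SGDClassifier`, and `SVC` would be one plausible choice for the three models. The "augmented dictionary" is not described further here. `word_tokenize` does accept a `custom_dict` argument, which might be one way to supply such a dictionary.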