A hybrid approach to Pali Sandhi segmentation using BiLSTM and rule-based analysis
DOI
Creator 1. Klangjai Tammanam
2. Nuttachot Promrit
3. Sajjaporn Waijanya
Title A hybrid approach to Pali Sandhi segmentation using BiLSTM and rule-based analysis
Publisher Faculty of Engineering, Khon Kaen University
Publication Year 2021 (B.E. 2564)
Journal Title Engineering and Applied Science Research
Journal Vol. 48
Journal No. 5
Page no. 614-626
Keyword BiLSTM, Pali Sandhi, Thai Pali, Rule base, Pali Sandhi splitting
URL Website https://www.tci-thaijo.org/index.php/easr/index
Website title Engineering and Applied Science Research
ISSN 2539-6161
Abstract Pali Sandhi is a phonetic transformation in which two words combine into a new word: the phonemes of the neighbouring words are changed and merged. Pali Sandhi word segmentation is more challenging than Thai word segmentation because Pali is a highly inflected language. This study proposes a novel approach that predicts splitting locations by classifying sample Sandhi words into five classes with a bidirectional long short-term memory (BiLSTM) model. We then applied the classified rules to rectify the words at the predicted splitting locations. We identified 6,345 Pali Sandhi words from the Dhammapada Atthakatha. We evaluated the proposed model on the accuracy of its splitting locations, comparing the results with the dataset. Results showed that 92.20% of the splitting locations were correct, 1.10% of the Pali Sandhi words were predicted as having no splitting location, and 5.83% did not match the answers (incomplete segmentation).