TranSentCut - transformer based Thai sentence segmentation
DOI
Creator 1. Sumeth Yuenyong
2. Virach Sornlertlamvanich
Title TranSentCut - transformer based Thai sentence segmentation
Publisher Research and Development Office, Prince of Songkla University
Publication Year 2565 BE (2022 CE)
Journal Title Songklanakarin Journal of Science and Technology (SJST)
Journal Vol. 44
Journal No. 3
Page no. 852-860
Keyword sentence segmentation, natural language processing, neural network, transformer model
URL Website https://rdo.psu.ac.th/sjst/index.php
ISSN 0125-3395
Abstract We propose TranSentCut, a sentence segmentation model for Thai based on the transformer architecture. Sentence segmentation for Thai is a problem because there is no end-of-sentence marker like in other languages. Existing methods make use of POS tags, which are not easy to label and must be provided for every word in the data. This limits the applicability and performance of sentence segmentation on open-domain text, because the only high-quality Thai corpus that has sentence boundary and POS labels was constructed mostly from academic articles. Our approach uses only raw text for training, and the only labelling required is to separate each sentence into its own line in a text file. This makes new datasets much easier to construct. Comparison with existing methods shows that our proposed model is competitive with the most recent state of the art when evaluated on in-domain texts, and improves significantly over existing publicly available libraries when applied to out-of-domain input texts.
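The labelling scheme described in the abstract (one sentence per line, no POS tags) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes, for illustration only, that every space in the re-joined text is treated as a candidate boundary and that a fixed character window on each side is kept as context; the function name `build_examples`, the `window` parameter, and the two sample sentences are hypothetical.

```python
from typing import List, Tuple


def build_examples(sentences: List[str], window: int = 20) -> List[Tuple[str, str, int]]:
    """Turn one-sentence-per-line data into (left, right, label) examples.

    Assumption (not from the record): each space in the re-joined text is a
    candidate boundary. The label is 1 when the space separates two lines of
    the input file and 0 when the space occurs inside a sentence.
    """
    # Drop empty lines and surrounding whitespace.
    cleaned = [s.strip() for s in sentences if s.strip()]
    text = " ".join(cleaned)

    # Character offsets of the spaces that join consecutive sentences.
    boundaries = set()
    offset = 0
    for sentence in cleaned[:-1]:
        offset += len(sentence)
        boundaries.add(offset)
        offset += 1  # step over the joining space

    examples = []
    for i, ch in enumerate(text):
        if ch == " ":
            left = text[max(0, i - window):i]
            right = text[i + 1:i + 1 + window]
            examples.append((left, right, 1 if i in boundaries else 0))
    return examples


# Hypothetical two-line dataset: the first sentence contains spaces of its own.
lines = ["เขาซื้อส้ม กล้วย และมะม่วง", "ฉันชอบมะม่วง"]
examples = build_examples(lines)
print([label for _, _, label in examples])  # prints [0, 0, 1]
```

Only the last space is labelled as a sentence boundary, which shows why a space alone cannot serve as an end-of-sentence marker in Thai and why line-separated raw text is enough to derive labels.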
