| A NOVEL AND EFFICIENT METHOD FOR ROMAN TO URDU TRANSLITERATION VIA HEURISTICS-BASED SEARCHING ON PARSE TREES | |
|---|---|
| DOI | |
| Creator | Sayam Qazi, Humera Tariq |
| Title | A NOVEL AND EFFICIENT METHOD FOR ROMAN TO URDU TRANSLITERATION VIA HEURISTICS-BASED SEARCHING ON PARSE TREES |
| Contributor | - |
| Publisher | TuEngr Group |
| Publication Year | 2019 (2562 BE) |
| Journal Title | International Transaction Journal of Engineering, Management, & Applied Sciences & Technologies |
| Journal Vol. | 10 |
| Journal No. | 4 |
| Page no. | 567-577 |
| Keyword | Emission Frequencies, Terminal Frequencies, Transitional Frequencies, Zipf Law, Hash Map, Heat Map, Roman Urdu, Path Unfolding, Depth First Search. |
| URL Website | https://tuengr.com/Vol10_4.html |
| Website title | ITJEMAST V10(4) 2019 @ TuEngr.com |
| ISSN | 2228-9860 |
| Abstract | Roman text still forms a significant part of Urdu data on the Internet, and there is ample room for improvement in this domain of Natural Language Processing (NLP). Existing systems for Roman to Urdu transliteration have their own strengths, but work still needs to be done to improve their performance. The objective of this research is to build a reliable Roman to Urdu transliteration batch processing system that requires the fewest possible manual corrections at the user end, thus enhancing the efficiency and reliability of existing and proposed transliteration systems. A parse tree, a transliteration tree, and a novel heuristic function are proposed based on key characteristics of the Roman Urdu language. The work concludes with a benchmark of the proposed solution in terms of computational complexity, performance, and accuracy. Across all tested words, correct transliterations with a high score were found for up to 78%, correct transliterations with a low score for 21%, and wrong transliterations for only 0.53%. The algorithm has some limitations: (1) Sometimes it gets the transliteration correct, but its rank is too low to fall within the tolerance; this can be mitigated by using a better heuristic function. (2) Sometimes it generates too many transliterations that are correct in principle but invalid in context. |