![]() |
Approximate String Matching Algorithm using Single Inverted Lists |
---|---|
รหัสดีโอไอ | |
Creator | Soontaree Thumsuwan |
Title | Approximate String Matching Algorithm using Single Inverted Lists |
Contributor | Nuanprang Sangurai, Chouvalit Khancome |
Publisher | Faculty of Informatics, Mahasarakham University |
Publication Year | 2568 |
Journal Title | Journal of Applied Informatics and Technology |
Journal Vol. | 7 |
Journal No. | 2 |
Page no. | 389-405 |
Keyword | String Matching Algorithm, Multiple Character Inverted Lists, Inverted Index, Pattern Matching, Exact String Matching |
URL Website | https://ph01.tci-thaijo.org/index.php/jait |
Website title | Journal of Applied Informatics and Technology |
ISSN | 3088-1803 |
Abstract | Approximate string matching is a fundamental technique in data retrieval that allows for typo errors or misspellings. It is widely applied in databases, search engines, and various applications or online services. To enhance the speed and accuracy of data retrieval, the development of new algorithms remains a significant challenge in computer science research. This paper introduces a novel data structure for approximate search, called the Single Inverted List, which supports a configurable level of error tolerance. Based on this structure, a new approximate string matching algorithm is developed. Theoretical analysis shows that the proposed structure can be constructed with time complexity proportional to the length of the pattern string and requires storage space equal to the sum of the pattern length and the number of distinct characters. The proposed algorithm achieves search performance with time complexity proportional to the product of the text length and the pattern length, while also supporting error-tolerant matching. Experimental results demonstrate that the proposed structure consumes the least memory compared to well-known existing algorithms, and the developed algorithm performs approximate searches efficiently, nearly as fast as the fastest existing methods, while maintaining linear-time performance. |