Extending language models with term proximity weight to utilize term set relation in information retrieval
Title Extending language models with term proximity weight to utilize term set relation in information retrieval
Creator Sompong Kittinaradorn
Contributor Athasit Surarerks, Nakornthip Prompoon
Publisher Chulalongkorn University
Publication Year 2549 BE (2006 CE)
Keyword Information retrieval
Abstract This research aims to improve the performance of ad hoc information retrieval through a novel method of computing query term weights. The method assumes that terms can be grouped by concept, as opposed to the conventional assumption that terms are independent of one another. It is based on the idea that the importance of a term is determined by its contribution to the key concept term of the text. The research introduces a heuristic to group terms by concept. To visualize it, a graph is plotted with the ordered term positions of a query on the x-axis and the well-known idf (inverse document frequency) weights on the y-axis. Peak terms are classified as concept terms if their idf weights are above a threshold. The highest peak term is the key concept term. Each peak term is supported by satellite terms on both sides. Between two adjacent peak terms, the term with the lowest idf weight marks the boundary between term sets. Computation is a three-step process: the first step computes the importance of each concept term to the key concept term, the second estimates the importance of a term with reference to the concept term of its own term set, and the last computes the importance of the term to the key concept. The calculated weights differ from the idf weight in that the former reflect term importance in the context of a reference concept, i.e. they are a local property, whereas the idf weight is a global property derived from a document collection. In this way, the proposed weight can be seen as a context-dependent or concept-determined importance. To test the effectiveness of the new term weighting scheme, an experiment is designed on the hypothesis that a query with concept-dependent weights for its terms would yield better ad hoc information retrieval results. Experiments are conducted within the language modeling framework using the query likelihood scoring method and the Dirichlet prior smoothing technique.
They produce convincing gains for the new approach compared to the baseline and the idf-based results. Improvements are significantly positive on all counts and are particularly outstanding in precision. Using the TREC 7 and TREC 8 query sets, the experiments report increases in mean average precision (MAP) of 16.12% and 15.74% respectively. The new method also outperforms the idf-based scheme by 9.10% and 13.34% for the TREC 7 and TREC 8 query sets respectively.
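The grouping heuristic and the scoring framework described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The function names, the example idf values, the threshold, and the default mu are hypothetical. Two details are assumptions because the abstract does not specify them: the boundary term is assigned to the term set on its right, and a term weight enters the score by multiplying the term's smoothed log probability. The three-step weight computation itself is not reproduced, since the abstract gives no formulas for it.

```python
import math
from collections import Counter


def find_peaks(idf, threshold):
    """Return positions whose idf is a local maximum above the threshold."""
    peaks = []
    for i, w in enumerate(idf):
        left = idf[i - 1] if i > 0 else float("-inf")
        right = idf[i + 1] if i < len(idf) - 1 else float("-inf")
        # Strict on the left so two adjacent positions are never both peaks.
        if w > threshold and w > left and w >= right:
            peaks.append(i)
    return peaks


def group_terms(terms, idf, threshold):
    """Split a query into term sets, one per concept (peak) term.

    Returns (term_sets, key_concept). Between two adjacent peaks the
    lowest-idf term marks the boundary; ASSUMPTION: that boundary term
    opens the right-hand set.
    """
    peaks = find_peaks(idf, threshold)
    if not peaks:
        return [], None
    cuts = [min(range(a + 1, b), key=lambda i: idf[i])
            for a, b in zip(peaks, peaks[1:])]
    term_sets, start = [], 0
    for peak, cut in zip(peaks, cuts + [len(terms)]):
        term_sets.append({"concept": terms[peak], "terms": terms[start:cut]})
        start = cut
    key_concept = terms[max(peaks, key=lambda i: idf[i])]
    return term_sets, key_concept


def dirichlet_query_log_likelihood(query, doc, coll_tf, coll_len,
                                   mu=2000.0, weights=None):
    """Query log-likelihood under Dirichlet prior smoothing.

    p(t|d) = (tf(t,d) + mu * p(t|C)) / (|d| + mu).
    ASSUMPTION: an optional per-term weight scales each log probability.
    """
    tf, score = Counter(doc), 0.0
    for t in query:
        p_coll = coll_tf.get(t, 0) / coll_len
        if p_coll == 0.0:
            continue  # term unseen in the collection: skipped in this sketch
        p = (tf[t] + mu * p_coll) / (len(doc) + mu)
        w = 1.0 if weights is None else weights.get(t, 1.0)
        score += w * math.log(p)
    return score


# Hypothetical query and idf values, for illustration only.
terms = ["effects", "of", "global", "warming", "on", "polar", "bear", "habitat"]
idf = [2.1, 0.1, 3.0, 4.2, 0.2, 3.5, 4.8, 3.9]
term_sets, key_concept = group_terms(terms, idf, threshold=3.0)
for s in term_sets:
    print(s["concept"], s["terms"])
print("key concept:", key_concept)
```

In this toy query the peaks are "warming" and "bear", the low-idf term "on" marks the boundary between the two term sets, and "bear" is the key concept because it has the highest idf.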
URL Website cuir.car.chula.ac.th
Chulalongkorn University