Web Scraping-based System for E-commerce Price Comparison and Similar Product Segmentation | |
---|---|
DOI | |
Creator | Pongsin Jankaew |
Title | Web Scraping-based System for E-commerce Price Comparison and Similar Product Segmentation |
Contributor | Wachirawut Thamviset |
Publisher | Faculty of Informatics, Mahasarakham University |
Publication Year | 2025 (B.E. 2568) |
Journal Title | Journal of Applied Informatics and Technology |
Journal Vol. | 7 |
Journal No. | 2 |
Page no. | 346-362 |
Keyword | Agglomerative Clustering, E-commerce, Product Identification, Web Scraping |
URL Website | https://ph01.tci-thaijo.org/index.php/jait |
Website title | Journal of Applied Informatics and Technology |
ISSN | 3088-1803 |
Abstract | With the rapid growth of e-commerce, finding the best deals across a multitude of online shopping websites has become a challenge. Consumers often spend considerable time manually sifting through and comparing data, which leads to uncertainty in decision-making. To address this issue, our research proposes a system that uses web scraping techniques to identify top deals from multiple e-commerce sites. We have developed Python-based web scraping scripts and incorporated a configuration file for customization, enabling users to extract product data from diverse websites. The system scrapes data and displays the results each time the user enters a query, ensuring that the scraped data is up to date. Furthermore, our system enhances the user experience by incorporating product model datasets for product identification, enabling specific searches based on product specifications and offering recommendations for similar product models. Finally, for products that remain unidentified, we introduce a feature that groups similar products using an agglomerative clustering method. This method uses product name features extracted by TF-IDF and image features extracted by Convolutional Neural Networks (CNN), allowing price comparisons among similar products and enhancing the overall shopping experience. Preliminary evaluations show that our system successfully extracts data from target websites when properly customized. The evaluations of similar product clustering demonstrate that combining product name and image features significantly improves clustering performance, surpassing the use of product names alone by 9 percent and the use of images alone by 18 percent. |
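
The grouping step described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the product names, the three-dimensional image vectors (toy stand-ins for CNN embeddings), the equal weighting of name and image distances, average linkage, and the 0.5 distance threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(names):
    """Return one sparse TF-IDF vector (a dict) per product name."""
    docs = [name.lower().split() for name in names]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # term frequency times a smoothed inverse document frequency
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def combined_distances(names, image_features, name_weight=0.5):
    """Blend name and image distances (the weighting is an assumption)."""
    text_vecs = tfidf_vectors(names)
    image_vecs = [dict(enumerate(features)) for features in image_features]
    dist = {}
    for i, j in combinations(range(len(names)), 2):
        d_text = cosine_distance(text_vecs[i], text_vecs[j])
        d_image = cosine_distance(image_vecs[i], image_vecs[j])
        dist[(i, j)] = name_weight * d_text + (1 - name_weight) * d_image
    return dist


def agglomerative(n_items, dist, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters and stops once the
    closest pair is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(n_items)]

    def linkage(c1, c2):
        pairs = [(min(a, b), max(a, b)) for a in c1 for b in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        (x, y), best = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # safe: combinations guarantees x < y
    return sorted(clusters)


# Hypothetical listings; the image vectors stand in for CNN embeddings.
names = [
    "Acme X10 phone 128GB black",
    "Acme X10 smartphone 128GB",
    "Zeta blender 500W",
    "Zeta 500W kitchen blender",
]
image_features = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.3],
    [0.0, 0.8, 0.4],
]

dist = combined_distances(names, image_features)
clusters = agglomerative(len(names), dist, threshold=0.5)
print(clusters)
```

On this toy data the two phone listings end up in one group and the two blender listings in another, so prices can be compared within each group. In a real system the image vectors would come from a CNN and the weight and threshold would need tuning on labeled data.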