Comparison of ChatGPT and Gemini AI in Answering Higher-Order Thinking Skill Biology Questions: Accuracy and Evaluation
DOI
Creator Thoriqi Firdaus
Title Comparison of ChatGPT and Gemini AI in Answering Higher-Order Thinking Skill Biology Questions: Accuracy and Evaluation
Contributor Siti Aminatus Sholeha, Miftahul Jannah, Andre Ramdani Setiawan
Publisher Science Education Association (Thailand)
Publication Year 2024 (B.E. 2567)
Journal Title International Journal of Science Education and Teaching
Journal Vol. 3
Journal No. 3
Page no. 126-138
Keyword comparison, ChatGPT, Gemini AI, HOTS, Biology
URL Website https://so07.tci-thaijo.org/index.php/IJSET
Website title International Journal of Science Education and Teaching (IJSET)
ISSN 2821-9163
Abstract AI is becoming increasingly prevalent and continues to advance, yet the accuracy of this intelligent technology remains a subject of scrutiny. This study provides an in-depth evaluation of the capabilities of two platforms, ChatGPT and Gemini AI, by analyzing and comparing their performance, assessing the accuracy of their answers, and offering comprehensive recommendations. A quantitative comparative approach was employed to evaluate the performance of ChatGPT and Gemini AI in answering Higher-Order Thinking Skills (HOTS) questions; the test items were HOTS-based questions on the subject of biology. The analysis shows that ChatGPT's accuracy rate (55%) is slightly higher than Gemini AI's (50%). However, Gemini AI's mean score (0.5) is higher than ChatGPT's (0.4), indicating that Gemini AI's answers were, on average, scored as more accurate even though its percentage of correct responses is lower. This difference is likely attributable to the types of questions and the specific cognitive aspects involved. ChatGPT demonstrated strengths in questions requiring analysis and evaluation, while Gemini AI performed better on creation-based questions. Both systems struggled with questions that integrate complex cognitive processes with procedural knowledge, highlighting opportunities for further improvement in their respective knowledge-processing algorithms. The standard deviations for ChatGPT and Gemini AI are nearly identical, at 0.5026 and 0.5130 respectively, indicating a comparable level of consistency in the two models' responses. The standard error of the mean for ChatGPT (0.1124) is slightly lower than that for Gemini AI (0.1147), suggesting that ChatGPT's mean estimate is marginally more stable. Overall, the study shows that ChatGPT and Gemini AI exhibit distinct strengths and weaknesses in answering HOTS questions: ChatGPT excelled in the cognitive dimension of analysis (C4) and in items drawing on factual knowledge, providing detailed and comprehensive answers, whereas Gemini AI demonstrated an advantage in the creation dimension (C6) and in tasks requiring concise, straightforward responses, such as producing or planning solutions.
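A consistency check on the reported statistics, under two assumptions not stated in the record: that the standard error of the mean follows SE = s/√n, and that items were scored 0/1. On those assumptions the reported values jointly imply a test of about n = 20 items per model, with the mean scores being the proportions of correct answers:

\[
SE_{\bar{x}} = \frac{s}{\sqrt{n}}
\;\Longrightarrow\;
n = \left(\frac{s}{SE_{\bar{x}}}\right)^{2}
  = \left(\frac{0.5026}{0.1124}\right)^{2}
  \approx \left(\frac{0.5130}{0.1147}\right)^{2}
  \approx 20,
\]

\[
s = \sqrt{\frac{n\,p(1-p)}{n-1}}:\qquad
\sqrt{\frac{20 \times 0.4 \times 0.6}{19}} \approx 0.5026,
\qquad
\sqrt{\frac{20 \times 0.5 \times 0.5}{19}} \approx 0.5130 .
\]

Both reported standard deviations match the sample standard deviation of binary scores at mean 0.4 (ChatGPT) and 0.5 (Gemini AI) with n = 20, so the figures in the abstract are internally consistent under these assumptions.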
