ScienceQA experimental results.
Published: 2025
Summary: This paper presents a novel approach to enhancing educational question-answering (Q&A) systems by combining Retrieval-Augmented Generation (RAG) with Large Language Model (LLM) Code Interpreters. Traditional educational Q&A systems face challenges in areas such as knowledge updates, reasoning accuracy, and the handling of complex computational tasks. These limitations are particularly evident in domains requiring multi-step reasoning or access to real-time, domain-specific knowledge. To address these issues, we propose a system that utilizes RAG to dynamically retrieve up-to-date, relevant information from external knowledge sources, thus mitigating the common “hallucination” problem in LLMs. Additionally, the integration of an LLM Code Interpreter enables the system to perform multi-step logical reasoning and execute Python code for precise calculations, significantly improving its ability to solve mathematical problems and handle complex queries. We evaluated our proposed system on five educational datasets (AI2_ARC, OpenBookQA, E-EVAL, TQA, and ScienceQA) that represent diverse question types and domains. Compared to vanilla LLMs, our approach combining RAG with Code Interpreters achieved an average accuracy improvement of 10–15 percentage points. Among the tested models, GPT-4o and Gemini-pro-1.5 consistently showed the strongest performance, excelling particularly in scientific reasoning and multi-step computations. Despite these advancements, we identify several challenges that remain, including knowledge retrieval failures, code execution errors, difficulties in synthesizing cross-disciplinary information, and limitations in multi-modal reasoning, particularly when combining text and images. These challenges provide important directions for future research aimed at further optimizing educational Q&A systems. Our work shows that integrating RAG and Code Interpreters offers a promising path toward more accurate, transparent, and personalized educational Q&A systems, and can significantly improve the learning experience in various educational contexts.
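
To make the pipeline described in the summary concrete, the sketch below shows one way a retrieval step and a code-interpreter step could be wired around an LLM. This is an illustrative outline, not the paper's actual implementation: the helper names (`retrieve_passages`, `llm_complete`, `run_python`), the prompt format, and the code-block convention for triggering execution are all assumptions.

```python
# Illustrative sketch of a RAG + code-interpreter Q&A loop.
# The retriever and LLM client are placeholders; plug in your own.
import subprocess
import sys
import tempfile


def retrieve_passages(question: str, top_k: int = 3) -> list[str]:
    """Placeholder retriever: return top-k passages from an external
    knowledge source (e.g. a vector index over course material)."""
    raise NotImplementedError("plug in a retriever here")


def llm_complete(prompt: str) -> str:
    """Placeholder LLM call (e.g. GPT-4o or Gemini-pro-1.5 via its API)."""
    raise NotImplementedError("plug in an LLM client here")


def run_python(code: str, timeout: int = 10) -> str:
    """Execute model-generated Python in a subprocess and capture output,
    standing in for the code-interpreter component."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True, timeout=timeout)
    return result.stdout if result.returncode == 0 else result.stderr


def answer(question: str) -> str:
    # 1. Ground the question in retrieved context to reduce hallucination.
    context = "\n".join(retrieve_passages(question))
    prompt = (
        f"Context:\n{context}\n\nQuestion: {question}\n"
        "If a calculation is needed, reply with a ```python code block; "
        "otherwise answer directly."
    )
    reply = llm_complete(prompt)
    # 2. If the model emitted code, run it and feed the result back
    #    so the final answer rests on an executed computation.
    if "```python" in reply:
        code = reply.split("```python")[1].split("```")[0]
        observation = run_python(code)
        reply = llm_complete(prompt + "\n\nExecution result:\n"
                             + observation + "\nNow give the final answer.")
    return reply
```

The key design point the summary emphasizes is the second step: rather than asking the LLM to compute numerical results directly, the generated Python is executed and its output is returned to the model, which is what the paper credits for the gains on mathematical and multi-step questions.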