ScienceQA experimental results.
Published: 2025
Summary: This paper presents a novel approach to enhancing educational question-answering (Q&A) systems by combining Retrieval-Augmented Generation (RAG) with Large Language Model (LLM) Code Interpreters. Traditional educational Q&A systems face challenges in areas such as knowledge updates, reasoning accuracy, and the handling of complex computational tasks. These limitations are particularly evident in domains requiring multi-step reasoning or access to real-time, domain-specific knowledge. To address these issues, we propose a system that utilizes RAG to dynamically retrieve up-to-date, relevant information from external knowledge sources, thus mitigating the common “hallucination” problem in LLMs. Additionally, the integration of an LLM Code Interpreter enables the system to perform multi-step logical reasoning and execute Python code for precise calculations, significantly improving its ability to solve mathematical problems and handle complex queries. We evaluated our proposed system on five educational datasets (AI2_ARC, OpenBookQA, E-EVAL, TQA, and ScienceQA) that represent diverse question types and domains. Compared to vanilla LLMs, our approach combining RAG with Code Interpreters achieved an average accuracy improvement of 10–15 percentage points. Among the tested models, GPT-4o and Gemini-pro-1.5 consistently showed the strongest performance, excelling particularly in scientific reasoning and multi-step computations. Despite these advancements, we identify several challenges that remain, including knowledge retrieval failures, code execution errors, difficulties in synthesizing cross-disciplinary information, and limitations in multi-modal reasoning, particularly when combining text and images. These challenges provide important directions for future research aimed at further optimizing educational Q&A systems. Our work shows that integrating RAG and Code Interpreters offers a promising path toward more accurate, transparent, and personalized educational Q&A systems, and can significantly improve the learning experience in various educational contexts.
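
To make the pipeline described in the summary concrete, the sketch below shows one way a retrieval step and a code-interpreter step could be wired around an LLM. This is an illustrative outline, not the paper's actual implementation: the helper names (`retrieve_passages`, `llm_complete`, `run_python`), the prompt format, and the code-block convention for triggering execution are all assumptions.

```python
# Illustrative sketch of a RAG + code-interpreter Q&A loop.
# The retriever and LLM client are placeholders; plug in your own.
import subprocess
import sys
import tempfile


def retrieve_passages(question: str, top_k: int = 3) -> list[str]:
    """Placeholder retriever: return top-k passages from an external
    knowledge source (e.g. a vector index over course material)."""
    raise NotImplementedError("plug in a retriever here")


def llm_complete(prompt: str) -> str:
    """Placeholder LLM call (e.g. GPT-4o or Gemini-pro-1.5 via its API)."""
    raise NotImplementedError("plug in an LLM client here")


def run_python(code: str, timeout: int = 10) -> str:
    """Execute model-generated Python in a subprocess and capture output,
    standing in for the code-interpreter component."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True, timeout=timeout)
    return result.stdout if result.returncode == 0 else result.stderr


def answer(question: str) -> str:
    # 1. Ground the question in retrieved context to reduce hallucination.
    context = "\n".join(retrieve_passages(question))
    prompt = (
        f"Context:\n{context}\n\nQuestion: {question}\n"
        "If a calculation is needed, reply with a ```python code block; "
        "otherwise answer directly."
    )
    reply = llm_complete(prompt)
    # 2. If the model emitted code, run it and feed the result back
    #    so the final answer rests on an executed computation.
    if "```python" in reply:
        code = reply.split("```python")[1].split("```")[0]
        observation = run_python(code)
        reply = llm_complete(prompt + "\n\nExecution result:\n"
                             + observation + "\nNow give the final answer.")
    return reply
```

The key design point the summary emphasizes is the second step: rather than asking the LLM to compute numerical results directly, the generated Python is executed and its output is returned to the model, which is what the paper credits for the gains on mathematical and multi-step questions.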