Search alternatives:
modular implementation » model implementation (Expand Search), world implementation (Expand Search)
models representing » model representing (Expand Search), models represent (Expand Search), samples representing (Expand Search)
python models » python code (Expand Search), motion models (Expand Search), pelton models (Expand Search)
modular implementation » model implementation (Expand Search), world implementation (Expand Search)
models representing » model representing (Expand Search), models represent (Expand Search), samples representing (Expand Search)
python models » python code (Expand Search), motion models (Expand Search), pelton models (Expand Search)
-
41
PTPC-UHT bounce
Published 2025“…<br>It contains the full Python implementation of the PTPC bounce model (<code>PTPC_UHT_bounce.py</code>) and representative outputs used to generate the figures in the paper. …”
-
42
-
43
-
44
-
45
-
46
-
47
-
48
-
49
-
50
Data features examined for potential biases.
Published 2025“…Representativeness of the population, differences in calibration and model performance among groups, and differences in performance across hospital settings were identified as possible sources of bias.…”
-
51
Analysis topics.
Published 2025“…Representativeness of the population, differences in calibration and model performance among groups, and differences in performance across hospital settings were identified as possible sources of bias.…”
-
52
Datasets To EVAL.
Published 2025“…We evaluated our proposed system on five educational datasets—AI2_ARC, OpenBookQA, E-EVAL, TQA, and ScienceQA—which represent diverse question types and domains. Compared to vanilla Large Language Models (LLMs), our approach combining Retrieval-Augmented Generation (RAG) with Code Interpreters achieved an average accuracy improvement of 10−15 percentage points. …”
-
53
Statistical significance test results.
Published 2025“…We evaluated our proposed system on five educational datasets—AI2_ARC, OpenBookQA, E-EVAL, TQA, and ScienceQA—which represent diverse question types and domains. Compared to vanilla Large Language Models (LLMs), our approach combining Retrieval-Augmented Generation (RAG) with Code Interpreters achieved an average accuracy improvement of 10−15 percentage points. …”
-
54
How RAG work.
Published 2025“…We evaluated our proposed system on five educational datasets—AI2_ARC, OpenBookQA, E-EVAL, TQA, and ScienceQA—which represent diverse question types and domains. Compared to vanilla Large Language Models (LLMs), our approach combining Retrieval-Augmented Generation (RAG) with Code Interpreters achieved an average accuracy improvement of 10−15 percentage points. …”
-
55
OpenBookQA experimental results.
Published 2025“…We evaluated our proposed system on five educational datasets—AI2_ARC, OpenBookQA, E-EVAL, TQA, and ScienceQA—which represent diverse question types and domains. Compared to vanilla Large Language Models (LLMs), our approach combining Retrieval-Augmented Generation (RAG) with Code Interpreters achieved an average accuracy improvement of 10−15 percentage points. …”
-
56
AI2_ARC experimental results.
Published 2025“…We evaluated our proposed system on five educational datasets—AI2_ARC, OpenBookQA, E-EVAL, TQA, and ScienceQA—which represent diverse question types and domains. Compared to vanilla Large Language Models (LLMs), our approach combining Retrieval-Augmented Generation (RAG) with Code Interpreters achieved an average accuracy improvement of 10−15 percentage points. …”
-
57
TQA experimental results.
Published 2025“…We evaluated our proposed system on five educational datasets—AI2_ARC, OpenBookQA, E-EVAL, TQA, and ScienceQA—which represent diverse question types and domains. Compared to vanilla Large Language Models (LLMs), our approach combining Retrieval-Augmented Generation (RAG) with Code Interpreters achieved an average accuracy improvement of 10−15 percentage points. …”
-
58
E-EVAL experimental results.
Published 2025“…We evaluated our proposed system on five educational datasets—AI2_ARC, OpenBookQA, E-EVAL, TQA, and ScienceQA—which represent diverse question types and domains. Compared to vanilla Large Language Models (LLMs), our approach combining Retrieval-Augmented Generation (RAG) with Code Interpreters achieved an average accuracy improvement of 10−15 percentage points. …”
-
59
TQA Accuracy Comparison Chart on different LLM.
Published 2025“…We evaluated our proposed system on five educational datasets—AI2_ARC, OpenBookQA, E-EVAL, TQA, and ScienceQA—which represent diverse question types and domains. Compared to vanilla Large Language Models (LLMs), our approach combining Retrieval-Augmented Generation (RAG) with Code Interpreters achieved an average accuracy improvement of 10−15 percentage points. …”
-
60
ScienceQA experimental results.
Published 2025“…We evaluated our proposed system on five educational datasets—AI2_ARC, OpenBookQA, E-EVAL, TQA, and ScienceQA—which represent diverse question types and domains. Compared to vanilla Large Language Models (LLMs), our approach combining Retrieval-Augmented Generation (RAG) with Code Interpreters achieved an average accuracy improvement of 10−15 percentage points. …”