Comparing DeepSeek and GPT-4o in ECG interpretation: Is AI improving over time?




Bibliographic Details
Main Author:	Serkan GÜNAY (23072548) (author)
Other Authors:	Ahmet ÖZTÜRK (23072551) (author), Anılcan Tahsin KARAHAN (23072554) (author), Mert BARINDIK (23072557) (author), Seval KOMUT (23072560) (author), Yavuz YİĞİT (23072563) (author)
Published:	2025
Subjects:	Biomedical and clinical sciences; Cardiovascular medicine and haematology; Health sciences; Health services and systems; Information and computing sciences; Artificial intelligence; ChatGPT; GPT-4o; DeepSeek; Electrocardiography; Emergency medicine

_version_	1864513524480344064
author	Serkan GÜNAY (23072548)
author2	Ahmet ÖZTÜRK (23072551) Anılcan Tahsin KARAHAN (23072554) Mert BARINDIK (23072557) Seval KOMUT (23072560) Yavuz YİĞİT (23072563)
author2_role	author author author author author
author_facet	Serkan GÜNAY (23072548) Ahmet ÖZTÜRK (23072551) Anılcan Tahsin KARAHAN (23072554) Mert BARINDIK (23072557) Seval KOMUT (23072560) Yavuz YİĞİT (23072563)
author_role	author
dc.creator.none.fl_str_mv	Serkan GÜNAY (23072548) Ahmet ÖZTÜRK (23072551) Anılcan Tahsin KARAHAN (23072554) Mert BARINDIK (23072557) Seval KOMUT (23072560) Yavuz YİĞİT (23072563)
dc.date.none.fl_str_mv	2025-11-14T12:00:00Z
dc.identifier.none.fl_str_mv	10.1016/j.hrtlng.2025.08.007
dc.relation.none.fl_str_mv	https://figshare.com/articles/journal_contribution/Comparing_DeepSeek_and_GPT-4o_in_ECG_interpretation_Is_AI_improving_over_time_/31167910
dc.rights.none.fl_str_mv	CC BY 4.0 info:eu-repo/semantics/openAccess
dc.subject.none.fl_str_mv	Biomedical and clinical sciences Cardiovascular medicine and haematology Health sciences Health services and systems Information and computing sciences Artificial intelligence ChatGPT GPT-4o Deep Seek Electrocardiography Emergency medicine
dc.title.none.fl_str_mv	Comparing DeepSeek and GPT-4o in ECG interpretation: Is AI improving over time?
dc.type.none.fl_str_mv	Text Journal contribution info:eu-repo/semantics/publishedVersion text contribution to journal
description	<h3 dir="ltr">Background</h3><p dir="ltr">DeepSeek is a recently launched large language model (LLM), whereas GPT-4o is an advanced ChatGPT version whose electrocardiography (ECG) interpretation capabilities have been studied previously. However, DeepSeek’s performance in this domain remains unexplored. </p><h3 dir="ltr">Objectives</h3><p dir="ltr">This study aims to evaluate DeepSeek’s accuracy in ECG interpretation and to compare it with that of GPT-4o, emergency medicine specialists, and cardiologists. A secondary aim is to assess any change in GPT-4o’s performance over one year. </p><h3 dir="ltr">Methods</h3><p dir="ltr">Between February 9 and March 1, 2025, GPT-4o and DeepSeek each evaluated 40 ECG images (20 daily routine, 20 more challenging) from the book <i>150 ECG Cases</i>; each model was tested 13 times. The accuracy of their responses was compared with previously collected answers from 12 cardiologists and 12 emergency medicine specialists. GPT-4o’s 2025 performance was also compared with its 2024 results on the same ECGs. </p><h3 dir="ltr">Results</h3><p dir="ltr">GPT-4o outperformed DeepSeek, with a higher median number of correct answers on daily routine ECGs (14 vs. 12; p=0.048), more challenging ECGs (13 vs. 10; p<0.001), and all ECGs combined (27 vs. 22; p<0.001). Agreement among repeated responses was moderate for GPT-4o (Fleiss’ kappa=0.473, p<0.001) and substantial for DeepSeek (Fleiss’ kappa=0.712, p<0.001). No significant year-over-year improvement was observed in GPT-4o’s performance. </p><h3 dir="ltr">Conclusion</h3><p dir="ltr">This first evaluation of DeepSeek in ECG interpretation shows that its performance is lower than that of GPT-4o and human experts. 
Although GPT-4o is more accurate, both models fall short of expert-level performance, underscoring the need for caution and further validation before clinical integration.</p><h2 dir="ltr">Other Information</h2><p dir="ltr">Published in: Heart & Lung<br>License: <a href="http://creativecommons.org/licenses/by/4.0/" target="_blank">http://creativecommons.org/licenses/by/4.0/</a><br>See article on publisher's website: <a href="https://dx.doi.org/10.1016/j.hrtlng.2025.08.007" target="_blank">https://dx.doi.org/10.1016/j.hrtlng.2025.08.007</a></p>
eu_rights_str_mv	openAccess
id	Manara2_6e5e299072878db4a1e39635e588b53a
identifier_str_mv	10.1016/j.hrtlng.2025.08.007
network_acronym_str	Manara2
network_name_str	Manara2
oai_identifier_str	oai:figshare.com:article/31167910
publishDate	2025
rights_invalid_str_mv	CC BY 4.0
status_str	publishedVersion
title	Comparing DeepSeek and GPT-4o in ECG interpretation: Is AI improving over time?
topic	Biomedical and clinical sciences Cardiovascular medicine and haematology Health sciences Health services and systems Information and computing sciences Artificial intelligence ChatGPT GPT-4o Deep Seek Electrocardiography Emergency medicine
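The abstract reports agreement among each model's repeated responses as Fleiss' kappa. As a hedged illustration only (not the authors' analysis code), the sketch below computes Fleiss' kappa for items that are each rated the same number of times, and attaches a verbal label using the Landis and Koch bands. That scale is an assumption: it is chosen because its "moderate" and "substantial" labels match the abstract's wording. The function names `fleiss_kappa` and `landis_koch` and the toy data are hypothetical, and this record does not state which rating categories the study used (for example, correct/incorrect versus specific diagnoses).

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for N items, each rated by the same number n of raters.

    Hypothetical helper: ratings[i] holds the n category labels given to
    item i (for example, one label per repeated model run).
    """
    N = len(ratings)
    if N == 0:
        raise ValueError("need at least one item")
    n = len(ratings[0])
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]  # n_ij: how often item i got category j
    categories = set().union(*counts)

    # Observed agreement per item: P_i = (sum_j n_ij^2 - n) / (n * (n - 1))
    p_items = [(sum(c * c for c in cnt.values()) - n) / (n * (n - 1)) for cnt in counts]
    p_bar = sum(p_items) / N

    # Chance agreement from the overall proportion p_j of each category
    p_cats = [sum(cnt[cat] for cnt in counts) / (N * n) for cat in categories]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal label from the Landis and Koch bands (assumed interpretation scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Toy data (hypothetical): 3 ECGs, each answered twice; "C" = correct, "W" = wrong.
toy = [["C", "C"], ["W", "W"], ["C", "W"]]
print(round(fleiss_kappa(toy), 3))  # 0.333
print(landis_koch(0.473), landis_koch(0.712))  # moderate substantial
```

With 13 runs per ECG, each inner list would hold 13 labels. A kappa near 1 means a model gives the same answer on repeated runs, so it measures consistency rather than accuracy: in this study DeepSeek was the more consistent model but the less accurate one.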

