TECHNOLOGY AND CONTRAST MEDIA / ORIGINAL PAPER
Advancing AI in radiology: a comparative analysis of ChatGPT-o1-preview and ChatGPT-3.5 in the Polish National Specialization Exam
1. Students' Scientific Association of Computer Analysis and Artificial Intelligence at the Department of Radiology and Nuclear Medicine of the Medical University of Silesia in Katowice, Poland
2. Department of Biophysics, Faculty of Medical Sciences in Zabrze, Medical University of Silesia in Katowice, Zabrze, Poland
3. Department of Radiodiagnostics, Interventional Radiology and Nuclear Medicine, Medical University of Silesia, Katowice, Poland
These authors contributed equally to this work.
Submission date: 2025-04-22
Final revision date: 2025-07-14
Acceptance date: 2025-07-17
Publication date: 2025-10-21
Corresponding author
Natalia Denisiewicz
Students' Scientific Association of Computer Analysis and Artificial Intelligence at the Department of Radiology and Nuclear Medicine of the Medical University of Silesia in Katowice
Pol J Radiol, 2025; 90: 519-525
ABSTRACT
Purpose:
The aim of this study was to evaluate the performance of the ChatGPT-o1-preview language model in solving the Polish National Specialization Exam (PES) in radiology and imaging diagnostics and compare its results with previous versions of the model.
Material and methods:
A set of 119 valid radiology exam questions from Spring 2023 was analyzed. Each question was classified by type, subtype, and clinical relevance. ChatGPT answered each question five times using a standardized prompt with a 5-point confidence scale. Performance was assessed using accuracy and declared and calculated difficulty indices. Statistical analysis was performed in Python with a significance level of p < 0.05, and results were compared with a previous model version.
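The abstract does not specify how the repeated runs, accuracy, and model comparison were computed; the following is a minimal sketch of one plausible analysis, assuming a per-question majority vote over the five runs and a chi-square test of correct/incorrect counts between models. The helper functions (majority_answer, accuracy) and the numeric counts are illustrative assumptions, not the authors' actual pipeline; scipy.stats.chi2_contingency is a standard library call.

```python
# Hypothetical sketch: majority vote over five model runs per question,
# overall accuracy, and a chi-square comparison of two models' results.
from collections import Counter
from scipy.stats import chi2_contingency

def majority_answer(runs):
    """Most frequent answer across repeated runs of the same question."""
    return Counter(runs).most_common(1)[0][0]

def accuracy(answers, answer_key):
    """Fraction of questions answered correctly."""
    correct = sum(a == k for a, k in zip(answers, answer_key))
    return correct / len(answer_key)

# Illustrative correct/incorrect counts out of 119 questions (assumed numbers).
contingency = [[111, 8],    # newer model: correct, incorrect
               [62, 57]]    # older model: correct, incorrect
chi2, p, dof, _ = chi2_contingency(contingency)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # compared against the p < 0.05 threshold
```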
Results:
The model answered 93.33% of questions correctly, comparable to the average physician score of 94.86%. ChatGPT-o1-preview showed exceptional accuracy on “memory” questions, with over 96% correct answers. This result, significantly higher than that of the older ChatGPT-3.5 model (52%), demonstrates progress in artificial intelligence (AI) capabilities. The model also exhibited higher confidence in its responses, indicating better adaptation to medical exams.
Conclusions:
Despite the model’s high accuracy, the study was based on a relatively small set of questions, which limits the ability to fully assess its effectiveness. The results indicate the potential of AI as a tool to support clinical work, but further, more extensive research is necessary to evaluate its applicability and reliability in the medical environment.