TECHNOLOGY AND CONTRAST MEDIA / ORIGINAL PAPER
Figure from article: Advancing AI in radiology:...
 
ABSTRACT
Purpose:
The aim of this study was to evaluate the performance of the ChatGPT-o1-preview language model in solving the Polish National Specialization Exam (PES) in radiology and imaging diagnostics and compare its results with previous versions of the model.

Material and methods:
A set of 119 valid radiology exam questions from the Spring 2023 session was analyzed. Each question was classified by type, subtype, and clinical relevance. ChatGPT answered each question five times using a standardized prompt that included a 5-point confidence scale. Performance was assessed using accuracy and declared and calculated difficulty indices. Statistical analysis was performed in Python with a significance level of p < 0.05, and the results were compared with those of a previous model version.
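The analysis described above can be sketched in Python. This is a minimal illustration only: the paper does not publish its exact formulas, so the classical item-difficulty index and the two-proportion z-test used here are assumptions, and the figures passed to the comparison (93.33% vs. 52% on 119 questions) are taken from the reported results.

```python
import math

def accuracy(correct_flags):
    """Fraction of correct answers across repeated runs of one question."""
    return sum(correct_flags) / len(correct_flags)

def difficulty_index(n_correct, n_total):
    """Classical item-difficulty index: proportion of correct responses."""
    return n_correct / n_total

def two_proportion_z(p1, n1, p2, n2):
    """Two-proportion z-test statistic for comparing two accuracy rates."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Comparing the reported accuracies of ChatGPT-o1-preview (93.33%)
# and ChatGPT-3.5 (52%) on the same 119-question set; |z| > 1.96
# corresponds to significance at p < 0.05 (two-sided).
z = two_proportion_z(0.9333, 119, 0.52, 119)
print(f"z = {z:.2f}, significant: {abs(z) > 1.96}")
```

Under these assumptions the difference between the two model versions is far beyond the p < 0.05 threshold, consistent with the comparison reported in the Results.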

Results:
The model answered 93.33% of the questions correctly, comparable to the average physician score of 94.86%. ChatGPT-o1-preview showed exceptional accuracy on "memory" questions, with over 96% correct answers. This result, significantly higher than that of the older ChatGPT-3.5 model (52%), demonstrates progress in artificial intelligence (AI) capabilities. The model also expressed higher confidence in its responses, indicating better adaptation to medical exams.

Conclusions:
Despite the model's high accuracy, the study was based on a relatively small set of questions, which limits a full assessment of its effectiveness. The results indicate the potential of AI as a tool to support clinical work, but further, more extensive research is needed to evaluate its applicability and reliability in the medical environment.