PlumX Metrics

Evaluating ChatGPT’s competency in radiation oncology: A comprehensive assessment across clinical scenarios

Radiotherapy and Oncology, ISSN: 0167-8140, Vol: 202, Page: 110645
2025
  • Citations: 0
  • Usage: 0
  • Captures: 20
  • Mentions: 0
  • Social Media: 0

Article Description

Artificial intelligence (AI) and machine learning present an opportunity to enhance clinical decision-making in radiation oncology. This study aims to evaluate the competency of ChatGPT, an AI language model, in interpreting clinical scenarios and to assess its oncology knowledge. A series of clinical cases was designed covering 12 disease sites. Questions were grouped into domains: epidemiology, staging and workup, clinical management, treatment planning, cancer biology, physics, and surveillance. Royal College-certified radiation oncologists (ROs) reviewed the cases and provided solutions. Using a standardized rubric, ROs scored responses on 3 criteria: conciseness (focused answers), completeness (addressing all aspects of the question), and correctness (answer aligns with expert opinion). Scores ranged from 0 to 5 for each criterion, for a total possible score of 15 per question. Across 12 cases, 182 questions were answered, with a total AI score of 2317/2730 (84 %). Scores by criterion were: completeness (79 %, range: 70–99 %), conciseness (92 %, range: 83–99 %), and correctness (81 %, range: 72–92 %). AI performed best in the domains of epidemiology (93 %) and cancer biology (93 %), and reasonably in staging and workup (89 %), physics (86 %), and surveillance (82 %). Weaker domains included treatment planning (78 %) and clinical management (81 %). Statistical differences were driven by variations in the completeness (p < 0.01) and correctness (p = 0.04) criteria, whereas conciseness was scored universally high (p = 0.91). These trends were consistent across disease sites. ChatGPT showed potential as a tool in radiation oncology, demonstrating a high degree of accuracy in several oncologic domains. However, this study highlights limitations, with incorrect and incomplete answers in complex cases.
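
The rubric arithmetic described above can be made concrete with a minimal sketch in plain Python. The criterion names and the 0–5 scale come from the abstract; the example scores, the data structures, and the helper question_total are hypothetical illustrations, not the study's scoring code.

    # Illustrative sketch only: hypothetical per-question scores, not the study's data.
    # Each question is scored 0-5 on three criteria, so each question is worth up to 15 points.

    CRITERIA = ("conciseness", "completeness", "correctness")
    MAX_PER_CRITERION = 5

    def question_total(scores):
        """Sum one question's criterion scores (each assumed to be 0-5)."""
        assert all(0 <= scores[c] <= MAX_PER_CRITERION for c in CRITERIA)
        return sum(scores[c] for c in CRITERIA)

    # Hypothetical example: two questions from a single case.
    case_scores = [
        {"conciseness": 5, "completeness": 4, "correctness": 4},
        {"conciseness": 5, "completeness": 3, "correctness": 5},
    ]

    achieved = sum(question_total(q) for q in case_scores)
    possible = len(case_scores) * len(CRITERIA) * MAX_PER_CRITERION
    print(f"{achieved}/{possible} ({achieved / possible:.0%})")  # 26/30 (87%)
    # At the study's scale, 182 questions * 15 points per question = 2730 possible points.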

Bibliographic Details

Ramadan, Sherif; Mutsaers, Adam; Chen, Po-Hsuan Cameron; Bauman, Glenn; Velker, Vikram; Ahmad, Belal; Arifin, Andrew J; Nguyen, Timothy K; Palma, David; Goodman, Christopher D

Elsevier BV

Medicine
