Is ChatGPT-4 accurate and complete when answering questions on tuberculosis? Results of the ChatGTB study / De Vito, A.; Colpani, A.; Buonsenso, D.; Candoli, P. M. M.; Falbo, E.; La Fauci, S.; Madeddu, G.; Masini, T.; Misiano, G.; Monari, C.; Pontarelli, A.; Riccardi, N.; Saderi, L.; Saluzzo, F.; Sotgiu, G.; Tadolini, M.; Besozzi, G.; Calcagno, A. - In: INFECTIOUS DISEASES AND TROPICAL MEDICINE. - ISSN 2379-4054. - 11:(2025). [10.32113/idtm_202510_1766]
Is ChatGPT-4 accurate and complete when answering questions on tuberculosis? Results of the ChatGTB study
De Vito A.; Colpani A.; Madeddu G.; Saderi L.; Sotgiu G.
2025-01-01
Abstract
Objective: Artificial intelligence (AI), particularly large language models like ChatGPT, offers the potential to disseminate health information. This study aimed to assess the accuracy and completeness of ChatGPT-4’s responses to TB-related questions. Materials and Methods: Ninety English-language TB questions based on official guidelines and clinical experience were formulated. ChatGPT-4o provided answers to these questions between February 1 and March 1, 2024. Three evaluation subgroups assessed the responses for accuracy (using a six-point Likert scale) and completeness (using a three-point Likert scale). Statistical analyses were performed using non-parametric tests. Results: The median accuracy score was 5 out of 6, with 88.9% of responses scoring at least 5, indicating high overall accuracy. However, only 34.4% achieved the highest score of 6, with reduced performance on medium- and high-level-of-expertise (LOE) questions. Low LOE questions had the highest accuracy, with 63.3% scoring 6. Completeness scores showed that 48.9% of responses were comprehensive (score of 3), particularly for low LOE questions (70% scored 3). In contrast, only 23.3% of high LOE questions achieved the highest completeness score. ChatGPT-4 often lacked specificity in complex topics, such as drug-resistant TB therapies, and provided outdated information not aligned with current World Health Organization guidelines. Conclusions: ChatGPT-4 effectively delivers accurate and comprehensive information for general TB inquiries, making it a valuable resource for the public and non-specialist clinicians. However, its performance declines with increasing question complexity, limiting its utility for advanced clinical decision-making in TB care. Continuous updates and enhancements are necessary to improve its accuracy and relevance in specialised medical contexts.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
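The Methods mention comparing Likert-scale ratings with non-parametric tests. As a minimal sketch of the kind of analysis this implies, the pure-Python function below computes a Kruskal-Wallis H statistic (with tie correction) for ordinal scores across the three LOE groups. The paper does not specify which test was used, and all scores shown are illustrative placeholders, not the study's data.

```python
# Hedged sketch: a Kruskal-Wallis rank test for Likert scores across groups.
# Assumption: this is one plausible non-parametric test; the study does not
# name its exact tests. The scores below are hypothetical, not real data.

def kruskal_wallis_h(*groups):
    """Return the Kruskal-Wallis H statistic, corrected for ties."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)

    # Assign each distinct value the average of the ranks it occupies
    # (ties are common in Likert data, so this step matters).
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # H = 12 / (n(n+1)) * sum(R_g^2 / n_g) - 3(n+1)
    h = 12 / (n * (n + 1)) * sum(
        sum(ranks[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)

    # Standard tie-correction factor: 1 - sum(t^3 - t) / (n^3 - n).
    ties, i = 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        ties += t ** 3 - t
        i = j
    return h / (1 - ties / (n ** 3 - n))

# Hypothetical six-point accuracy scores per LOE group (placeholders).
low_loe = [6, 6, 5, 6, 5, 6]
medium_loe = [5, 5, 6, 4, 5, 5]
high_loe = [4, 5, 4, 5, 3, 4]
print(f"H = {kruskal_wallis_h(low_loe, medium_loe, high_loe):.2f}")
```

A larger H indicates a bigger difference between the groups' rank distributions; in practice the p-value would be read from a chi-squared distribution with (number of groups − 1) degrees of freedom.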


