Assessing the Accuracy, Completeness and Safety of ChatGPT-4o Responses on Pressure Injuries in Infants: Clinical Applications and Future Implications

Soddu, Marica; De Vito, Andrea; Madeddu, Giordano; Nicolosi, Biagio; Provenzano, Maria; Ivziku, Dhurata; Curcio, Felice

doi:10.3390/nursrep15040130

Background/Objectives: The advent of large language models (LLMs) such as ChatGPT, which can generate quick, interactive answers to complex questions, opens the way to new approaches for training healthcare professionals and lets them acquire up-to-date, specialised information easily. In nursing, LLMs have been shown to support clinical decision making, continuing education, the development of care plans and the management of complex clinical cases, as well as the writing of academic reports and scientific articles. Furthermore, their ability to provide rapid access to up-to-date scientific information can improve the quality of care and promote evidence-based practice. However, their applicability in clinical practice requires thorough evaluation. This study evaluated the accuracy, completeness and safety of the responses generated by ChatGPT-4o on pressure injuries (PIs) in infants.
Methods: In January 2025, we analysed the responses generated by ChatGPT-4o to 60 questions on PIs in infants, grouped into 12 main topics. The questions were developed by consulting authoritative documents and were selected for their relevance to nursing care and their clinical potential. A panel of five experts rated the accuracy, completeness and safety of each answer on a 5-point Likert scale.
Results: Overall, more than 90% of the responses generated by ChatGPT-4o received relatively high ratings for the three criteria assessed, with 4 being the most frequent score. However, when the 12 topics were analysed individually, Medical Device Management and Technological Innovation had the lowest accuracy scores, while Scientific Evidence and Technological Innovation had the lowest completeness scores. No answer was rated as completely incorrect on any of the three criteria.
Conclusions: ChatGPT-4o showed a good level of accuracy, completeness and safety in answering questions about pressure injuries in infants. However, ongoing updates and the integration of high-quality scientific sources are essential to ensure its reliability as a clinical decision-support tool.

Assessing the Accuracy, Completeness and Safety of ChatGPT-4o Responses on Pressure Injuries in Infants: Clinical Applications and Future Implications / Soddu, M., De Vito, A., Madeddu, G., Nicolosi, B., Provenzano, M., Ivziku, D., Curcio, F. - In: NURSING REPORTS. - ISSN 2039-4403. - 15:4(2025). [10.3390/nursrep15040130]