The integration of Large Language Models (LLMs) into scientific writing presents significant opportunities for scholars but also risks, including misinformation and plagiarism. A new body of literature is shaping to verify the capability of LLMs to execute the complex tasks that are inherent to academic publishing. In this context this study was driven by the need to critically assess LLM’s out-of-the-box performance in generating evidence synthesis reviews. To this end, the signature topic of the authors’ group, cross-education of voluntary force, was chosen as a model. We prompted a popular LLM (Gemini 2.5 Pro, Deep Research enabled) to generate a scoping review on the neural mechanisms underpinning cross-education. The resulting unedited manuscript was submitted for formal peer-review to four leading subject-matter experts. Their qualitative feedback on manuscript’s structure, content, and integrity was collated and analyzed. Peer-reviewers identified critical failures at fundamental stages of the review process. The LLM failed to: (1) identify specific research questions; (2) adhere to established methodological frameworks; (3) implement trustworthy search strategies; (4) objectively synthesize data. Importantly, the Results section was deemed interpretative rather than descriptive. Referencing was agreed as the worst issue being inaccurate, biased toward open-access sources (84%), and containing instances of plagiarism. The LLM also failed to hierarchize evidence, presenting minor or underexplored findings as established evidence. The LLM generated a non-systematic, poorly structured, and unreliable narrative review. These findings suggest that the selected LLM is incapable of autonomously performing scientific synthesis and requires massive human supervision to correct the observed issues.

Peer-reviewed by human experts: AI failed in key steps to generate a scoping review on the neural mechanisms of cross-education / Morrone, M.; Hortobágyi, T.; Kidgell, D.; Farthing, J. P.; Deriu, F.; Manca, A.. - In: EUROPEAN JOURNAL OF APPLIED PHYSIOLOGY. - ISSN 1439-6319. - Dec 2025(2025). [10.1007/s00421-025-06100-w]

Peer-reviewed by human experts: AI failed in key steps to generate a scoping review on the neural mechanisms of cross-education

Morrone, M.;Deriu, F.
;
Manca, A.
2025-01-01

Abstract

The integration of Large Language Models (LLMs) into scientific writing presents significant opportunities for scholars but also risks, including misinformation and plagiarism. A new body of literature is shaping to verify the capability of LLMs to execute the complex tasks that are inherent to academic publishing. In this context this study was driven by the need to critically assess LLM’s out-of-the-box performance in generating evidence synthesis reviews. To this end, the signature topic of the authors’ group, cross-education of voluntary force, was chosen as a model. We prompted a popular LLM (Gemini 2.5 Pro, Deep Research enabled) to generate a scoping review on the neural mechanisms underpinning cross-education. The resulting unedited manuscript was submitted for formal peer-review to four leading subject-matter experts. Their qualitative feedback on manuscript’s structure, content, and integrity was collated and analyzed. Peer-reviewers identified critical failures at fundamental stages of the review process. The LLM failed to: (1) identify specific research questions; (2) adhere to established methodological frameworks; (3) implement trustworthy search strategies; (4) objectively synthesize data. Importantly, the Results section was deemed interpretative rather than descriptive. Referencing was agreed as the worst issue being inaccurate, biased toward open-access sources (84%), and containing instances of plagiarism. The LLM also failed to hierarchize evidence, presenting minor or underexplored findings as established evidence. The LLM generated a non-systematic, poorly structured, and unreliable narrative review. These findings suggest that the selected LLM is incapable of autonomously performing scientific synthesis and requires massive human supervision to correct the observed issues.
2025
Peer-reviewed by human experts: AI failed in key steps to generate a scoping review on the neural mechanisms of cross-education / Morrone, M.; Hortobágyi, T.; Kidgell, D.; Farthing, J. P.; Deriu, F.; Manca, A.. - In: EUROPEAN JOURNAL OF APPLIED PHYSIOLOGY. - ISSN 1439-6319. - Dec 2025(2025). [10.1007/s00421-025-06100-w]
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11388/375650
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact