Dementia is a global public health concern, and early detection of Alzheimer's disease, its most prevalent form, is of paramount importance. Because suitable biomarkers are in limited supply, spoken language offers an alternative: research has shown that early cognitive impairment can be identified from patients' speech. This paper presents a multi-modal system for automatic Alzheimer's disease detection from speech. The system is trained on recordings of healthy individuals and Alzheimer's patients describing an image, a task that requires both linguistic and cognitive skills. It is built on fine-tuned Large Language Models, audio feature extractors, and classifiers. After an extensive comparison of single- and multi-modal architectures, the best results are obtained by combining Mistral-7B, VGGish, and a Support Vector Classifier, which outperforms previous methods on the ADReSSo 2021 test set.
Integrating Fine-Tuned LLM with Acoustic Features for Enhanced Detection of Alzheimer's Disease / Casu, Filippo; Lagorio, Andrea; Ruiu, Pietro; Trunfio, Giuseppe A; Grosso, Enrico. - In: IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS. - ISSN 2168-2194. - PP:(2025). [10.1109/JBHI.2025.3566615]
Integrating Fine-Tuned LLM with Acoustic Features for Enhanced Detection of Alzheimer's Disease
Casu, Filippo; Lagorio, Andrea; Ruiu, Pietro; Trunfio, Giuseppe A; Grosso, Enrico
2025-01-01


