Alzheimer's Disease, the most common form of dementia, is a major global health challenge, where early detection is crucial for improving outcomes. This study presents a diagnostic framework using advanced Vision Language Models to combine linguistic and visual data. The system emphasizes explainability by producing detailed, human-readable assessments based on linguistic criteria. These are converted into numeric scores by a trained language model, enabling interpretable and efficient downstream classification. This work establishes a scalable and interpretable pipeline, demonstrating the feasibility of deploying resource-efficient diagnostic tools on consumer-grade GPUs. Currently focused on the Cookie Theft picture, the framework paves the way for future research with diverse stimuli, larger datasets, and culturally adaptable multimodal Artificial Intelligence systems for detecting neurodegenerative diseases.
Leveraging Multimodal Vision Language Models for Early Detection of Alzheimer's Disease / Casu, F.; Grosso, E.; Lagorio, A.; Ruiu, P.; Trunfio, G. A. - (2025), pp. 291-298. (Paper presented at the 33rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2025, held in Italy in 2025) [10.1109/PDP66500.2025.00047].
Leveraging Multimodal Vision Language Models for Early Detection of Alzheimer's Disease
Casu F.; Grosso E.; Lagorio A.; Ruiu P.; Trunfio G. A.
2025-01-01