SLaCAD: A Spoken Language Corpus for Early Alzheimer’s Disease Detection

Shahla Farzana, Edoardo Stoppa, Alex Leow, Tamar Gollan, Raeanne Moore, David Salmon, Douglas Galasko, Erin Sundermann, Natalie Parde


Abstract
Identifying early markers of Alzheimer’s disease (AD) trajectory enables intervention in early disease stages, when currently available interventions are most likely to be beneficial. Research has shown that alterations in speech, as well as linguistic and semantic deviations in spontaneous conversation detected using natural language processing, manifest early in AD, prior to some other observed cognitive deficits. Recent studies show that cerebrospinal fluid (CSF) levels serve as useful biomarkers for identifying early AD, but CSF biomarkers are challenging to collect. A simpler alternative that has developed very rapidly is the use of plasma biomarkers, since a blood draw is minimally invasive. Associating verbal and nonverbal characteristics from speech data with CSF and plasma biomarkers may open the door to less invasive, more efficient methods for early AD detection. We present SLaCAD, a new dataset to facilitate this process. We describe our data collection procedures, analyze the resulting corpus, and present preliminary findings that relate measures extracted from the audio and transcribed text to clinical diagnoses, CSF levels, and plasma biomarkers. Our findings demonstrate the feasibility of this approach and indicate that the collected data can be used to improve assessments of early AD.
Anthology ID:
2024.lrec-main.1296
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
14877–14897
URL:
https://aclanthology.org/2024.lrec-main.1296
Cite (ACL):
Shahla Farzana, Edoardo Stoppa, Alex Leow, Tamar Gollan, Raeanne Moore, David Salmon, Douglas Galasko, Erin Sundermann, and Natalie Parde. 2024. SLaCAD: A Spoken Language Corpus for Early Alzheimer’s Disease Detection. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14877–14897, Torino, Italia. ELRA and ICCL.
Cite (Informal):
SLaCAD: A Spoken Language Corpus for Early Alzheimer’s Disease Detection (Farzana et al., LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.1296.pdf