Vijeta Deshpande

2024

pdf bib abs
LocalTweets to LocalHealth: A Mental Health Surveillance Framework Based on Twitter Data
Vijeta Deshpande | Minhwa Lee | Zonghai Yao | Zihao Zhang | Jason Brian Gibbons | Hong Yu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Prior research on Twitter (now X) data has provided positive evidence of its utility in developing supplementary health surveillance systems. In this study, we present a new framework to surveil public health, focusing on mental health (MH) outcomes. We hypothesize that locally posted tweets are indicative of local MH outcomes and collect tweets posted from 765 neighborhoods (census block groups) in the USA. We pair these tweets from each neighborhood with the corresponding MH outcome reported by the Center for Disease Control (CDC) to create a benchmark dataset, LocalTweets. With LocalTweets, we present the first population-level evaluation task for Twitter-based MH surveillance systems. We then develop an efficient and effective method, LocalHealth, for predicting MH outcomes based on LocalTweets. When used with GPT3.5, LocalHealth achieves the highest F1-score and accuracy of 0.7429 and 79.78%, respectively, a 59% improvement in F1-score over the GPT3.5 in zero-shot setting. We also utilize LocalHealth to extrapolate CDC’s estimates to proxy unreported neighborhoods, achieving an F1-score of 0.7291. Our work suggests that Twitter data can be effectively leveraged to simulate neighborhood-level MH outcomes.

2023

pdf bib abs
Honey, I Shrunk the Language: Language Model Behavior at Reduced Scale.
Vijeta Deshpande | Dan Pechi | Shree Thatte | Vladislav Lialin | Anna Rumshisky
Findings of the Association for Computational Linguistics: ACL 2023

In recent years, language models have drastically grown in size, and the abilities of these models have been shown to improve with scale. The majority of recent scaling laws studies focused on high-compute high-parameter count settings, leaving the question of when these abilities begin to emerge largely unanswered. In this paper, we investigate whether the effects of pre-training can be observed when the problem size is reduced, modeling a smaller, reduced-vocabulary language. We show the benefits of pre-training with masked language modeling (MLM) objective in models as small as 1.25M parameters, and establish a strong correlation between pre-training perplexity and downstream performance (GLUE benchmark). We examine downscaling effects, extending scaling laws to models as small as ~1M parameters. At this scale, we observe a break of the power law for compute-optimal models and show that the MLM loss does not scale smoothly with compute-cost (FLOPs) below 2.2 × 10¹⁵ FLOPs. We also find that adding layers does not always benefit downstream performance.Our filtered pre-training data, reduced English vocabulary, and code are available at https://github.com/text-machine-lab/mini_bertgithub.com/text-machine-lab/mini_bert

Co-authors