Introduction: Social media data have been shown to be useful for augmenting traditional approaches to public health surveillance. Quantifying how representative these data are is necessary for drawing accurate public health inferences.
Methods: We applied machine-learning methods to explore spatial and temporal trends in dengue reporting on Twitter relative to confirmed cases. We then quantified associations between this reporting and sociodemographic factors at the municipality level across three Brazilian states (São Paulo, Rio de Janeiro, and Minas Gerais).
Results: Education and income were positive predictors of dengue reporting on Twitter. In contrast, municipalities with higher proportions of older adults and of males were less likely to report suspected dengue on Twitter. Overall, municipalities with dengue-related tweets had a higher mean per capita income and a lower proportion of residents with no primary school education.
Conclusions: These observations highlight the need to understand how populations are represented across locations, age groups, and racial/ethnic backgrounds in studies that use social media data for public health research. Additional data are needed to assess and compare data representativeness across regions in Brazil.