Social Media Posts May Predict Whether You Get Diabetes, Other Medical Conditions

By Suvarna Sheth

Your Facebook status updates may reveal a whole lot more about you than you think.

It turns out that the language you use in your posts may predict whether you will develop diabetes and other medical conditions such as depression, anxiety, and alcohol and drug abuse, and may do so better than demographic statistics, according to researchers from Stony Brook University and Penn Medicine.

Most notably, the researchers found that the medical condition categories for which Facebook statuses showed the largest gains in prediction accuracy over demographics included diabetes, pregnancy, anxiety, psychoses, and depression.

The authors note that some early research has linked social media language use with health, but “this is the first study to the best of our understanding to do so at the level of the patient with EMR data,” they write.

How Was the Study Conducted?

Participants for the study were drawn from the ongoing Social Mediome study, in which adult patients share their past social media activity and electronic medical record (EMR) data.

For all participants who enrolled through October 2015 and agreed to share their Facebook data, the researchers retrieved status updates from up to 5 years prior.

They then limited their analyses to participants with at least 500 words across all of their Facebook status updates, a threshold they considered reliable for language analysis.
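The word-count cutoff described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function names, the whitespace-based word count, and the toy data are all assumptions made for the example.

```python
MIN_WORDS = 500  # threshold reported in the article


def total_words(status_updates):
    """Count whitespace-separated tokens across all of one person's updates.

    Assumption: a "word" is any whitespace-separated token; the study may
    have tokenized text differently.
    """
    return sum(len(update.split()) for update in status_updates)


def eligible_participants(updates_by_participant, min_words=MIN_WORDS):
    """Return IDs of participants whose combined updates meet the threshold."""
    return [
        pid
        for pid, updates in updates_by_participant.items()
        if total_words(updates) >= min_words
    ]


# Toy, made-up data (not from the study).
sample = {
    "p1": ["word " * 300, "word " * 250],  # 550 words in total
    "p2": ["short post"],                  # 2 words in total
}
kept = eligible_participants(sample)
print(kept)
```

In this toy run only "p1" clears the 500-word bar, so "p2" would be left out of the language analysis.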

All Facebook data came from consenting patients, not from anyone in their network, and its collection was consistent with the platform's terms of service.

From the health system’s EMRs, researchers retrieved demographics (age, sex, and race) and prior diagnoses. They then defined 21 medical condition categories.

Of the 1,143 patients who shared both social media and EMR data, 999 (87%) had an adequate number of status updates (at least 20).

Of the 999 adults, 76% were women, 71% were black, and 70% were 30 years old or younger.

What Were the Findings?

The researchers found that all 21 medical condition categories were predictable from Facebook language beyond chance, and that 18 categories were better predicted by a combination of demographics and Facebook language than by demographics alone.

They found that 10 categories were better predicted by Facebook language than by the standard demographic factors (age, sex, and race).
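To make the idea of one predictor beating another concrete, here is a minimal Python sketch. It assumes prediction quality is measured by the area under the ROC curve (AUC), a common choice that this article does not itself name. Every label and score below is made up for illustration; none comes from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = patient has the diagnosis, 0 = does not.
labels = [1, 1, 1, 0, 0, 0]
demographic_scores = [0.6, 0.4, 0.5, 0.5, 0.3, 0.6]  # invented model output
language_scores = [0.8, 0.7, 0.4, 0.3, 0.5, 0.2]     # invented model output

auc_demographics = auc(labels, demographic_scores)
auc_language = auc(labels, language_scores)

# A category counts as "better predicted by language" in this sketch
# when the language model's AUC exceeds the demographic model's AUC.
print(f"demographics: {auc_demographics:.2f}, language: {auc_language:.2f}")
print("language better:", auc_language > auc_demographics)
```

An AUC of 0.5 corresponds to chance, so "predictable beyond chance" would mean scoring above that level under this assumed metric.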

The medical condition categories for which Facebook statuses showed the largest gains in prediction accuracy over demographics included diabetes, pregnancy, and mental health conditions.

The authors point out several limitations to their study, notably that “predictive words often do not represent causal mechanisms and the findings are correlational.”

Also, they note predictive associations of language with certain diseases may vary across populations, requiring rederivation of language markers in different populations.

Importantly, the participants in the current study were primarily African American women receiving care at an urban academic medical center, and the authors acknowledge that this sample is not representative of the general population.

“People’s personality, mental state, and health behaviors are all reflected in their social media and all have a tremendous impact on health,” the researchers state in their conclusion. “This is the first study to show that language on Facebook can predict diagnoses within people’s health record, revealing new opportunities to personalize care and understand how patients’ ordinary daily lives relate to their health.”

Researchers say future studies could compare the health-related information disclosed by users of different demographic populations and on other social media platforms.

The research has been published in PLOS ONE.

The study was funded by the Robert Wood Johnson Foundation Pioneer Award.


  1. PLOS. (2019, June 17). Evaluating the predictability of medical conditions from social media posts. Retrieved June 20, 2019, from