Websites like Reddit offer places for users to enjoy funny memes, complain about coworkers, or learn new facts. They can also easily give rise to “rabbit holes” while offering unique and surprising experiences. And while these websites may seem like just another form of social media, the data generated on Reddit and similar sites is also being analyzed by scientists. In one new study, researchers from Dartmouth College used an AI model to analyze Reddit data in order to predict emotional disorders in users.
Background: Studying Internet Emotions Using Reddit
In order to predict possible mental disorders in Reddit users, the researchers developed an AI model to analyze the emotions and emotional language of posts and comments. Because the researchers were hoping to predict the class of mental disorders known as emotional disorders (including bipolar disorder, anxiety disorders, and major depressive disorder), they needed to see distinct emotional patterns in a user’s data. As Reddit has over half a billion users, the researchers were able to collect a vast amount of data in order to compile a full picture.
The AI model was trained to label the emotions in a user’s posts and to map the transitions from emotion to emotion between posts. Based on a user’s language, a post could be labeled with no emotion or with emotions such as sadness, joy, fear, and anger. The emotional map helped the researchers determine the causes of these emotional shifts as well as their frequency. This also allowed the researchers to create a unique emotional signature for each user, which could then be compared against the patterns associated with emotional disorders.
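The study does not publish its implementation, but the idea of an “emotional signature” built from emotion-to-emotion transitions can be sketched roughly as follows. The label set and function names here are illustrative assumptions, not the authors’ actual code:

```python
from collections import Counter

def emotional_signature(post_labels):
    """Build a toy per-user 'signature': the normalized frequency of each
    emotion-to-emotion transition across consecutive posts.
    `post_labels` is a chronological list of emotion labels, one per post
    (e.g. "sadness", "joy", "fear", "anger", or "none")."""
    # Count each consecutive pair of labels as one transition.
    transitions = Counter(zip(post_labels, post_labels[1:]))
    total = sum(transitions.values()) or 1  # avoid division by zero
    # Normalize counts so signatures from active and quiet users are comparable.
    return {pair: count / total for pair, count in transitions.items()}

# Hypothetical example: a user whose posts swing between sadness and anger.
labels = ["sadness", "anger", "sadness", "anger", "joy"]
sig = emotional_signature(labels)
# sig[("sadness", "anger")] is 0.5: half of this user's transitions
# move from sadness to anger.
```

A signature like this could then be compared to transition patterns typical of different emotional disorders, which is the comparison step the article describes.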
Analysis: Avoiding Data Leaks in the AI Model
To test their new model, the researchers ran it on data from individuals who had already been diagnosed with emotional disorders. The model corroborated the previous diagnoses quite accurately, showing that the algorithms were successful.
In building their AI model around personalized emotional signatures, the scientists were able to avoid a common problem with AI algorithms called data leakage. Data leakage happens when information from outside the training data (the data used to build the model) slips into the model-building process. For example, a leaky model might link posts about COVID-19 with sadness or anxiety, and then predict that a user has anxiety or depression based on the topic of their posts rather than the emotions expressed. The new model avoids this association entirely by looking only at emotions, not at outside topics.
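The design principle described above can be illustrated with a toy example: if the downstream model only ever sees emotion labels, topic words like “COVID-19” can never leak into its predictions. This is a minimal sketch under that assumption; the classifier below is a hypothetical stand-in, not the study’s model:

```python
def topic_blind_features(posts, emotion_classifier):
    """Convert raw posts into emotion labels only.
    Downstream prediction code receives just these labels, so topic words
    in the raw text can never influence the final prediction."""
    return [emotion_classifier(text) for text in posts]

def toy_emotion_classifier(text):
    # Hypothetical keyword-based stand-in for a trained emotion model,
    # used here only to make the example runnable.
    return "sadness" if "lost" in text.lower() else "none"

posts = [
    "I lost my job during COVID-19",
    "Nice weather for a walk today",
]
features = topic_blind_features(posts, toy_emotion_classifier)
# features == ["sadness", "none"]: only emotion labels survive,
# so the topic "COVID-19" cannot leak into a diagnosis.
```

The design choice is the point: by restricting the feature space to emotions, the pipeline cannot learn spurious topic-to-diagnosis shortcuts of the kind the article warns about.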
Outlook: Social Media as a Behavioral Thermometer
Social media websites like Reddit offer a wealth of data for researchers to study. “Social media offers an easy way to tap into people’s behaviors,” explained co-author Xiaobo Guo, who also noted that the data is public and voluntarily shared. The researchers are hopeful that clinicians can use their model as a behavioral thermometer, monitoring symptoms to inform diagnosis and treatment. Though they did not offer treatment to their subjects, the researchers believe their model can be adapted for that purpose. According to Soroush Vosoughi, an assistant professor at Dartmouth College: “It’s very important to have models that perform well, but also really understand their workings, biases, and limitations.”
Kenna Castleberry is a staff writer at the Debrief and the Science Communicator at JILA (a partnership between the University of Colorado Boulder and NIST). She focuses on deep tech, the metaverse, and quantum technology. You can find more of her work at her website: https://kennacastleberry.com/