Modeling Public Mood and Emotions: Social Media Sentiment Near Major Events

Social media platforms generate vast amounts of real-world data, which has made social media analysis an increasingly popular field. Here, we use sentiment to find patterns surrounding real-world events, specifically patterns that reflect the underlying behavior of populations. Using VADER, a lexicon-based sentiment analysis tool, and OSoMe, a platform for collecting data from Twitter, we analyze tweets surrounding major events, such as social unrest and emergency disasters.
We first aggregate the data into weeks and resample each week with replacement to obtain a bootstrap confidence interval for the median sentiment. We also create a null model by sampling tweets from the whole dataset, matched on day of the week, and bootstrap the null model in the same way. In addition, we focus on early warning indicators of critical transitions, such as increased autocorrelation and variance, that could be used for prediction. An example of the bootstrapping analysis is shown in Figure 1.
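The weekly bootstrap, the day-of-week-matched null model, and the two early warning statistics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names, the percentile-style interval, the 1000 resamples, and the synthetic uniform scores standing in for VADER compound scores are all hypothetical choices made for the example.

```python
import random
import statistics
from collections import defaultdict
from datetime import date, timedelta


def bootstrap_median_ci(scores, n_boot=1000, alpha=0.05, rng=None):
    """Percentile bootstrap confidence interval for the median of `scores`."""
    rng = rng or random.Random(0)
    # Resample with replacement, same size as the original sample.
    medians = sorted(
        statistics.median(rng.choices(scores, k=len(scores)))
        for _ in range(n_boot)
    )
    lo = medians[int((alpha / 2) * n_boot)]
    hi = medians[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


def weekday_matched_null(week_tweets, all_tweets, rng=None):
    """For each tweet in the week, draw a score from the whole dataset
    posted on the same day of the week (assumed null-model design)."""
    rng = rng or random.Random(0)
    by_weekday = defaultdict(list)
    for day, score in all_tweets:
        by_weekday[day.weekday()].append(score)
    return [rng.choice(by_weekday[day.weekday()]) for day, _ in week_tweets]


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation; rising values can signal critical slowing down."""
    mean = statistics.fmean(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    den = sum((x - mean) ** 2 for x in series)
    return num / den if den else 0.0


def rolling(series, window, stat):
    """Apply `stat` to every full sliding window of length `window`."""
    return [stat(series[i - window:i]) for i in range(window, len(series) + 1)]


# Synthetic stand-in data: (date, sentiment score) pairs, 20 tweets per day.
rng = random.Random(42)
start = date(2018, 1, 1)
all_tweets = [
    (start + timedelta(days=d), rng.uniform(-1, 1))
    for d in range(28)
    for _ in range(20)
]
week = [t for t in all_tweets if t[0] < start + timedelta(days=7)]

observed_ci = bootstrap_median_ci([s for _, s in week], rng=rng)
null_scores = weekday_matched_null(week, all_tweets, rng=rng)
null_ci = bootstrap_median_ci(null_scores, rng=rng)
print("observed median CI:", observed_ci)
print("null-model median CI:", null_ci)
```

In this sketch, a week would be flagged as unusual when its observed interval does not overlap the null-model interval. The rolling helper can be applied to a series of weekly medians, for example `rolling(weekly_medians, 4, statistics.pvariance)` or `rolling(weekly_medians, 4, lag1_autocorrelation)`, to track variance and autocorrelation over time.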
To build an indicator, we run the above analysis with a changing lexicon. We first run each word in the VADER lexicon independently to measure its individual effect. We then rank the words and evaluate subsets of the top-ranked words to find which subset best indicates the event. Our measure of indicator quality is a function of the entropy, the magnitude of the largest peak or valley, and the distance from that peak or valley to the event day. The best indicator is then tested on other examples of the same event type (social unrest, emergency disasters) for further validation.
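One way such a score could combine the three ingredients is sketched below. The abstract does not give the functional form, so everything here is an assumption made for illustration: the histogram-based entropy, the use of the median as the baseline for peaks and valleys, the particular ratio in `indicator_score`, and the toy per-word series (which are invented, not real results).

```python
import math
import statistics


def shannon_entropy(series, bins=10):
    """Shannon entropy (bits) of a histogram of the series values."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for x in series:
        idx = min(bins - 1, int((x - lo) / (hi - lo) * bins))
        counts[idx] += 1
    n = len(series)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def largest_extremum(series):
    """Index and magnitude of the largest peak or valley,
    measured as deviation from the series median."""
    center = statistics.median(series)
    idx = max(range(len(series)), key=lambda i: abs(series[i] - center))
    return idx, abs(series[idx] - center)


def indicator_score(series, event_index, bins=10):
    """Hypothetical score: large, low-entropy extrema near the event score highest."""
    idx, magnitude = largest_extremum(series)
    distance = abs(event_index - idx)
    return magnitude / ((1.0 + shannon_entropy(series, bins)) * (1.0 + distance))


def rank_words(word_series, event_index):
    """Rank words by the score of the sentiment series each word yields alone."""
    return sorted(
        word_series,
        key=lambda w: indicator_score(word_series[w], event_index),
        reverse=True,
    )


# Invented weekly series, one per single-word lexicon; the event is at index 5.
word_series = {
    "riot": [0.0, 0.0, 0.0, 0.0, -0.8, 0.0],
    "happy": [0.1, -0.1, 0.1, -0.1, 0.1, -0.1],
    "calm": [0.6, 0.0, 0.0, 0.0, 0.0, 0.0],
}
ranking = rank_words(word_series, event_index=5)
print(ranking)
```

With these toy inputs, the word whose series has a sharp valley one step before the event ranks first, the word with an early peak ranks second, and the word whose series only oscillates ranks last. Subsets of the top-ranked words could then be scored the same way on the series produced by the combined sub-lexicon.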
We plan to extend this analysis by introducing new words into our lexicon using GloVe and Word2Vec. We will also build a genetic algorithm to search for the best subset of words. Ultimately, we plan to build a general system that can capture indicators for any type of event.

Krishna Bathina and Johan Bollen
Monday, September 24, 2018 - 17:30 to 17:45


