On #agony and #ecstasy: Potential and pitfalls of linguistic sentiment analysis (joint work with Karim Kassam)
With the ready availability of social media data, researchers are increasingly undertaking large-scale studies of online emotion. Many of these studies employ sentiment analysis, that is, automatically inferring emotional information from text written on blogs, status updates, or tweets. We compare the momentary, daily, and day-of-week patterns of affect data extracted from Twitter to affect data generated by directly polling a demographically representative sample. We highlight striking inconsistencies, casting doubt on the direct application of sentiment analysis tools to measure population-level well-being. Whereas sentiment analysis tools appear to capture negative affect reasonably accurately, the same tools produce estimates of positive affect that are uncorrelated with direct measurement, because the frequency of positive words is not a reliable indicator of positive affect. As a proof of concept, we present kernelized distribution regression on Word2Vec features to enable accurate inference about population-level emotion, validating this "ecological inference" approach on geographical summaries of well-being.
Seth Flaxman (http://www.sethrf.com) is a postdoc with Yee Whye Teh at Oxford in the computational statistics and machine learning group in the Department of Statistics. His research is on scalable methods for spatiotemporal statistics and Bayesian machine learning, applied to public policy / social science areas including crime, emotion, and public health. Seth completed his BA in computer science and mathematics at Harvard in 2008 and his PhD in machine learning and public policy at Carnegie Mellon University in 2015, advised by Daniel Neill and Alex Smola.