Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods

Citation Data: Proceedings of the National Academy of Sciences of the United States of America, ISSN: 1091-6490, Vol: 117, Issue: 19, Pages: 10165-10171

Publication Year: 2020


Metrics Details

Citations: 154
- Citation Indexes: 144
- Policy Citations: 10
Usage: 0
Captures: 254
- Readers: 254
Mentions: 6
- News Mentions: 4
- Blog Mentions: 1
- References: 1
Social Media: 24
- Shares, Likes & Comments: 24

Most Recent News

Twitter can reveal the well-being of a whole community

May 4, 2020
Futurity

Social media can reveal the psychological states of an entire population, according to new research. The results show that through machine learning (teaching a computer to identify and analyze patterns in large datasets), researchers can see, in principle, how a society is doing in real time. “These methods really show how to do psychological measurement in the 21st century in our digital world,” say […]

Article Description

Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. When users post on social media, they leave behind digital traces that reflect their thoughts and feelings. Aggregation of such digital traces may make it possible to monitor well-being at large scale. However, social media-based methods need to be robust to regional effects if they are to produce reliable estimates. Using a sample of 1.53 billion geotagged English tweets, we provide a systematic evaluation of word-level and data-driven text-analysis methods for generating well-being estimates for 1,208 US counties. We compared Twitter-based county-level estimates with well-being measurements from the Gallup-Sharecare Well-Being Index survey, based on 1.73 million phone surveys. We find that word-level methods (e.g., Linguistic Inquiry and Word Count [LIWC] 2015 and Language Assessment by Mechanical Turk [LabMT]) yielded inconsistent county-level well-being measurements due to regional, cultural, and socioeconomic differences in language use. However, removing as few as three of the most frequent words led to notable improvements in well-being prediction. Data-driven methods provided robust estimates, approximating the Gallup data with correlations of up to r = 0.64. We show that the findings generalized to county socioeconomic and health outcomes and were robust when poststratifying the samples to be more representative of the general US population. Regional well-being estimation from social media data seems to be robust when supervised data-driven methods are used.
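The abstract contrasts word-level (dictionary) scoring with supervised data-driven methods and reports that dropping a few of the most frequent words improves dictionary-based estimates. The sketch below illustrates only the dictionary side of that idea under stated assumptions: each county is scored as the frequency-weighted mean of lexicon word scores, the k most frequent dictionary words (pooled across counties) can be excluded, and the result is compared with survey scores via a Pearson correlation. The lexicon values, word counts, survey scores, and function names (`county_score`, `top_k_words`, `pearson_r`, `evaluate`) are all hypothetical; this is not the authors' pipeline and omits the real LIWC/LabMT dictionaries, the supervised models, and poststratification.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon (word -> valence on a LabMT-style 1-9 scale).
# Values are invented for illustration; they are NOT real LabMT or LIWC scores.
LEXICON = {"love": 8.4, "good": 7.2, "lol": 6.9, "bad": 2.6, "tired": 3.4}

# Hypothetical per-county word counts and survey well-being scores (toy data).
COUNTY_COUNTS = {
    "A": Counter({"lol": 50, "love": 5, "bad": 5}),
    "B": Counter({"lol": 5, "love": 10, "bad": 2}),
    "C": Counter({"lol": 2, "love": 2, "bad": 10}),
}
SURVEY = {"A": 3.6, "B": 4.5, "C": 3.5}


def county_score(counts, lexicon, excluded):
    """Frequency-weighted mean lexicon score over the county's dictionary words,
    skipping any word in `excluded`. Returns NaN if no dictionary word occurs."""
    total, n = 0.0, 0
    for word, c in counts.items():
        if word in lexicon and word not in excluded:
            total += lexicon[word] * c
            n += c
    return total / n if n else float("nan")


def top_k_words(county_counts, lexicon, k):
    """The k most frequent dictionary words, pooled across all counties."""
    pooled = Counter()
    for counts in county_counts.values():
        pooled.update(counts)
    ranked = [w for w, _ in pooled.most_common() if w in lexicon]
    return set(ranked[:k])


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def evaluate(k):
    """Correlate dictionary-based county scores with survey scores after
    dropping the k most frequent dictionary words."""
    excluded = top_k_words(COUNTY_COUNTS, LEXICON, k)
    names = sorted(COUNTY_COUNTS)
    xs = [county_score(COUNTY_COUNTS[c], LEXICON, excluded) for c in names]
    ys = [SURVEY[c] for c in names]
    return pearson_r(xs, ys)


if __name__ == "__main__":
    for k in (0, 1):
        print(f"k={k}: r={evaluate(k):.3f}")
```

On this toy data, excluding the single most frequent word ("lol", which dominates county A) raises the correlation with the survey scores. That only illustrates the mechanism of a frequent word skewing a dictionary average; it says nothing about the magnitudes reported in the paper.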

Bibliographic Details

DOI: 10.1073/pnas.1906364117

PMID: 32341156

URL ID: http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=85084502628&origin=inward; https://dx.doi.org/10.1073/pnas.1906364117; http://www.ncbi.nlm.nih.gov/pubmed/32341156; https://pnas.org/doi/full/10.1073/pnas.1906364117; https://www.pnas.org/content/117/19/10165

AUTHOR(S)

Jaidka, Kokil; Giorgi, Salvatore; Schwartz, H Andrew; Kern, Margaret L; Ungar, Lyle H; Eichstaedt, Johannes C

PUBLISHER(S)

Proceedings of the National Academy of Sciences

TAG(S)

Multidisciplinary
