Estimating Ideological Positions with Twitter Data

This GitHub repository contains code and materials related to the article "Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data," published in Political Analysis in 2015.

The original replication code can be found in the replication folder. See also Dataverse for the full replication materials, including data and output.

For updated versions of the code that implement a more computationally efficient approach, introduced in Barberá et al. (2015, Psychological Science), see the folders 2016-update/, 2018-update/, and 2020-update/.

As an application of the method, in June 2015 I wrote a blog post on The Monkey Cage / Washington Post entitled "Who is the most conservative Republican candidate for president?" The replication code for the figure in the post is available in the primary folder.

Finally, this repository also contains an R package (tweetscores) with several functions to facilitate the application of this method in future research. The rest of this README file provides a tutorial with instructions showing how to use it.

NOTE: the package currently only supports v1.1 of Twitter's API, which will be deprecated at some point in the near future. The package is no longer maintained, so use at your own risk.

Authentication

In order to download data from Twitter’s API, you will first need to create a developer account at developer.twitter.com. Once your application is approved, follow these steps:

  1. Go to Twitter's Developer Portal and sign in.
  2. Click on "Overview", then "Create New App".
  3. Follow the instructions.
  4. Copy the API Key and Secret, as well as the Access Token and Secret, and paste them below:
my_oauth <- list(consumer_key = "CONSUMER_KEY",
    consumer_secret = "CONSUMER_SECRET",
    access_token = "ACCESS_TOKEN",
    access_token_secret = "ACCESS_TOKEN_SECRET")
  5. Change the working directory to the folder where you will save your tokens:
setwd("~/Dropbox/credentials/twitter")
  6. Save the OAuth token so it can be reused in future R sessions:
save(my_oauth, file = "my_oauth")
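Once saved, the token can be restored in any later R session with `load()`. Here is a minimal self-contained sketch of the round trip, using a temporary file instead of the credentials folder above (the key values are placeholders):

```r
# save a token list and restore it, as you would in a new R session
my_oauth <- list(consumer_key = "CONSUMER_KEY",
                 consumer_secret = "CONSUMER_SECRET",
                 access_token = "ACCESS_TOKEN",
                 access_token_secret = "ACCESS_TOKEN_SECRET")
token_file <- file.path(tempdir(), "my_oauth")
save(my_oauth, file = token_file)   # serializes the object under its name
rm(my_oauth)                        # simulate starting a fresh session
load(token_file)                    # restores the 'my_oauth' object
names(my_oauth)
```

Note that `load()` restores the object under the name it was saved with, so the credentials are immediately available as `my_oauth` again.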

Installing the tweetscores package

The following code will install the tweetscores package, as well as all other R packages necessary for the functions to run.

toInstall <- c("ggplot2", "scales", "R2WinBUGS", "devtools", "yaml", "httr", "RJSONIO")
install.packages(toInstall, repos = "https://cran.r-project.org")
library(devtools)
install_github("pablobarbera/twitter_ideology/pkg/tweetscores")

Estimating the ideological positions of a US Twitter user

We can now go ahead and estimate the ideology of any Twitter user in the US. The package includes pre-estimated ideology scores for political accounts and media outlets, so here we are just replicating the second stage of the method – that is, estimating a user’s ideology based on the accounts they follow.

# load package
library(tweetscores)
# downloading friends of a user
user <- "p_barbera"
friends <- getFriends(screen_name=user, oauth="~/Dropbox/credentials/twitter")
## /Users/pablobarbera/Dropbox/credentials/twitter/oauth_token_32 
## 15  API calls left
## 1065 friends. Next cursor:  0 
## 14  API calls left
# estimate ideology with MCMC method
results <- estimateIdeology(user, friends)
## p_barbera follows 11 elites: nytimes maddow caitlindewey carr2n fivethirtyeight 
## NickKristof nytgraphics nytimesbits NYTimeskrugman nytlabs thecaucus
## Chain 1
  |=================================================================| 100%
## Chain 2
  |=================================================================| 100%

Once we have this set of estimates, we can analyze them with a series of built-in functions.

# summarizing results
summary(results)
##        mean   sd  2.5%   25%   50%   75% 97.5% Rhat n.eff
## beta  -2.30 0.57 -3.37 -2.72 -2.25 -1.92 -1.26 1.02   200
## theta -1.78 0.30 -2.28 -1.99 -1.82 -1.59 -1.11 1.00   200
# assessing chain convergence using a trace plot
tracePlot(results, "theta")

# comparing with other ideology estimates
plot(results)

Faster ideology estimation

The previous function relies on a Metropolis-Hastings sampling algorithm to estimate ideology. Alternatively, we can use maximum likelihood estimation to compute point estimates of the latent parameters. This method is much faster, since it does not sample from the posterior distribution of the parameters, but it will tend to give smaller standard errors. Overall, however, the results should be almost identical. (See here for the actual estimation functions for each of these two approaches.)

# faster estimation using maximum likelihood
results <- estimateIdeology(user, friends, method="MLE")
## p_barbera follows 11 elites: nytimes maddow caitlindewey carr2n fivethirtyeight 
## NickKristof nytgraphics nytimesbits NYTimeskrugman nytlabs thecaucus
summary(results)
##        mean   sd  2.5%   25%   50%   75% 97.5% Rhat n.eff
## beta  -2.30 0.57 -3.37 -2.72 -2.25 -1.92 -1.26 1.02   200
## theta -1.78 0.30 -2.28 -1.99 -1.82 -1.59 -1.11 1.00   200
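To make the estimation concrete, here is a simplified, self-contained sketch of the kind of likelihood being maximized: a spatial following model in which the probability of following an elite decays with the squared ideological distance to that elite. The elite positions `phi`, popularity intercepts `alpha`, and follow decisions `y` below are hypothetical, and this is an illustration only, not the package's actual estimation code:

```r
# simplified sketch: maximum likelihood for one user's ideal point (theta)
# and intercept (beta), holding elite positions (phi) and popularity (alpha)
# fixed; model: P(user follows elite j) = plogis(alpha[j] + beta - (theta - phi[j])^2)
phi   <- c(-1.5, -1.0, -0.5, 0.5, 1.0, 1.5)  # hypothetical elite ideal points
alpha <- rep(0, length(phi))                 # hypothetical elite popularity
y     <- c(1, 1, 1, 0, 0, 0)                 # this user follows the liberal elites

negloglik <- function(par) {
  theta <- par[1]; beta <- par[2]
  eta <- alpha + beta - (theta - phi)^2
  # log.p = TRUE keeps the log-likelihood numerically stable
  -sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE))
}
fit <- optim(c(0, 0), negloglik)
fit$par[1]  # estimated theta should come out negative (liberal side)
```

The MCMC approach samples `theta` and `beta` from their posterior distribution instead of optimizing this function, which is why it is slower but yields fuller uncertainty estimates.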

Estimation using correspondence analysis

One limitation of the previous method is that users need to follow at least one political account. To partially overcome this problem, in a recently published article in Psychological Science we add a third stage to the model, incorporating additional accounts (not necessarily political) that are followed predominantly by liberal or by conservative users, under the assumption that other users who follow this same set of accounts are also likely to be liberal or conservative. To reduce computational costs, we rely on correspondence analysis to project all users onto the latent ideological space (see the Supplementary Materials), and then normalize the estimates so that they follow a normal distribution with mean zero and standard deviation one. The package also includes a function that reproduces this last stage of the estimation, after all the additional accounts have been added:

# estimation using correspondence analysis
results <- estimateIdeology2(user, friends)
## p_barbera follows 22 elites: andersoncooper, billclinton, BreakingNews, 
## cnnbrk, davidaxelrod, Gawker, HillaryClinton, maddow, MaddowBlog, mashable, mattyglesias,
## NateSilver538, NickKristof, nytimes, NYTimeskrugman, repjoecrowley, RonanFarrow, 
## SCOTUSblog, StephenAtHome, TheDailyShow, TheEconomist, UniteBlue
results
## [1] -1.06158
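The projection step behind this approach can be illustrated with plain SVD-based correspondence analysis on simulated data. This is a minimal sketch with made-up follow data; the package's `CA` and `supplementaryRows` functions implement the memory-efficient version used in practice:

```r
# minimal correspondence-analysis sketch: fit CA on a follow matrix, then
# project a new user's row profile onto the first latent dimension
set.seed(1)
N <- matrix(rbinom(120, 1, 0.5), nrow = 20, ncol = 6)  # users x accounts
N <- N[rowSums(N) > 0, colSums(N) > 0, drop = FALSE]   # drop empty rows/cols
P <- N / sum(N)
rmass <- rowSums(P); cmass <- colSums(P)
# standardized residual matrix and its SVD
S <- diag(1 / sqrt(rmass)) %*% (P - rmass %o% cmass) %*% diag(1 / sqrt(cmass))
sv <- svd(S)
# principal row coordinates of the observed users on the first dimension
F1 <- (1 / sqrt(rmass)) * sv$u[, 1] * sv$d[1]
# supplementary projection: a hypothetical new user, not part of the fit
h <- rep(c(1, 0), length.out = ncol(N))
f_new <- sum((h / sum(h)) / sqrt(cmass) * sv$v[, 1])
f_new  # the new user's position on the first latent dimension
```

The point of the supplementary projection is that new users can be placed on the latent dimension from their row profile alone, without refitting the decomposition – which is what makes this stage cheap at scale.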

Additional functions

The package also contains additional functions that I use in my research, which I’m providing here in case they are useful:

  • scrapeCongressData scrapes the list of Twitter accounts for Members of the US Congress from the unitedstates GitHub account.
  • getUsersBatch scrapes user information for more than 100 Twitter users from Twitter’s REST API.
  • getFollowers scrapes follower lists from Twitter’s REST API.
  • getTimeline downloads up to the 3,200 most recent tweets for any given Twitter user.
  • CA is a modified version of the ca function in the ca package (available on CRAN) that computes simple correspondence analysis with much lower memory usage.
  • supplementaryColumns and supplementaryRows take additional columns or rows of a follower matrix and project them onto the latent ideological space using the parameters of an already-fitted correspondence analysis model.
