• Stars: 384
• Rank: 111,726 (Top 3%)
• Language: R
• License: Other
• Created almost 7 years ago
• Updated over 1 year ago

Repository Details

🤖 R package for detecting Twitter bots via machine learning

tweetbotornot

An R package for classifying Twitter accounts as bot or not.

Features

Uses machine learning to classify Twitter accounts as bots or not bots. The default model is 93.53% accurate when classifying bots and 95.32% accurate when classifying non-bots. The fast model is 91.78% accurate when classifying bots and 92.61% accurate when classifying non-bots.

Overall, the default model is correct 93.8% of the time.

Overall, the fast model is correct 91.9% of the time.

Install

Install from CRAN:

## install from CRAN
install.packages("tweetbotornot")

Install the development version from GitHub:

## install remotes if not already
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

## install tweetbotornot from GitHub (uses the remotes package installed above)
remotes::install_github("mkearney/tweetbotornot")

API authorization

Users must be authorized in order to interact with Twitter's API. To set up your machine to make authorized requests, you'll either need to be signed into Twitter and working in an interactive session of R (the browser will open asking you to authorize the rtweet client, rstats2twitter) or you'll need to create an app (and have a developer account) and your own API token. The latter has the benefit of (a) having sufficient permissions for write-access and DM (direct messages) read-access levels and (b) more stability if Twitter decides to shut down [@kearneymw](https://twitter.com/kearneymw)'s access to Twitter (I try to be very responsible these days, but Twitter isn't always friendly to academic use cases). To create an app and your own Twitter token, see these instructions provided in the rtweet package.
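
For the second route (your own app and token), a minimal sketch might look like the following. It assumes an older rtweet release that still provides create_token(), which later releases deprecated. The helper name make_twitter_token(), the app name "my_twitter_app", and the environment variable names are hypothetical placeholders chosen for this sketch; they are not part of tweetbotornot or rtweet.

```r
## hypothetical helper: build an rtweet token from environment variables
make_twitter_token <- function(app = "my_twitter_app") {
  ## placeholder variable names (an assumption of this sketch)
  vars <- c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
            "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET")
  keys <- Sys.getenv(vars)

  ## fail early, and say which credentials are missing
  missing <- vars[keys == ""]
  if (length(missing) > 0) {
    stop("Missing Twitter credentials: ", paste(missing, collapse = ", "))
  }

  ## create_token() comes from older rtweet releases
  rtweet::create_token(
    app             = app,
    consumer_key    = keys[["TWITTER_CONSUMER_KEY"]],
    consumer_secret = keys[["TWITTER_CONSUMER_SECRET"]],
    access_token    = keys[["TWITTER_ACCESS_TOKEN"]],
    access_secret   = keys[["TWITTER_ACCESS_SECRET"]]
  )
}
```

Reading the keys from environment variables keeps them out of scripts that might end up in version control; calling make_twitter_token() only reaches rtweet once all four values are set.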

Usage

There's one function, tweetbotornot() (technically there's also botornot(), but it does exactly the same thing). Give it a vector of screen names or user IDs and let it go to work.

## load package
library(tweetbotornot)

## select users
users <- c("realdonaldtrump", "netflix_bot",
  "kearneymw", "dataandme", "hadleywickham",
  "ma_salmon", "juliasilge", "tidyversetweets", 
  "American__Voter", "mothgenerator", "hrbrmstr")

## get botornot estimates
data <- tweetbotornot(users)

## arrange by prob ests
data[order(data$prob_bot), ]
#> # A tibble: 11 x 3
#>    screen_name     user_id            prob_bot
#>    <chr>           <chr>                 <dbl>
#>  1 hadleywickham   69133574            0.00754
#>  2 realDonaldTrump 25073877            0.00995
#>  3 kearneymw       2973406683          0.0607 
#>  4 ma_salmon       2865404679          0.150  
#>  5 juliasilge      13074042            0.162  
#>  6 dataandme       3230388598          0.227  
#>  7 hrbrmstr        5685812             0.320  
#>  8 netflix_bot     1203840834          0.978  
#>  9 tidyversetweets 935569091678691328  0.997  
#> 10 mothgenerator   3277928935          0.998  
#> 11 American__Voter 829792389925597184  1.000  
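
Because the result is an ordinary data frame with a prob_bot column, turning the estimates into labels is a one-line comparison. The sketch below copies four rows from the output above into a plain data frame so it runs without API access; the 0.5 cutoff is an assumption made for illustration, not a package default.

```r
## a few rows copied from the output above (stand-in for tweetbotornot(users))
data <- data.frame(
  screen_name = c("hadleywickham", "kearneymw", "netflix_bot", "mothgenerator"),
  prob_bot    = c(0.00754, 0.0607, 0.978, 0.998),
  stringsAsFactors = FALSE
)

## hypothetical cutoff; choose one that suits your tolerance for false positives
threshold <- 0.5

## flag accounts whose estimated bot probability exceeds the cutoff
data$is_bot <- data$prob_bot > threshold
likely_bots <- data$screen_name[data$is_bot]
print(likely_bots)
```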

Integration with rtweet

The botornot() function also accepts data returned by rtweet functions.

## get most recent 100 tweets from each user (get_timelines() comes from rtweet)
tmls <- rtweet::get_timelines(users, n = 100)

## pass the returned data to botornot()
data <- botornot(tmls)

## arrange by prob ests
data[order(data$prob_bot), ]
#> # A tibble: 11 x 3
#>    screen_name     user_id            prob_bot
#>    <chr>           <chr>                 <dbl>
#>  1 hadleywickham   69133574            0.00754
#>  2 realDonaldTrump 25073877            0.00995
#>  3 kearneymw       2973406683          0.0607 
#>  4 ma_salmon       2865404679          0.150  
#>  5 juliasilge      13074042            0.162  
#>  6 dataandme       3230388598          0.227  
#>  7 hrbrmstr        5685812             0.320  
#>  8 netflix_bot     1203840834          0.978  
#>  9 tidyversetweets 935569091678691328  0.997  
#> 10 mothgenerator   3277928935          0.998  
#> 11 American__Voter 829792389925597184  1.000  

fast = TRUE

The default [gradient boosted] model uses both users-level (bio, location, number of followers and friends, etc.) and tweets-level (number of hashtags, mentions, capital letters, etc. in a user's most recent 100 tweets) data to estimate the probability that users are bots. For larger data sets, this method can be quite slow. Due to Twitter's REST API rate limits, users are limited to only 180 estimates per 15 minutes.

To maximize the number of estimates per 15 minutes (at the cost of being less accurate), use the fast = TRUE argument. This method uses only users-level data, which increases the maximum number of estimates per 15 minutes to 90,000! Due to losses in accuracy, this method should be used with caution!

## get botornot estimates
data <- botornot(users, fast = TRUE)

## arrange by prob ests
data[order(data$prob_bot), ]
#> # A tibble: 11 x 3
#>    screen_name     user_id            prob_bot
#>    <chr>           <chr>                 <dbl>
#>  1 hadleywickham   69133574            0.00185
#>  2 kearneymw       2973406683          0.0415 
#>  3 ma_salmon       2865404679          0.0661 
#>  4 dataandme       3230388598          0.0965 
#>  5 juliasilge      13074042            0.112  
#>  6 hrbrmstr        5685812             0.121  
#>  7 realDonaldTrump 25073877            0.368  
#>  8 netflix_bot     1203840834          0.978  
#>  9 tidyversetweets 935569091678691328  0.998  
#> 10 mothgenerator   3277928935          0.999  
#> 11 American__Voter 829792389925597184  0.999  
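
For user lists longer than the rate limit allows, one option is to split the users into batches and pause between them. The sketch below is one way to do that under stated assumptions: batch_users() and estimate_in_batches() are hypothetical helpers (not part of the package), the batch size of 180 and the 15-minute pause mirror the limits described above for the default model, and the user names are made up.

```r
## hypothetical helper: split users into batches no larger than the limit
batch_users <- function(users, per_window = 180L) {
  split(users, ceiling(seq_along(users) / per_window))
}

## hypothetical helper: estimate one batch per rate-limit window
estimate_in_batches <- function(users, per_window = 180L, wait_secs = 15 * 60) {
  batches <- batch_users(users, per_window)
  out <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    out[[i]] <- tweetbotornot::tweetbotornot(batches[[i]])
    ## wait for the rate limit to reset before the next batch
    if (i < length(batches)) Sys.sleep(wait_secs)
  }
  do.call(rbind, out)
}

## made-up screen names, only to show how the batching works
fake_users <- sprintf("user_%d", 1:400)
batches <- batch_users(fake_users)
batch_sizes <- unname(lengths(batches))
print(batch_sizes)
```

Only batch_users() is exercised here; estimate_in_batches() needs the package installed and an authorized token, and with fast = TRUE a much larger per_window would be appropriate.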

NOTE

In order to avoid confusion, the package was renamed from "botrnot" to "tweetbotornot" in June 2018. This package should not be confused with the botornot application.

More Repositories

1. rstudiothemes (R, 217 stars): A curated list of RStudio themes found on Github
2. resist_oped (R, 201 stars): 🕵🏽‍♀️ Identifying the author behind the New York Times op-ed from inside the Trump White House.
3. tidyversity (R, 166 stars): 🎓 Tidy tools for academics
4. textfeatures (R, 166 stars): 👷‍♂️ A simple package for extracting useful features from character objects 👷‍♀️
5. shinyapps_links (R, 135 stars): A collection of Shiny applications (links shared on Twitter)
6. pkgverse (R, 119 stars): 📦🔭🌠 Create your own universe of packages à la tidyverse
7. tweetbotornot2 (R, 88 stars): 🔍🐦🤖 Detect Twitter Bots!
8. presidential_election_county_results_2016 (R, 59 stars): presidential_election_county_results_2016
9. kaggler (R, 55 stars): API client for Kaggle
10. dapr (R, 54 stars): ☝🏼👉🏼👇🏼👈🏼 Dependency-free purrr-like apply/map/iterate functions
11. rreddit (R, 51 stars): 𝐫⟋ Get Reddit data
12. rtweet-workshop (R, 44 stars): Slides and code for the rtweet workshop
13. hexagon (R, 42 stars): ◀️⏹▶️ R package for creating hexagon shaped xy data frames.
14. trumptweets (R, 40 stars): Download data on all of Donald Trump's (@realDonaldTrump) tweets
15. uslides (HTML, 37 stars): Rmarkdown template for pretty university-themed beamer presentations.
16. tbltools (R, 35 stars): 🗜🔢 Tools for Working with Tibbles
17. rstudioconf_tweets (R, 33 stars): 🖥 A repository for tracking tweets about rstudio::conf
18. wactor (R, 32 stars): Word Factor Vectors
19. hex-stickers (R, 28 stars): 🗃 Hex stickers for my R pkgs
20. readthat (R, 27 stars): Read Text Data
21. nytimes (R, 25 stars): nytimes: Interacting with New York Times APIs
22. rtweet.download (R, 24 stars): {rtweet} helpers for automating large or time-consuming downloads
23. rmd2jupyter (R, 23 stars): Convert Rmd (rmarkdown) to ipynb (Jupyter notebook)
24. nicar_tworkshop (R, 23 stars): Slides for #NICAR18 workshop on collecting and analyzing Twitter data
25. tidyreg (R, 22 stars): 🎓 Tidy regression tools for academics
26. driven-snow (R, 21 stars): ❄️ A light, bare-bones custom theme for Rstudio ❄️
27. viewtweets (R, 21 stars): 🙈🐵 View tweets (timelines, favorites, searches) in Rstudio 🐵🙈
28. newsAPI (R, 20 stars): API wrapper/R client for accessing https://newsapi.org
29. mizzourahmd (R, 20 stars): 😎 A clean and stylish template for rmarkdown 🐯
30. xaringan_slides (20 stars): 📺 Links to HTML5 presentations made using the R package {xaringan}.
31. funique (R, 19 stars): ⌚️ A faster unique() function
32. tfse (R, 18 stars): 🛠 Useful R functions for various things
33. chr (R, 18 stars): 🔤 Lightweight R package for manipulating [string] characters
34. tidymlm (R, 17 stars): 🎓 Tidy multilevel modeling tools for academics
35. cspan_data (R, 16 stars): A repo for tracking the number of followers of Congress, the Cabinet, and Governors
36. googler (R, 14 stars): googler: Google from the R Console
37. tidysem (R, 14 stars): 🎓 Tidy SEM tools for academics
38. ig (R, 14 stars): 🖼 A minimal R client for interacting with Instagram's public API
39. printbl (R, 14 stars): Printable Tibbles
40. stat (HTML, 13 stars): Course website for JOURN 8016: Advanced Quantitative Research Methods
41. nyt (R, 13 stars): 📰🗞 New York Times data
42. wibble (R, 13 stars): Web Data Frames
43. reflowdoc (R, 11 stars): ䷗ Hard-Wrapping Rstudio Add-In ䷗
44. learnRvideos (10 stars): 📼 Videos for learning about R
45. shouldbeverified (R, 10 stars): Predict Whether Twitter Users Should Be Verified
46. mocktwitter (HTML, 10 stars): 🐧🐦 Generate HTML pages for Twitter statuses.
47. data-science-tenure (TeX, 10 stars): Making data science tools count toward tenure 👩‍🏫👨‍🏫
48. data-scribers (CSS, 10 stars): {data scribers} is a collection of posts about data science. And unlike other content aggregating sites, this one encourages people to visit the blog's actual site.
49. quant (HTML, 10 stars): Course Website Repo for JOURN 8006: Quantitative Research Methods in Journalism
50. twitter-datasets (9 stars)
51. dict (R, 9 stars): Word-Based Dictionaries for Natural Language
52. covid19 (R, 9 stars): API Wrapper for COVID Tracking Project
53. mitchhedberg (R, 9 stars): An #rstats Ode to Hedberg
54. congress_tweets (R, 9 stars): Collecting tweets from members of Congress
55. plotting-county-election-results (R, 8 stars): 🇺🇸 Draw a beautiful county-level election results map with only a few lines of code
56. pkguse (R, 8 stars): Take Inventory of Package Use
57. googleapis (R, 8 stars): R client for accessing Google Cloud Natural Language APIs
58. rstatsresources (8 stars): 🔗🔗 A curated collection of links about rstatsresources
59. pytweet (Python, 7 stars): 🐥 API Wrapper for Twitter's REST and stream APIs
60. gh.com (R, 7 stars): Easily scrape Github
61. dowhen (R, 7 stars): 🤸‍♀️ Do something when something else happens ⏰
62. alpacar (R, 7 stars): 🤖💹💰 Algorithmic Stock Trading with Alpaca's Market API
63. stanford-sna (R, 7 stars): Material copied from http://sna.stanford.edu/
64. CV (TeX, 6 stars): My CV repo
65. mikewk.com (CSS, 6 stars): source code for my personal website
66. fivethirtyeight-approval (6 stars)
67. cronjob (R, 6 stars): Manage Cron Jobs
68. r-bloggers (R, 6 stars): [Tweet bot] R script tweeting new links to R-bloggers posts
69. JOURN_8006_Quant (R, 5 stars): 📙 Course repository for JOURN 8006: Quantitative Research Methods in Journalism
70. rtweet_citations (TeX, 5 stars): tracking rtweet citations
71. fml (R, 5 stars): 📁 File Management and Location Tools 📂
72. opinion.classifier (R, 5 stars): What the Package Does (One Line, Title Case)
73. iphub (R, 5 stars): Lookup safeness of IP addresses via iphub
74. ncaa_bball_data (R, 5 stars): ncaa basketball team level data with tourney outcomes
75. wordword (R, 4 stars)
76. attrbl (R, 4 stars): A tidy approach to attributes
77. rmdees (R, 4 stars): Rmd Helpers
78. jeopboty (R, 4 stars): My Jeopardy Twitter bot
79. pbr (R, 4 stars)
80. weddingposter (TeX, 4 stars)
81. qualtricks (R, 4 stars): 🤡🛠 Tools for Working with Qualtrics Data
82. useapi (R, 4 stars): 📩📨 A workflow for building API wrapper/client packages in R.
83. inaug_crowd_size (R, 4 stars): Plot of inaugural crowd sizes
84. name2sex (R, 3 stars): ⚤ Get sex (female percent) estimates based on first names
85. rstudioconf19-machine-learning (HTML, 3 stars)
86. tidycor (R, 3 stars): 🎓 Tidy correlation tools for academics
87. journ-tweets (R, 3 stars): 🕵 Tracking tweets from and to journalists
88. NCA18 (R, 3 stars): Tracking and analyzing tweets about the 2018 National Communication Conference
89. makelinkrepo (R, 3 stars): 🔗🔗 Create link repositories and share them on Github
90. do (R, 3 stars): 🌊 R client for DigitalOcean's API
91. lop (R, 3 stars): Shortcuts for Web Scraping and Data Wrangling
92. shiny-tweetbotornot2 (R, 3 stars)
93. warcraft (R, 3 stars): Warcraft mode for R
94. NCA17 (R, 3 stars): Data collection and visualization of #NCA17 tweets
95. cngtweets (3 stars): 🏛🐦 Screen names of members of U.S. Congress
96. datacenter (R, 2 stars): Create, Add, and Update Centralized Data
97. whereabouts (R, 2 stars): {whereabouts}: Find Your Whereabouts
98. dataviz (R, 2 stars): Data Visualization Tools
99. smartread (R, 2 stars): 🔎📘 A smarter and simpler way to read data from common file types
100. rstudioconf_talks (R, 2 stars): Notes and links from rstudio::conf talks