• Stars: 501
• Rank: 87,399 (Top 2 %)
• Language: Python
• License: MIT License
• Created over 7 years ago
• Updated about 4 years ago

Repository Details

Social media scraping / data collection library for Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs

socialreaper

socialreaper is a Python 3.6+ library that scrapes Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr.

Documentation

Not a programmer? Try the GUI

Install

pip3 install socialreaper

Examples

These examples are for version 0.3.0 only. To install that version:

pip3 install socialreaper==0.3.0

Facebook

Get the comments from McDonald's 1,000 most recent posts

from socialreaper import Facebook

fbk = Facebook("api_key")

comments = fbk.page_posts_comments("mcdonalds", post_count=1000, 
    comment_count=100000)

for comment in comments:
    print(comment['message'])

Twitter

Save the 500 most recent tweets from the user @realDonaldTrump to a CSV file

from socialreaper import Twitter
from socialreaper.tools import to_csv

twt = Twitter(app_key="xxx", app_secret="xxx", oauth_token="xxx", 
    oauth_token_secret="xxx")
    
tweets = twt.user("realDonaldTrump", count=500, exclude_replies=True, 
    include_retweets=False)
    
to_csv(list(tweets), filename='trump.csv')

Reddit

Get the top 10 comments from the top 50 threads of all time on Reddit

from socialreaper import Reddit
from socialreaper.tools import flatten

rdt = Reddit("xxx", "xxx")
 
comments = rdt.subreddit_thread_comments("all", thread_count=50, 
    comment_count=500, thread_order="top", comment_order="top", 
    search_time_period="all")
    
# Convert nested dictionary into flat dictionary
comments = [flatten(comment) for comment in comments]

# Sort by comment score
comments = sorted(comments, key=lambda k: k['data.score'], reverse=True)

# Print the top 10
for comment in comments[:10]:
    print("###\nUser: {}\nScore: {}\nComment: {}\n".format(comment['data.author'], comment['data.score'], comment['data.body']))
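The example above relies on flatten to turn nested keys into dotted ones such as 'data.score'. The following is a minimal sketch of that idea, not socialreaper's actual implementation: the function name flatten_dict, the sep parameter, and the sample comment are all assumptions made for illustration.

```python
def flatten_dict(nested, parent_key="", sep="."):
    """Return a flat copy of `nested`, joining nested keys with `sep`."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dictionaries, carrying the key prefix along
            flat.update(flatten_dict(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical comment shaped like the nested records in the Reddit example
comment = {"kind": "t1", "data": {"author": "alice", "score": 42, "body": "hi"}}
flat = flatten_dict(comment)
print(flat["data.score"])
```

Flat keys make it straightforward to sort records or write them as CSV columns, which is why the Reddit example flattens before sorting.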

YouTube

Get the comments containing the strings "prize", "giveaway" from the videos of the YouTube channel mkbhd

from socialreaper import Youtube

ytb = Youtube("api_key")

channel_id = ytb.api.guess_channel_id("mkbhd")[0]['id']

comments = ytb.channel_video_comments(channel_id, video_count=500, 
    comment_count=100000, comment_text=["prize", "giveaway"], 
    comment_format="plainText")
    
for comment in comments:
    print(comment)
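This README does not say exactly how comment_text matches comments. If you want to apply a similar filter yourself to text you have already downloaded, one possible approach is sketched below. It assumes a case-insensitive match on any of the keywords; the helper contains_any and the sample texts are hypothetical and not part of socialreaper.

```python
def contains_any(text, keywords):
    """Return True if `text` contains at least one keyword, ignoring case."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


# Hypothetical comment texts standing in for API results
comment_texts = [
    "Great video!",
    "Is the GIVEAWAY still open?",
    "I never win a prize",
]
matches = [text for text in comment_texts if contains_any(text, ["prize", "giveaway"])]
print(matches)
```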

CSV export

You can export a list of dictionaries using socialreaper's CSV class

from socialreaper import Facebook
from socialreaper.tools import CSV

fbk = Facebook("api_key")
posts = list(fbk.page_posts("mcdonalds"))
CSV(posts, file_name='mcdonalds.csv')
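For comparison, a list of flat dictionaries can also be written with Python's standard csv module. This is a sketch of the general technique, not a description of how socialreaper's CSV class works: the helper write_dicts_to_csv and the sample posts are assumptions made for illustration.

```python
import csv
import io


def write_dicts_to_csv(rows, stream):
    """Write a list of flat dictionaries to `stream` as CSV."""
    # Collect every key that appears in any row, keeping first-seen order
    field_names = []
    for row in rows:
        for key in row:
            if key not in field_names:
                field_names.append(key)

    # restval fills in columns that a given row does not have
    writer = csv.DictWriter(stream, fieldnames=field_names, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return field_names


# Hypothetical post records; real API responses will have different fields
posts = [
    {"id": "1", "message": "Hello"},
    {"id": "2", "message": "World", "likes": 3},
]
buffer = io.StringIO()
write_dicts_to_csv(posts, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with open('posts.csv', 'w', newline=''), as the csv module documentation recommends.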

More Repositories

1. instamancer: Scrape Instagram's API with Puppeteer (TypeScript, 366 stars)
2. reaper: Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs (Python, 358 stars)
3. insta-scrape: Scrape Instagram (28 stars)
4. instaphyte: Fast and simple Instagram hashtag and location scraper (Python, 19 stars)
5. depot: Object storage microservice. Like minio but minnier. (Go, 9 stars)
6. topwords: A list of the top 3 million+ English words in Project Gutenberg. (7 stars)
7. WatchTheThrones: A map of current Game of Thrones downloaders on the BitTorrent DHT (TypeScript, 6 stars)
8. venmap: Download and create a network graph from public Venmo data (Python, 6 stars)
9. csveditor: Remove multiple columns and limit rows in CSV files (Python, 5 stars)
10. twitterreapertutorial (HTML, 4 stars)
11. reaper-site: Website (HTML, 4 stars)
12. redditlive: Live graph of new subreddit posts (JavaScript, 3 stars)
13. matrix-validator (HTML, 2 stars)
14. aussiepirates: A map of Australian BitTorrent downloaders (JavaScript, 2 stars)
15. NotEscapeRoom: Photon Folly (JavaScript, 2 stars)
16. EvacuMate: Evacuating Mate. 2016 GovHack prize winner (JavaScript, 2 stars)
17. socialreaper-cli: Just a fun plaything with Google's fire (Python, 2 stars)
18. daq: DIY AirGradient Air Quality Monitor with AWS IoT Core and Amazon Timestream (C++, 2 stars)
19. routerreboot: Reboot my cheap router (TypeScript, 2 stars)
20. GitZibit: Explore git projects (HTML, 2 stars)
21. cdt: Close Discarded Tabs (JavaScript, 2 stars)
22. todepot (Go, 1 star)
23. instagram-speed-test: Test the speed of Instagram scraping tools (HTML, 1 star)
24. wpm: Simple WPM calculator (HTML, 1 star)
25. election2019: Social media accounts for the 2019 Australian Federal Election (Jupyter Notebook, 1 star)
26. testpack (Python, 1 star)
27. d3pattern (TypeScript, 1 star)
28. unininja-site: INFS3202 Project (HTML, 1 star)
29. election-results: View Australian election results (JavaScript, 1 star)