• Stars
    star
    162
  • Rank 232,284 (Top 5 %)
  • Language
    CoffeeScript
  • License
    GNU General Publi...
  • Created over 9 years ago
  • Updated over 7 years ago

Repository Details

scrape public instagram data w/out API access

Instagram Screen Scrape

A tool for scraping public data from Instagram without needing permission from Instagram. In theory, it can scrape anything that a non-logged-in user can see, but right now it only supports getting the posts for a given username or the comments for a given post.

Example

CLI

The CLI writes its results to STDOUT and outputs each post as it is scraped. The following example is truncated because the output of the real command is very long. The full output ends with a closing bracket, which makes it valid JSON.

$ instagram-screen-scrape posts --username carrotcreative
[{"id":"0toxcII4Eo","username":"carrotcreative","time":1427420497,"type":"image","likes":82,"comments":3,"text":"Our CTO, @kylemac, speaking on the #LetsTalkCulture panel tonight @paperlesspost.","media":"https://scontent.cdninstagram.com/hphotos-xaf1/t51.2885-15/e15/11055816_398297847022038_803876945_n.jpg"},
{"id":"0qPcnuI4Pr","username":"carrotcreative","time":1427306556,"type":"image","likes":80,"comments":4,"text":"#bitchesbebakin took it to another level today for @nporteschaikin and @slang800's #Carrotversaries today.","media":"https://scontent.cdninstagram.com/hphotos-xaf1/t51.2885-15/e15/10959049_1546104325652055_1320782099_n.jpg"},
{"id":"0WLnjlo4Ft","username":"carrotcreative","time":1426633460,"type":"image","likes":61,"comments":1,"text":"T-shirts speak louder than words. Come find us @sxsw.","media":"https://scontent.cdninstagram.com/hphotos-xfa1/t51.2885-15/e15/11032904_789885121108568_378908081_n.jpg"},

We can also scrape comments:

$ instagram-screen-scrape comments --post 0qPcnuI4Pr
[{"id":"948651188581269518","username":"johnlustina","time":1427308055,"text":"@margeauxlustina"},
{"id":"948682633420963943","username":"rita_xo","time":1427311804,"text":"👌@emilykalen"},
{"id":"948734454231433861","username":"david_berkhin","time":1427317981,"text":"looks so good!"},
{"id":"948824521079751272","username":"k.kate","time":1427328718,"text":"Macarons or a Petri dish full of cells? ¯\\_(ツ)_/¯"}]

By default, there is one line per post, which makes the output easy to pipe into other tools. The following example uses wc -l to count how many posts are returned. As you can see, I don't post much.

$ instagram-screen-scrape posts -u slang800 | wc -l
2
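Because each line holds one JSON object, the output can also be consumed line by line without waiting for the closing bracket. The following is a minimal sketch in JavaScript, assuming the output keeps the shape shown in the examples above (a leading "[" on the first line, a trailing "," between items, and a closing "]" at the end). The helpers parseLine and parseOutput are hypothetical names and are not part of this package; the sample lines are taken from the comments example above.

```javascript
// Hypothetical helper: turn one line of CLI output into an object.
// Assumes the layout shown above: a JSON array with one element per line,
// so a line may carry a leading "[" and a trailing "," or "]".
function parseLine(line) {
  let text = line.trim();
  if (text.startsWith('[')) text = text.slice(1);
  if (text.endsWith(',') || text.endsWith(']')) text = text.slice(0, -1);
  // An empty result ("[]") or a blank line leaves nothing to parse.
  return text === '' ? null : JSON.parse(text);
}

// Hypothetical helper: parse a whole chunk of output, skipping empty lines.
function parseOutput(output) {
  return output
    .split('\n')
    .map(parseLine)
    .filter((item) => item !== null);
}

// Two lines copied from the comments example above.
const sample = [
  '[{"id":"948651188581269518","username":"johnlustina","time":1427308055,"text":"@margeauxlustina"},',
  '{"id":"948734454231433861","username":"david_berkhin","time":1427317981,"text":"looks so good!"}]',
].join('\n');

for (const comment of parseOutput(sample)) {
  console.log(comment.username + ': ' + comment.text);
}
```

Parsing per line means a truncated run (for example, one interrupted before the closing bracket) still yields every item that was printed.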

JavaScript Module

The following example is in CoffeeScript.

{InstagramPosts} = require 'instagram-screen-scrape'

# create the stream
streamOfPosts = new InstagramPosts(username: 'slang800')

# do something interesting with the stream
streamOfPosts.on('data', (post) ->
  # since it's an object-mode stream, we get objects from it and don't need to
  # parse JSON or anything

  # the time field is represented in UNIX time
  time = new Date(post.time * 1000)

  # output something like "slang800's post from 4/5/2015 got 1 like(s), and 0
  # comment(s)"
  console.log "slang800's post from #{time.toLocaleDateString()} got
  #{post.likes} like(s), and #{post.comments} comment(s)"
)

The following example is the same as the last one, but in JavaScript.

var InstagramPosts, streamOfPosts;
InstagramPosts = require('instagram-screen-scrape').InstagramPosts;

streamOfPosts = new InstagramPosts({
  username: 'slang800'
});

streamOfPosts.on('data', function(post) {
  var time = new Date(post.time * 1000);
  console.log([
    "slang800's post from ",
    time.toLocaleDateString(),
    " got ",
    post.likes,
    " like(s), and ",
    post.comments,
    " comment(s)"
  ].join(''));
});
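Since the module hands back an object-mode stream, the usual Node stream events should apply to it. The sketch below totals likes and comments once the stream finishes. It assumes the stream emits 'end' when scraping is done, which is standard Readable behavior but is not stated in this README. To keep the example runnable without network access, Readable.from stands in for new InstagramPosts(...), and summarize is a hypothetical helper, not part of this package.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: consume any object-mode stream of posts and resolve
// with running totals once the stream ends.
function summarize(streamOfPosts) {
  return new Promise((resolve, reject) => {
    const summary = { posts: 0, likes: 0, comments: 0 };
    streamOfPosts.on('data', (post) => {
      summary.posts += 1;
      summary.likes += post.likes;
      summary.comments += post.comments;
    });
    streamOfPosts.on('end', () => resolve(summary));
    streamOfPosts.on('error', reject);
  });
}

// Stand-in for `new InstagramPosts({username: 'carrotcreative'})`, using the
// like and comment counts from the CLI example above.
const standInStream = Readable.from([
  { id: '0toxcII4Eo', likes: 82, comments: 3 },
  { id: '0qPcnuI4Pr', likes: 80, comments: 4 },
  { id: '0WLnjlo4Ft', likes: 61, comments: 1 },
]);

summarize(standInStream).then((summary) => {
  console.log(summary); // { posts: 3, likes: 223, comments: 8 }
});
```

With the real module, the stand-in would be replaced by the stream created in the example above; the helper itself only relies on the 'data', 'end', and 'error' events.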

And we can scrape comments in a similar manner (shown in CoffeeScript):

{InstagramComments} = require 'instagram-screen-scrape'

streamOfComments = new InstagramComments(post: '0qPcnuI4Pr')

# do something interesting with the stream
streamOfComments.on('data', (comment) ->
  # the time field is represented in UNIX time
  time = new Date(comment.time * 1000)

  console.log "#{comment.username} commented on #{time.toLocaleDateString()}:
  #{comment.text}"
)
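The time field is in seconds, while a JavaScript Date expects milliseconds, which is why the examples multiply by 1000. Note that toLocaleDateString() depends on the locale and time zone of the machine running the script, so its output can differ between systems. A small sketch of the conversion, using a hypothetical toDate helper (not part of this package) and a timestamp from the CLI example above:

```javascript
// Hypothetical helper: convert the scraper's `time` field (seconds since the
// UNIX epoch) into a JavaScript Date, which expects milliseconds.
function toDate(unixSeconds) {
  return new Date(unixSeconds * 1000);
}

// Timestamp of the first post in the CLI example above.
const posted = toDate(1427420497);

// toISOString() always reports UTC, so the result is the same on every machine.
console.log(posted.toISOString()); // 2015-03-27T01:41:37.000Z
```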

Why?

Requiring an app to be registered just to access data that is publicly available on their site is excessively controlling. Scripts should be able to consume the same data as people, with the same level of authentication. Sadly, Instagram doesn't provide an open, structured, machine-readable API.

So, we're forced to use a method that Instagram cannot effectively shut down without harming themselves: scraping their user-facing site.

Caveats

  • This is probably against the Instagram TOS, so don't use it if that sort of thing worries you.
  • Whenever Instagram updates certain parts of their front-end, this scraper will need to be updated to support the new markup.
  • You can't scrape protected accounts or get engagement rates / impression counts (because that data isn't public, duh).

More Repositories

1

tidy-markdown

Beautify Markdown, fixing formatting mistakes and standardizing syntax
CoffeeScript
70
star
2

instagram-id-to-url-segment

Convert Instagram post IDs into Instagram links, algorithmically
CoffeeScript
68
star
3

twitter-screen-scrape

scrape public twitter data w/out API access
CoffeeScript
46
star
4

atom-tidy-markdown

Fix ugly markdown.
CoffeeScript
32
star
5

jade-book

CSS
28
star
6

twitterFetcher

Fetch tweets from twitter on the client-side without oAuth (pure JavaScript)
CoffeeScript
12
star
7

instagram-scrape-account-stats

CoffeeScript
11
star
8

ipfs-gateway-dmca-requests

Python
8
star
9

torrent-scraper

A little tool for scraping torrents in bulk, using RabbitMQ & Docker
CoffeeScript
8
star
10

proton

A Neutron-inspired UI for Atom.
CSS
6
star
11

valentines-card

A simple valentines day card made with CSS3
CoffeeScript
5
star
12

vine-screen-scrape

scrape public vine data w/out API access
CoffeeScript
4
star
13

dotfiles

Custom Linux config files, managed with GNU Stow
Shell
4
star
14

twitter-following-editor

programmatically follow & unfollow people on Twitter w/out API access
CoffeeScript
4
star
15

proton-bat

A Proton UI theme inspired by Shopify's BatmanJS.
CSS
3
star
16

proton-kai

A monokai syntax theme for Atom to work with Proton UI.
CSS
3
star
17

proton-light

A light, airy version of Proton UI.
CSS
2
star
18

facebook-access-token

poor man's xAuth
CoffeeScript
2
star
19

jkniselylandscaping.com

CSS
2
star
20

slang.cx

2
star
21

linear-calender

a linear calendar layout implemented in a web-page
JavaScript
2
star
22

twitter-scrape-account-stats

CoffeeScript
2
star
23

fortuna

JavaScript
2
star
24

php-array-syntax-converter

Bidirectional conversion between the old and new array syntaxes in PHP
PHP
2
star
25

bay-area-quarter-life

An attempt at collecting all of the "Bay Area Quarter Life" comics.
2
star
26

fobject

A simple promise-based wrapper for file operations that treats files as objects.
CoffeeScript
2
star
27

vine-id-to-url-segment

Convert Vine video IDs into Vine permalinks, algorithmically
CoffeeScript
2
star
28

do-not-survey-list

A list of users who do not want to participate in surveys with regards to their activity on GitHub
1
star
29

sha256

JavaScript
1
star
30

gh-fixed

Improvements to the GitHub UI
CoffeeScript
1
star
31

ergodox-keymap

My ErgoDox keymap
C
1
star
32

vine-scrape-account-stats

CoffeeScript
1
star
33

proton-framer

A Proton Light UI syntax theme inspired by Framer JS.
CSS
1
star
34

couchdb-client

Interact with CouchDB through the terminal using streams
CoffeeScript
1
star
35

fdupes-dir-selector

Read a fdupes-style file group list and print out files contained in a given set of directories, that can be deleted without data loss.
CoffeeScript
1
star
36

config-schema

A lightweight wrapper for configuration options using JSON schema
CoffeeScript
1
star
37

concurrent-transform-stream

CoffeeScript
1
star
38

alpine-transmission

1
star
39

csv-performance-tests

there are a bunch of CSV parsers, so this is an attempt to compare them
JavaScript
1
star
40

chazsouthard.com

The portfolio of Chaz Southard
JavaScript
1
star
41

md5

JavaScript
1
star