• This repository has been archived on 23/Dec/2018
  • Stars: 234
  • Language: Python
  • License: Other
  • Created over 10 years ago
  • Updated almost 7 years ago


Repository Details

Provides content not accessible through the standard Amazon API

Amazon Scraper

A hybrid web scraper / API client. It supplements the standard Amazon API with web scraping to retrieve extra data, specifically product reviews.

Uses the Amazon Simple Product API to provide API accessible data. API search functions are imported directly into the amazon_scraper module.

Parameters follow the same style as the Amazon Simple Product API, which in turn uses Bottlenose-style parameters. Hence the non-Pythonic parameter names (e.g. ItemId).

The AmazonScraper constructor passes 'args' and 'kwargs' through to Bottlenose (via the Amazon Simple Product API). Bottlenose supports AWS regions, queries-per-second limiting, query caching and other useful features. See the Bottlenose API documentation for details.
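The pass-through described above is the ordinary `*args, **kwargs` forwarding pattern. The sketch below collapses the real two-hop chain (AmazonScraper to the Simple Product API to `bottlenose.Amazon`) into one hop, and both classes are hypothetical stand-ins, not the library's code:

```python
# Hypothetical stand-ins: neither class is the real amazon_scraper or
# Bottlenose code; they only illustrate how extra arguments flow through.
class FakeBottlenoseAmazon:
    def __init__(self, access_key, secret_key, associate_tag, **kwargs):
        self.credentials = (access_key, secret_key, associate_tag)
        # Options such as Region, MaxQPS or Timeout end up here.
        self.options = kwargs


class ScraperSketch:
    def __init__(self, *args, **kwargs):
        # Forward everything untouched; the wrapper does not need to know
        # which options the underlying client understands.
        self.api = FakeBottlenoseAmazon(*args, **kwargs)


s = ScraperSketch("access", "secret", "tag", Region="UK", MaxQPS=0.9)
print(s.api.options)  # {'Region': 'UK', 'MaxQPS': 0.9}
```

The benefit of this design is that new Bottlenose options work without any change to the wrapper, which is also why an older Simple Product API release that does not forward them blocks them entirely.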

The latest release of python-amazon-simple-product-api (1.5.0 at the time of writing) doesn't support these arguments, only Region. If you require them, install the latest code from its repository with the following command:

pip install git+https://github.com/yoavaviram/python-amazon-simple-product-api.git#egg=python-amazon-simple-product-api

Caveat

Amazon continually try to keep scrapers from working. They do this by:

  • A/B testing (randomly receive different HTML).
  • Huge numbers of HTML layouts for the same product categories.
  • Changing HTML layouts.
  • Moving content inside iFrames.

Amazon have resorted to moving more and more content into iFrames, which this scraper can't handle. I envisage a time when most data will be inaccessible without more complex logic.
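Content inside an iFrame is not part of the HTML the scraper downloads; it lives at a separate URL given by the iframe's `src` attribute, so a scraper would need to find that URL and fetch it as a second request. A minimal sketch of the first step using Python's standard `html.parser` (this is not something amazon_scraper does, and the HTML and URL below are made up):

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collects the src attribute of every <iframe> in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            # attrs is a list of (name, value) pairs; value may be None.
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Made-up HTML: the description lives in a separate document.
page = '<div id="desc"><iframe src="/gp/product-description/iframe?asin=X"></iframe></div>'
finder = IframeFinder()
finder.feed(page)
print(finder.sources)  # ['/gp/product-description/iframe?asin=X']
```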

I've spent a long time trying to keep these scrapers working, and it's a never-ending battle. I don't have the time to keep pace with Amazon's changes. If you are interested in improving Amazon Scraper, please let me know (creating an issue is fine). Any help is appreciated.

Installation

pip install amazon_scraper

Dependencies

Examples

All Products All The Time

Create an API instance:

>>> from amazon_scraper import AmazonScraper
>>> amzn = AmazonScraper("put your access key", "secret key", "and associate tag here")

The constructor accepts keyword arguments ('kwargs'), which are passed to the 'bottlenose.Amazon' constructor:

>>> from amazon_scraper import AmazonScraper
>>> amzn = AmazonScraper("put your access key", "secret key", "and associate tag here", Region='UK', MaxQPS=0.9, Timeout=5.0)
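`MaxQPS=0.9` means at most 0.9 queries per second, i.e. roughly 1.11 seconds between requests. The idea behind such a limit can be sketched as a throttle that sleeps until the minimum interval has passed. This is a simplified illustration, not Bottlenose's implementation; `Throttle` is a hypothetical name, and the clock and sleep functions are injectable so the demo runs instantly:

```python
import time


class Throttle:
    """Spaces calls so that at most max_qps of them start per second."""

    def __init__(self, max_qps, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_qps
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()  # re-read the time after sleeping
        self.last = now


# Simulated clock so the example does not actually wait.
fake_now = [0.0]
slept = []


def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds


t = Throttle(0.9, clock=lambda: fake_now[0], sleep=fake_sleep)
for _ in range(3):
    t.wait()  # an API request would be issued after each wait()
print([round(s, 2) for s in slept])  # [1.11, 1.11]
```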

Search:

>>> from __future__ import print_function
>>> import itertools
>>> for p in itertools.islice(amzn.search(Keywords='python', SearchIndex='Books'), 5):
...     print(p.title)
Learning Python, 5th Edition
Python Programming: An Introduction to Computer Science 2nd Edition
Python In A Day: Learn The Basics, Learn It Quick, Start Coding Fast (In A Day Books) (Volume 1)
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Python Cookbook

Lookup by ASIN/ItemId:

>>> p = amzn.lookup(ItemId='B00FLIJJSA')
>>> p.title
Kindle, Wi-Fi, 6" E Ink Display - for international shipment
>>> p.url
http://www.amazon.com/Kindle-Wi-Fi-Ink-Display-international/dp/B0051QVF7A/ref=cm_cr_pr_product_top

Batch Lookups:

>>> for p in amzn.lookup(ItemId='B0051QVF7A,B007HCCNJU,B00BTI6HBS'):
...     print(p.title)
Kindle, Wi-Fi, 6" E Ink Display - for international shipment
Kindle, 6" E Ink Display, Wi-Fi - Includes Special Offers (Black)
Kindle Paperwhite 3G, 6" High Resolution Display with Next-Gen Built-in Light, Free 3G + Wi-Fi - Includes Special Offers
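Batch lookups take a single comma-separated ItemId string. For a long list of ASINs you may want to split it into several such strings. The batch size of 10 below is an assumption based on the per-request ItemId limit of the Product Advertising API of that era; check the current API documentation. `batch_item_ids` is a hypothetical helper:

```python
def batch_item_ids(asins, batch_size=10):
    """Yield comma-separated ItemId strings of at most batch_size ASINs.

    asins must be a sequence (e.g. a list) of ASIN strings.
    """
    for start in range(0, len(asins), batch_size):
        yield ",".join(asins[start:start + batch_size])


asins = ['B0051QVF7A', 'B007HCCNJU', 'B00BTI6HBS']
for item_ids in batch_item_ids(asins, batch_size=2):
    print(item_ids)  # each string could then be passed as amzn.lookup(ItemId=item_ids)
# B0051QVF7A,B007HCCNJU
# B00BTI6HBS
```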

By URL:

>>> p = amzn.lookup(URL='http://www.amazon.com/Kindle-Wi-Fi-Ink-Display-international/dp/B0051QVF7A/ref=cm_cr_pr_product_top')
>>> p.title
Kindle, Wi-Fi, 6" E Ink Display - for international shipment
>>> p.asin
B0051QVF7A
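Looking up by URL works because the ASIN is embedded in product URLs (here after `/dp/`). A minimal sketch of extracting it with a regular expression; `asin_from_url` is a hypothetical helper, not how amazon_scraper necessarily parses URLs, and it only handles the `/dp/` and `/gp/product/` forms:

```python
import re

# ASINs are 10 characters of upper-case letters and digits; in product URLs
# they follow /dp/ or /gp/product/.
ASIN_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)")


def asin_from_url(url):
    """Return the ASIN in a product URL, or None if no ASIN is found."""
    match = ASIN_RE.search(url)
    return match.group(1) if match else None


url = "http://www.amazon.com/Kindle-Wi-Fi-Ink-Display-international/dp/B0051QVF7A/ref=cm_cr_pr_product_top"
print(asin_from_url(url))  # B0051QVF7A
```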

Product Ratings:

>>> p = amzn.lookup(ItemId='B00FLIJJSA')
>>> p.ratings
[8, 4, 6, 4, 13]
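The README does not say what the five entries in `ratings` mean. They look like review counts per star rating, but the ordering is not documented, so the sketch below makes it a parameter. `rating_summary` is a hypothetical helper that computes the total count and the mean star rating:

```python
def rating_summary(counts, highest_first=False):
    """Return (total reviews, mean stars) from five per-star review counts.

    By default counts[0] is taken to be 1 star; pass highest_first=True
    if counts[0] is 5 stars instead. Both orderings are assumptions.
    """
    stars = range(5, 0, -1) if highest_first else range(1, 6)
    total = sum(counts)
    if total == 0:
        return 0, None
    mean = sum(s * c for s, c in zip(stars, counts)) / total
    return total, round(mean, 2)


print(rating_summary([8, 4, 6, 4, 13]))                      # (35, 3.29)
print(rating_summary([8, 4, 6, 4, 13], highest_first=True))  # (35, 2.71)
```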

Alternative Bindings:

>>> p = amzn.lookup(ItemId='B000GRFTPS')
>>> p.alternatives
['B00IVM5X7E', '9163192993', '0899669433', 'B00IPXPQ9O', '1482998742', '0441444814', '1497344824']
>>> for asin in p.alternatives:
...     alt = amzn.lookup(ItemId=asin)
...     print(alt.title, alt.binding)
The King in Yellow Kindle Edition
The King in Yellow Unknown Binding
King in Yellow Hardcover
The Yellow Sign Audible Audio Edition
The King in Yellow MP3 CD
THE KING IN YELLOW Mass Market Paperback
The King in Yellow Paperback

Supplemental text not available via the API:

>>> p = amzn.lookup(ItemId='0441016685')
>>> p.supplemental_text
[u"Bob Howard is a computer-hacker desk jockey ... ", u"Lovecraft\'s Cthulhu meets Len Deighton\'s spies ... ", u"This dark, funny blend of SF and ... "]

Review API

View lists of reviews:

>>> p = amzn.lookup(ItemId='B0051QVF7A')
>>> rs = p.reviews()
>>> rs.asin
B0051QVF7A
>>> # the review IDs on this first page
>>> rs.ids
['R3MF0NIRI3BT1E', 'R3N2XPJT4I1XTI', 'RWG7OQ5NMGUMW', 'R1FKKJWTJC4EAP', 'RR8NWZ0IXWX7K', 'R32AU655LW6HPU', 'R33XK7OO7TO68E', 'R3NJRC6XH88RBR', 'R21JS32BNNQ82O', 'R2C9KPSEH78IF7']
>>> rs.url
http://www.amazon.com/product-reviews/B0051QVF7A/ref=cm_cr_pr_top_sort_recent?&sortBy=bySubmissionDateDescending
>>> # iterate over reviews on this page only
>>> for r in rs.brief_reviews:
...     print(r.id)
R3MF0NIRI3BT1E
R3N2XPJT4I1XTI
RWG7OQ5NMGUMW
...
>>> # iterate over all brief reviews on all pages
>>> for r in rs:
...     print(r.id)
R3MF0NIRI3BT1E
R3N2XPJT4I1XTI
RWG7OQ5NMGUMW
...
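The difference between `rs.ids` / `rs.brief_reviews` (current page only) and iterating `rs` (all pages) can be modelled as an object that follows "next page" links during iteration. `ReviewPagesSketch` is hypothetical: its pages are in-memory lists of made-up IDs, whereas the real Reviews object scrapes HTML:

```python
class ReviewPagesSketch:
    """Hypothetical model of a paginated review listing.

    `pages` stands in for downloaded review pages; each is a list of IDs.
    """

    def __init__(self, pages, index=0):
        self.pages = pages
        self.index = index

    @property
    def ids(self):
        # Only the reviews on the current page.
        return list(self.pages[self.index])

    def next_page(self):
        # None when there is no further page to follow.
        if self.index + 1 < len(self.pages):
            return ReviewPagesSketch(self.pages, self.index + 1)
        return None

    def __iter__(self):
        # Walk forward page by page, yielding every review ID.
        page = self
        while page is not None:
            yield from page.ids
            page = page.next_page()


rs = ReviewPagesSketch([["R1", "R2"], ["R3"], ["R4", "R5"]])
print(rs.ids)    # ['R1', 'R2']
print(list(rs))  # ['R1', 'R2', 'R3', 'R4', 'R5']
```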

View detailed reviews:

>>> rs = amzn.reviews(ItemId='B0051QVF7A')
>>> # this will iterate over all reviews on all pages
>>> # each review will require a download as it is on a separate page
>>> for r in rs.full_reviews():
...     print(r.id)
R3MF0NIRI3BT1E
R3N2XPJT4I1XTI
RWG7OQ5NMGUMW
...
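Because every full review is a separate page download, iterating N reviews costs at least N requests on top of the listing pages. If your code may revisit the same reviews, memoizing the download avoids repeat requests. A sketch using `functools.lru_cache`; `fetch_full_review` is a hypothetical stand-in, not part of amazon_scraper, and the URL pattern is the one shown for `r.url` below:

```python
import functools

download_count = 0


@functools.lru_cache(maxsize=256)
def fetch_full_review(review_id):
    """Hypothetical stand-in for downloading and parsing one review page."""
    global download_count
    download_count += 1
    return {"id": review_id, "url": "http://www.amazon.com/review/" + review_id}


for rid in ["R3MF0NIRI3BT1E", "R3N2XPJT4I1XTI", "R3MF0NIRI3BT1E"]:
    fetch_full_review(rid)

print(download_count)  # 2: the repeated ID was served from the cache
```

Note that the cache returns the same object on each hit, so callers should not mutate it.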

Convert a brief review to a full review:

>>> rs = amzn.reviews(ItemId='B0051QVF7A')
>>> # this will iterate over all reviews on all pages
>>> # each review will require a download as it is on a separate page
>>> for r in rs:
...     print(r.id)
...     fr = r.full_review()
...     print(fr.id)

Quickly get a list of all reviews on a review page using the all_reviews property. This uses the brief reviews provided on the review page to avoid downloading each review separately. As such, some information may not be accessible:

>>> p = amzn.lookup(ItemId='B0051QVF7A')
>>> rs = p.reviews()
>>> all_reviews_on_page = list(rs)
>>> len(all_reviews_on_page)
10
>>> r = all_reviews_on_page[0]
>>> r.title
'Fantastic device - pick your Kindle!'
>>> fr = r.full_review()
>>> fr.title
'Fantastic device - pick your Kindle!'

By ASIN/ItemId:

>>> rs = amzn.reviews(ItemId='B0051QVF7A')
>>> rs.asin
B0051QVF7A
>>> rs.ids
['R3MF0NIRI3BT1E', 'R3N2XPJT4I1XTI', 'RWG7OQ5NMGUMW', 'R1FKKJWTJC4EAP', 'RR8NWZ0IXWX7K', 'R32AU655LW6HPU', 'R33XK7OO7TO68E', 'R3NJRC6XH88RBR', 'R21JS32BNNQ82O', 'R2C9KPSEH78IF7']

For individual reviews use the review method:

>>> review_id = 'R3MF0NIRI3BT1E'
>>> r = amzn.review(Id=review_id)
>>> r.id
R3MF0NIRI3BT1E
>>> r.asin
B00492CIC8
>>> r.url
http://www.amazon.com/review/R3MF0NIRI3BT1E
>>> r.date
2011-09-29 18:27:14+00:00
>>> r.author
FreeSpirit
>>> r.text
Having been a little overwhelmed by the choices between all the new Kindles ... <snip>
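The printed `r.date` (`2011-09-29 18:27:14+00:00`) has the form `str()` gives for a timezone-aware UTC `datetime`, so it appears to be a `datetime` object (an assumption; the README does not say). If you store the value as text, Python 3.7+ can parse it back with `datetime.fromisoformat`, as in this sketch (the cutoff date is made up):

```python
from datetime import datetime, timedelta, timezone

stored = "2011-09-29 18:27:14+00:00"  # value printed for r.date above
date = datetime.fromisoformat(stored)

print(date.tzinfo == timezone.utc)      # True
print(date.year, date.month, date.day)  # 2011 9 29

# Check whether the review falls within 365 days before a made-up cutoff.
cutoff = datetime(2012, 1, 1, tzinfo=timezone.utc)
print(timedelta(0) <= cutoff - date < timedelta(days=365))  # True
```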

By URL:

>>> r = amzn.review(URL='http://www.amazon.com/review/R3MF0NIRI3BT1E')
>>> r.id
R3MF0NIRI3BT1E

User Reviews API

This package also supports getting reviews written by a specific user.

Get reviews that a single author has created:

>>> ur = amzn.user_reviews(Id="A2W0GY64CJSV5D")
>>> ur.brief_reviews
>>> ur.name
>>> fr = list(ur.brief_reviews)[0].full_review()

Get reviews for a user from a review object:

>>> r = amzn.review(Id="R3MF0NIRI3BT1E")
>>> # we can get the reviews directly, or via the API with a URL or ID
>>> ur = r.user_reviews()
>>> ur = amzn.user_reviews(URL=r.author_reviews_url)
>>> ur = amzn.user_reviews(Id=r.author_id)
>>> ur.brief_reviews
>>> ur.name

Iterate over the current page's reviews:

>>> ur = amzn.user_reviews(Id="A2W0GY64CJSV5D")
>>> for r in ur.brief_reviews:
...     print(r.id)

Iterate over all author reviews:

>>> ur = amzn.user_reviews(Id="A2W0GY64CJSV5D")
>>> for r in ur:
...     print(r.id)

Authors

More Repositories

  • Pyrr (Python, 400 stars): 3D mathematical functions using NumPy
  • PyGLy (Python, 37 stars): Pure Python OpenGL framework using PyOpenGL
  • Pyglet (Python, 25 stars): Fork of Pyglet to provide OpenGL Core functionality on Mac OS-X. Please use the official repository unless you require these changes.
  • cyglfw3 (Python, 20 stars): Cython bindings for GLFW3
  • OMGL (Python, 16 stars): Pythonic OpenGL Bindings
  • brutaldoom (Shell, 12 stars): Scripts for launching Brutal Doom
  • PyMesh (Python, 8 stars): Loads various 3D model formats into simple data structures
  • FHDB-Noobie-Package (5 stars): Custom FreeMCBoot / FreeHDBoot configurations
  • pywebview-demo (JavaScript, 5 stars): Pywebview + Flask + SocketIO + Brython
  • Shock (C++, 4 stars): Shock C++ Platform Abstraction Library
  • Py3D (Shell, 4 stars): Complete Python 3D toolkit
  • CCAnimatedTMXTiledMap (Objective-C, 3 stars): Animation support for Cocos2D iPhone TMX tile maps.
  • bast (Python, 3 stars): A Pythonic Modern OpenGL Engine
  • imraylib (Python, 3 stars): Imgui / Raylib integration
  • SimpleCpparse (Python, 2 stars): Basic CPP header parser
  • flask_skeleton (Python, 2 stars): Flask + SQLAlchemy + Webpack
  • GLETools (Python, 2 stars): Unofficial fork of GLETools
  • obj2xml (Python, 2 stars): Easy XML document creator in Python
  • dfe_to_tp (Python, 2 stars): Converts DarkFunction Editor to pixi / TexturePacker spritesheet format
  • qbasic-fractal (VBA, 2 stars): Mandelbrot in QBasic
  • nix-gpu-passthrough (Python, 2 stars): GPU pass-through notes for Nix
  • atari-st-video-pcb (2 stars): ATARI ST Video/Audio adapter PCB
  • nix-nim (Nim, 2 stars): Example project showing how to use Nim unstable in a Nix shell
  • schematics-tree (Python, 1 star): Schematics Model registry with real-time manipulation and inspection over HTTP
  • redistil (Python, 1 star): Declarative data types, optimised for Redis.
  • Razorback (Python, 1 star): Pure Python OpenGL framework using PyGLy as a base
  • dfe_to_easel (Python, 1 star): Converts DarkFunction Editor to Easel spritesheet format
  • gol (Python, 1 star): Conway's Game of Life
  • pyfilesystem (Python, 1 star): A fork of PyFilesystem with a sane S3 implementation.
  • nix-configs (Nix, 1 star): Configs and guides for Nixos
  • jaweson (Python, 1 star): A safe, modular, format agnostic, serialiser for Python - supports JSON, MsgPack
  • ComPy (Python, 1 star): Component framework for Python
  • modelus (Python, 1 star): Simple declarative data objects (ORM-like) with multiple backends
  • exodus (Python, 1 star): A light-weight, storage agnostic, data migration framework
  • nix-shells (Nix, 1 star): Example nix shells for different languages / use cases
  • blessed-input (Python, 1 star): Sane terminal keyboard input handling in Blessed
  • PyQueue (Python, 1 star): Simple Queue objects for Python
  • environment (Vim Script, 1 star): Configuration files to setup a comfortable computing environment
  • condawrapper (Shell, 1 star): virtualenvwrapper-like commands for conda
  • Phoenetics (Python, 1 star): Uses dictionary of values to calculate value of a word and find comparable words
  • raylib-py-flat (Nix, 1 star): Flattens the `raylibpy` modules for easier import
  • CCMenuAlignment (Objective-C, 1 star): Alignment methods for Cocos2d CCMenu and CCMenuAdvanced classes.
  • envboot (Shell, 1 star): Scripts to bootstrap new computing environments
  • scrapy_romsuniverse (Python, 1 star): Scrapy spider for Roms Universe
  • heroku-rq-dashboard (1 star): Out-of-the-box Heroku configuration for RQ Dashboard with Basic HTTP Auth
  • pygame_raytracer (Python, 1 star): Demonstrating a software Ray tracer in PyGame
  • aurora (Python, 1 star): Australian Antarctic Webcam Scraper