Python Diffbot API Client

Preface

Identify and extract the important parts of any web page in Python! This client currently supports calls to Diffbot's Automatic APIs and Crawlbot.

Installation

To install, activate a new virtual environment and run the following command:

$ pip install -r requirements.txt

Configuration

To run the example, you must first configure a working API token in config.py:

$ cp config.py.example config.py; vim config.py;

Then replace the string "SOME_TOKEN" with your API token. Finally, to run the example:

$ python example.py

Usage

Article API

An example call to the Article API:

diffbot = DiffbotClient()
token = "SOME_TOKEN"
version = 2
url = "http://shichuan.github.io/javascript-patterns/"
api = "article"
response = diffbot.request(url, token, api, version=version)

Product API

An example call to the Product API:

diffbot = DiffbotClient()
token = "SOME_TOKEN"
version = 2
url = "http://www.overstock.com/Home-Garden/iRobot-650-Roomba-Vacuuming-Robot/7886009/product.html"
api = "product"
response = diffbot.request(url, token, api, version=version)

Image API

An example call to the Image API:

diffbot = DiffbotClient()
token = "SOME_TOKEN"
version = 2
url = "http://www.google.com/"
api = "image"
response = diffbot.request(url, token, api, version=version)

Analyze API

An example call to the Analyze API:

diffbot = DiffbotClient()
token = "SOME_TOKEN"
version = 2
url = "http://www.twitter.com/"
api = "analyze"
response = diffbot.request(url, token, api, version=version)
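
The four calls above differ only in the `api` and `url` arguments. As a rough illustration of what such a call might translate to on the wire, the sketch below assembles a request URL for each API. The base URL, the `/v{version}/{api}` path layout, and the helper name `build_request_url` are assumptions made for this example; the client's actual internals may differ.

```python
from urllib.parse import urlencode

# Assumed base URL for illustration; not taken from the client source.
DIFFBOT_BASE = "https://api.diffbot.com"


def build_request_url(url, token, api, version=2):
    """Return a GET URL for one Automatic API call (hypothetical helper)."""
    # urlencode percent-escapes the target URL so it is safe as a query value.
    query = urlencode({"token": token, "url": url})
    return f"{DIFFBOT_BASE}/v{version}/{api}?{query}"


# Print one request URL per API name used in the examples above.
for api in ("article", "product", "image", "analyze"):
    print(build_request_url("http://www.twitter.com/", "SOME_TOKEN", api))
```

Nothing here touches the network, so it can be run without a valid token.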

Crawlbot API

To start a new crawl, specify a crawl name, seed URLs, and the API via which URLs should be processed. An example call to the Crawlbot API:

token = "SOME_TOKEN"
name = "sampleCrawlName"
seeds = "http://www.twitter.com/"
api = "analyze"
sampleCrawl = DiffbotCrawl(token,name,seeds=seeds,api=api)

Omit "seeds" and "api" to load an existing crawl, or create a crawl as a placeholder.

To check the status of a crawl:

sampleCrawl.status()

To update a crawl:

maxToCrawl = 100
upp = "diffbot"
sampleCrawl.update(maxToCrawl=maxToCrawl,urlProcessPattern=upp)

To delete or restart a crawl:

sampleCrawl.delete()
sampleCrawl.restart()

To download crawl data:

sampleCrawl.download() # returns JSON by default
sampleCrawl.download(data_format="csv")

To pass additional arguments to a crawl:

sampleCrawl = DiffbotCrawl(token,name,seeds,apiUrl,maxToCrawl=100,maxToProcess=50,notifyEmail="[email protected]")
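
One plausible way for a constructor like this to handle optional and additional arguments is to collect them into a single parameter dictionary. The sketch below shows that pattern only; `crawl_params` is a hypothetical helper, the `apiUrl` value is an assumed example, and none of it is taken from the client's source.

```python
def crawl_params(token, name, seeds=None, api_url=None, **extra):
    """Assemble a parameter dict for a crawl request (hypothetical helper)."""
    params = {"token": token, "name": name}
    # seeds and apiUrl are included only when given, mirroring the note that
    # omitting them loads an existing crawl instead of starting a new one.
    if seeds is not None:
        params["seeds"] = seeds
    if api_url is not None:
        params["apiUrl"] = api_url
    # Extra keyword arguments such as maxToCrawl pass through unchanged.
    params.update(extra)
    return params


params = crawl_params(
    "SOME_TOKEN",
    "sampleCrawlName",
    seeds="http://www.twitter.com/",
    api_url="https://api.diffbot.com/v2/analyze",  # assumed example value
    maxToCrawl=100,
    maxToProcess=50,
)
print(params)
```

Passing extras through `**extra` means a newly supported crawl option would not require a change to the helper's signature.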

Testing

First install the test requirements with the following command:

$ pip install -r test_requirements.txt

Currently there are some simple unit tests that mock the API calls and return data from fixtures in the filesystem. From the project directory, simply run:

$ nosetests
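
For readers unfamiliar with this style of test, the following is a minimal sketch of mocking an API call and returning fixture data. It is not one of the repository's tests: `fetch_article`, the injected `http_get` callable, and the fixture contents are all invented for illustration, and the fixture is built inline rather than read from a file.

```python
import json
import unittest
from unittest import mock


def fetch_article(http_get, url, token):
    """Hypothetical wrapper: call an injected HTTP getter and parse its JSON."""
    raw = http_get("article", url=url, token=token)
    return json.loads(raw)


class ArticleTest(unittest.TestCase):
    def test_returns_fixture_data(self):
        # Made-up fixture standing in for a saved API response.
        fixture = json.dumps({"title": "Sample Title", "type": "article"})
        fake_get = mock.Mock(return_value=fixture)

        result = fetch_article(fake_get, "http://example.com/", "SOME_TOKEN")

        # The parsed result comes from the fixture, not from the network.
        self.assertEqual(result["title"], "Sample Title")
        fake_get.assert_called_once_with(
            "article", url="http://example.com/", token="SOME_TOKEN"
        )
```

Because the class derives from `unittest.TestCase`, a runner such as `nosetests` or `python -m unittest` should be able to discover it when it is saved in a test module.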
