• Stars
    star
    2,098
  • Rank 22,008 (Top 0.5 %)
  • Language
    Python
  • License
    Other
  • Created over 11 years ago
  • Updated 6 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Asynchronous Python HTTP Requests for Humans using Futures

Asynchronous Python HTTP Requests for Humans

https://travis-ci.org/ross/requests-futures.svg?branch=master

Small add-on for the python requests http library. Makes use of python 3.2's concurrent.futures or the backport for prior versions of python.

The additional API and changes are minimal and strives to avoid surprises.

The following synchronous code:

from requests import Session

session = Session()
# first requests starts and blocks until finished
response_one = session.get('http://httpbin.org/get')
# second request starts once first is finished
response_two = session.get('http://httpbin.org/get?foo=bar')
# both requests are complete
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)

Can be translated to make use of futures, and thus be asynchronous by creating a FuturesSession and catching the returned Future in place of Response. The Response can be retrieved by calling the result method on the Future:

from requests_futures.sessions import FuturesSession

session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second requests is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)

By default a ThreadPoolExecutor is created with 8 workers. If you would like to adjust that value or share a executor across multiple sessions you can provide one to the FuturesSession constructor.

from concurrent.futures import ThreadPoolExecutor
from requests_futures.sessions import FuturesSession

session = FuturesSession(executor=ThreadPoolExecutor(max_workers=10))
# ...

As a shortcut in case of just increasing workers number you can pass max_workers straight to the FuturesSession constructor:

from requests_futures.sessions import FuturesSession
session = FuturesSession(max_workers=10)

FutureSession will use an existing session object if supplied:

from requests import session
from requests_futures.sessions import FuturesSession
my_session = session()
future_session = FuturesSession(session=my_session)

That's it. The api of requests.Session is preserved without any modifications beyond returning a Future rather than Response. As with all futures exceptions are shifted (thrown) to the future.result() call so try/except blocks should be moved there.

Tying extra information to the request/response

The most common piece of information needed is the URL of the request. This can be accessed without any extra steps using the request property of the response object.

from concurrent.futures import as_completed
from pprint import pprint
from requests_futures.sessions import FuturesSession

session = FuturesSession()

futures=[session.get(f'http://httpbin.org/get?{i}') for i in range(3)]

for future in as_completed(futures):
    resp = future.result()
    pprint({
        'url': resp.request.url,
        'content': resp.json(),
    })

There are situations in which you may want to tie additional information to a request/response. There are a number of ways to go about this, the simplest is to attach additional information to the future object itself.

from concurrent.futures import as_completed
from pprint import pprint
from requests_futures.sessions import FuturesSession

session = FuturesSession()

futures=[]
for i in range(3):
    future = session.get('http://httpbin.org/get')
    future.i = i
    futures.append(future)

for future in as_completed(futures):
    resp = future.result()
    pprint({
        'i': future.i,
        'content': resp.json(),
    })

Canceling queued requests (a.k.a cleaning up after yourself)

If you know that you won't be needing any additional responses from futures that haven't yet resolved, it's a good idea to cancel those requests. You can do this by using the session as a context manager:

from requests_futures.sessions import FuturesSession
with FuturesSession(max_workers=1) as session:
    future = session.get('https://httpbin.org/get')
    future2 = session.get('https://httpbin.org/delay/10')
    future3 = session.get('https://httpbin.org/delay/10')
    response = future.result()

In this example, the second or third request will be skipped, saving time and resources that would otherwise be wasted.

Iterating over a list of requests responses

Without preserving the requests order:

from concurrent.futures import as_completed
from requests_futures.sessions import FuturesSession
with FuturesSession() as session:
    futures = [session.get('https://httpbin.org/delay/{}'.format(i % 3)) for i in range(10)]
    for future in as_completed(futures):
        resp = future.result()
        print(resp.json()['url'])

Working in the Background

Additional processing can be done in the background using requests's hooks functionality. This can be useful for shifting work out of the foreground, for a simple example take json parsing.

from pprint import pprint
from requests_futures.sessions import FuturesSession

session = FuturesSession()

def response_hook(resp, *args, **kwargs):
    # parse the json storing the result on the response object
    resp.data = resp.json()

future = session.get('http://httpbin.org/get', hooks={
    'response': response_hook,
})
# do some other stuff, send some more requests while this one works
response = future.result()
print('response status {0}'.format(response.status_code))
# data will have been attached to the response object in the background
pprint(response.data)

Hooks can also be applied to the session.

from pprint import pprint
from requests_futures.sessions import FuturesSession

def response_hook(resp, *args, **kwargs):
    # parse the json storing the result on the response object
    resp.data = resp.json()

session = FuturesSession()
session.hooks['response'] = response_hook

future = session.get('http://httpbin.org/get')
# do some other stuff, send some more requests while this one works
response = future.result()
print('response status {0}'.format(response.status_code))
# data will have been attached to the response object in the background
pprint(response.data)   pprint(response.data)

A more advanced example that adds an elapsed property to all requests.

from pprint import pprint
from requests_futures.sessions import FuturesSession
from time import time


class ElapsedFuturesSession(FuturesSession):

    def request(self, method, url, hooks=None, *args, **kwargs):
        start = time()
        if hooks is None:
            hooks = {}

        def timing(r, *args, **kwargs):
            r.elapsed = time() - start

        try:
            if isinstance(hooks['response'], (list, tuple)):
                # needs to be first so we don't time other hooks execution
                hooks['response'].insert(0, timing)
            else:
                hooks['response'] = [timing, hooks['response']]
        except KeyError:
            hooks['response'] = timing

        return super(ElapsedFuturesSession, self) \
            .request(method, url, hooks=hooks, *args, **kwargs)



session = ElapsedFuturesSession()
future = session.get('http://httpbin.org/get')
# do some other stuff, send some more requests while this one works
response = future.result()
print('response status {0}'.format(response.status_code))
print('response elapsed {0}'.format(response.elapsed))

Using ProcessPoolExecutor

Similarly to ThreadPoolExecutor, it is possible to use an instance of ProcessPoolExecutor. As the name suggest, the requests will be executed concurrently in separate processes rather than threads.

from concurrent.futures import ProcessPoolExecutor
from requests_futures.sessions import FuturesSession

session = FuturesSession(executor=ProcessPoolExecutor(max_workers=10))
# ... use as before

Hint

Using the ProcessPoolExecutor is useful, in cases where memory usage per request is very high (large response) and cycling the interpreter is required to release memory back to OS.

A base requirement of using ProcessPoolExecutor is that the Session.request, FutureSession all be pickle-able.

This means that only Python 3.5 is fully supported, while Python versions 3.4 and above REQUIRE an existing requests.Session instance to be passed when initializing FutureSession. Python 2.X and < 3.4 are currently not supported.

# Using python 3.4
from concurrent.futures import ProcessPoolExecutor
from requests import Session
from requests_futures.sessions import FuturesSession

session = FuturesSession(executor=ProcessPoolExecutor(max_workers=10),
                         session=Session())
# ... use as before

In case pickling fails, an exception is raised pointing to this documentation.

# Using python 2.7
from concurrent.futures import ProcessPoolExecutor
from requests import Session
from requests_futures.sessions import FuturesSession

session = FuturesSession(executor=ProcessPoolExecutor(max_workers=10),
                         session=Session())
Traceback (most recent call last):
...
RuntimeError: Cannot pickle function. Refer to documentation: https://github.com/ross/requests-futures/#using-processpoolexecutor

Important

  • Python >= 3.4 required
  • A session instance is required when using Python < 3.5
  • If sub-classing FuturesSession it must be importable (module global)

Installation

pip install requests-futures

More Repositories

1

memcache-debug-panel

A django-debug-toolbar panel for memcached debugging information (memcache and pylibmc supported)
Python
50
star
2

performant-pagination

High Performance Python Django pagination
Python
36
star
3

zfs-periodic

Scripts to take zfs snapshots with periodic on FreeBSD
Shell
31
star
4

python-asynchttp

A simple httplib2 based asynchronous HTTP library for python
Python
27
star
5

XOSplash

Seamless splash video support for iOS (iPhone and iPad)
Objective-C
26
star
6

django-thread

Helpers for using Django from threads
Python
7
star
7

perlang

Erlang-like functionality for the perl programming language
Perl
5
star
8

botie

Slack Bot & Slash Command development framework
Python
5
star
9

python-lint8

A pep8 (the command) inspired python lint/best practices tool.
Python
5
star
10

simone

todo
Python
4
star
11

django-html5bp

Basic Html5 Boiler Plate app
Python
4
star
12

webassets-dynamic

helper module to allow you to use webassets with django.views.static.serve
Python
3
star
13

utransit-old

Proof-of-concept/API demo for a universal multi-region transit data api including scheduling and real-time data.
Python
3
star
14

advent-of-code-2022

Python
2
star
15

django-multifile

Re-usable Django app for accepting multiple files in form fields.
JavaScript
2
star
16

webassets-slimmer

webassets filters for slimmer (css & js)
Python
2
star
17

sonos-youtube

Trivial HTTP server to allow Sonos to play YouTube live streams
Python
2
star
18

google-sheets-tax-script

Quick and dirty (probably wrong) tax calculations for google sheets
JavaScript
2
star
19

haproxy-mapper

Tools for generating HAProxy Maps from various sources
Go
2
star
20

ripebot

RIPE Bot
Python
2
star
21

8x8

Raspberry PI Code for an 8x8 LED Matrix
Python
2
star
22

statsd-demo

Demo app for python statsd and django_statsd
Python
2
star
23

s3-perf

Python
1
star
24

trivial-app

HTML
1
star
25

nano-mvc

A tiny, simple, and powerful JavaScript MVC framework (it's all in how you apply it.)
JavaScript
1
star
26

evRobot

Event Driven Robot (Roomba) Control in Python
Python
1
star
27

djtmpl

django template parsing in javascript (prototype/wip/experimental)
JavaScript
1
star
28

acts_as_obfuscated

Add support to ActiveRecord::Base for id obfuscation, i.e. instead of .../users/show/1 you get .../users/show/cXdbpx completely transparently
Ruby
1
star