
Programmable HTTP proxy server for Node.js


Node.js implementation of a proxy server (think Squid) with support for SSL, authentication, upstream proxy chaining, custom HTTP responses and measuring traffic statistics. The authentication and proxy chaining configuration is defined in code and can be dynamic. Note that the proxy server only supports Basic authentication (see Proxy-Authorization for details).

For example, this package is useful if you need to use proxies with authentication in the headless Chrome web browser, because it doesn't accept proxy URLs such as http://username:password@proxy.example.com:8080. With this library, you can set up a local proxy server without any password that will forward requests to the upstream proxy with a password. The package is used for this exact purpose by the Apify web scraping platform.

To learn more about the rationale behind this package, read How to make headless Chrome and Puppeteer use a proxy server with authentication.

Run a simple HTTP/HTTPS proxy server

const ProxyChain = require('proxy-chain');

const server = new ProxyChain.Server({ port: 8000 });

server.listen(() => {
    console.log(`Proxy server is listening on port ${server.port}`);
});

Run an HTTP/HTTPS proxy server with credentials and an upstream proxy

const ProxyChain = require('proxy-chain');

const server = new ProxyChain.Server({
    // Port where the server will listen. By default 8000.
    port: 8000,

    // Enables verbose logging
    verbose: true,

    // Custom user-defined function to authenticate incoming proxy requests,
    // and optionally provide the URL to chained upstream proxy.
    // The function must return an object (or promise resolving to the object) with the following signature:
    // { requestAuthentication: boolean, upstreamProxyUrl: string, failMsg?: string, customTag?: unknown }
    // If the function is not defined or is null, the server runs in simple mode.
    // Note that the function takes a single argument with the following properties:
    // * request      - An instance of http.IncomingMessage class with information about the client request
    //                  (which is either HTTP CONNECT for SSL protocol, or other HTTP request)
    // * username     - Username parsed from the Proxy-Authorization header. May be an empty string.
    // * password     - Password parsed from the Proxy-Authorization header. May be an empty string.
    // * hostname     - Hostname of the target server
    // * port         - Port of the target server
    // * isHttp       - If true, this is an HTTP request; otherwise it's an HTTP CONNECT tunnel for SSL
    //                  or other protocols
    // * connectionId - Unique ID of the HTTP connection. It can be used to obtain traffic statistics.
    prepareRequestFunction: ({ request, username, password, hostname, port, isHttp, connectionId }) => {
        return {
            // If set to true, the client is sent an HTTP 407 response with the Proxy-Authenticate header set,
            // requiring Basic authentication. Here you can verify user credentials.
            requestAuthentication: username !== 'bob' || password !== 'TopSecret',

            // Sets up an upstream HTTP proxy to which all the requests are forwarded.
            // If null, the proxy works in direct mode, i.e. the connection is forwarded directly
            // to the target server. This field is ignored if "requestAuthentication" is true.
            // The username and password must be URI-encoded.
            upstreamProxyUrl: 'http://username:password@proxy.example.com:3128',

            // If "requestAuthentication" is true, you can use the following property
            // to define a custom error message to return to the client instead of the default "Proxy credentials required"
            failMsg: 'Bad username or password, please try again.',

            // Optional custom tag that will be passed back via
            // `tunnelConnectResponded` or `tunnelConnectFailed` events
            // Can be used to pass information between proxy-chain
            // and any external code or application using it
            customTag: { userId: '123' },
        };
    },
});

server.listen(() => {
  console.log(`Proxy server is listening on port ${server.port}`);
});

// Emitted when HTTP connection is closed
server.on('connectionClosed', ({ connectionId, stats }) => {
  console.log(`Connection ${connectionId} closed`);
  console.dir(stats);
});

// Emitted when HTTP request fails
server.on('requestFailed', ({ request, error }) => {
  console.log(`Request ${request.url} failed`);
  console.error(error);
});

A different approach to 502 Bad Gateway

The 502 status code alone does not convey enough detail about what went wrong. Therefore, the server may respond with status codes 590-599 instead:

590 Non Successful

Upstream responded with a non-200 status code.

591 RESERVED

This status code is reserved for further use.

592 Status Code Out Of Range

Upstream responded with a status code outside the 100-999 range.

593 Not Found

DNS lookup failed - EAI_NODATA or EAI_NONAME.

594 Connection Refused

Upstream refused connection.

595 Connection Reset

Connection reset due to loss of connection or timeout.

596 Broken Pipe

Trying to write on a closed socket.

597 Auth Failed

Incorrect upstream credentials.

598 RESERVED

This status code is reserved for further use.

599 Upstream Error

Generic upstream error.


590 and 592 indicate an issue on the upstream side.
593 indicates an incorrect proxy-chain configuration.
594, 595 and 596 may occur due to connection loss.
597 indicates incorrect upstream credentials.
599 is a generic error, where the above is not applicable.
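For client-side handling, the mapping above can be collapsed into a small helper. This function is illustrative only and not part of proxy-chain:

```javascript
// Maps proxy-chain's 59x status codes to coarse failure categories,
// mirroring the table above. Illustrative helper, not a library API.
function classifyProxyStatus(statusCode) {
    if (statusCode === 590 || statusCode === 592) return 'upstream-response';
    if (statusCode === 593) return 'bad-configuration';
    if (statusCode >= 594 && statusCode <= 596) return 'connection-lost';
    if (statusCode === 597) return 'auth-failed';
    if (statusCode === 599) return 'upstream-error';
    return 'other';
}
```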

Custom error responses

To return a custom HTTP response to indicate an error to the client, you can throw a RequestError from inside the prepareRequestFunction function. The class constructor has the following parameters: RequestError(body, statusCode, headers). By default, the response will have Content-Type: text/plain; charset=utf-8.

const ProxyChain = require('proxy-chain');

const server = new ProxyChain.Server({
    prepareRequestFunction: ({ request, username, password, hostname, port, isHttp, connectionId }) => {
        if (username !== 'bob') {
            throw new ProxyChain.RequestError('Only Bob can use this proxy!', 400);
        }
    },
});

Measuring traffic statistics

To get traffic statistics for a certain HTTP connection, you can use:

const stats = server.getConnectionStats(connectionId);
console.dir(stats);

The resulting object looks like:

{
    // Number of bytes sent to client
    srcTxBytes: Number,
    // Number of bytes received from client
    srcRxBytes: Number,
    // Number of bytes sent to target server (proxy or website)
    trgTxBytes: Number,
    // Number of bytes received from target server (proxy or website)
    trgRxBytes: Number,
}

If the underlying sockets were already closed, the corresponding values are null rather than 0.
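When aggregating statistics, those null values need handling. A minimal sketch (totalBytes is a hypothetical helper, not a proxy-chain API):

```javascript
// Sums all byte counters in a stats object returned by getConnectionStats(),
// treating null (socket already closed) as zero.
// Hypothetical helper, not part of proxy-chain.
function totalBytes(stats) {
    const keys = ['srcTxBytes', 'srcRxBytes', 'trgTxBytes', 'trgRxBytes'];
    return keys.reduce((sum, key) => sum + (stats[key] ?? 0), 0);
}
```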

Custom responses

Custom responses allow you to override the response to an HTTP request to the proxy, without contacting any target host. For example, this is useful if you want to provide an HTTP proxy-style interface to an external API, or respond to certain requests with a custom page. Note that this feature is only available for HTTP connections, because HTTPS connections cannot be intercepted without access to the target host's private key.

To provide a custom response, the result of the prepareRequestFunction function must define the customResponseFunction property, which contains a function that generates the custom response. The function is passed no parameters and it must return an object (or a promise resolving to an object) with the following properties:

{
  // Optional HTTP status code of the response. By default it is 200.
  statusCode: 200,

  // Optional HTTP headers of the response
  headers: {
    'X-My-Header': 'bla bla',
  },

  // Optional string with the body of the HTTP response
  body: 'My custom response',

  // Optional encoding of the body. If not provided, defaults to 'UTF-8'
  encoding: 'UTF-8',
}

Here is a simple example:

const ProxyChain = require('proxy-chain');

const server = new ProxyChain.Server({
    port: 8000,
    prepareRequestFunction: ({ request, username, password, hostname, port, isHttp }) => {
        return {
            customResponseFunction: () => {
                return {
                    statusCode: 200,
                    body: `My custom response to ${request.url}`,
                };
            },
        };
    },
});

server.listen(() => {
  console.log(`Proxy server is listening on port ${server.port}`);
});

Routing CONNECT to another HTTP server

While customResponseFunction enables custom handling of HTTP methods such as GET and POST, many HTTP clients rely on CONNECT tunnels. It's possible to route those requests differently using the customConnectServer option, which accepts an instance of a Node.js HTTP server.

const http = require('http');
const ProxyChain = require('proxy-chain');

const exampleServer = http.createServer((request, response) => {
    response.end('Hello from a custom server!');
});

const server = new ProxyChain.Server({
    port: 8000,
    prepareRequestFunction: ({ request, username, password, hostname, port, isHttp }) => {
        if (request.url.toLowerCase() === 'example.com:80') {
            return {
                customConnectServer: exampleServer,
            };
        }

        return {};
    },
});

server.listen(() => {
  console.log(`Proxy server is listening on port ${server.port}`);
});

In the example above, all CONNECT tunnels to example.com are overridden. This is an insecure server, so it accepts only http: requests.

In order to intercept https: requests, https.createServer should be used instead, along with a self-signed certificate.

const https = require('https');
const fs = require('fs');
const key = fs.readFileSync('./test/ssl.key');
const cert = fs.readFileSync('./test/ssl.crt');

const exampleServer = https.createServer({
    key,
    cert,
}, (request, response) => {
    response.end('Hello from a custom server!');
});
The URL check in prepareRequestFunction must then match port 443 instead of 80:

if (request.url.toLowerCase() === 'example.com:443') {

Closing the server

To shut down the proxy server, call the close([closeConnections], [callback]) function. For example:

server.close(true, () => {
  console.log('Proxy server was closed.');
});

The closeConnections parameter indicates whether pending proxy connections should be forcibly closed. If it's false, the function will wait until all connections are closed, which can take a long time. If the callback parameter is omitted, the function returns a promise.

Accessing the CONNECT response headers for proxy tunneling

Some upstream proxy providers might include valuable debugging information in the CONNECT response headers when establishing the proxy tunnel, since they cannot modify the data transmitted later through the tunneled connection.

The proxy server emits a tunnelConnectResponded event to expose this information. The parameter types of the event callback are described in the Node.js documentation. Example:

server.on('tunnelConnectResponded', ({ proxyChainId, response, socket, head, customTag }) => {
    console.log(`CONNECT response headers received: ${response.headers}`);
});

Alternatively a helper function may be used:

listenConnectAnonymizedProxy(anonymizedProxyUrl, ({ response, socket, head }) => {
    console.log(`CONNECT response headers received: ${response.headers}`);
});

You can also listen for CONNECT requests that receive a response with a status code other than 200; the proxy server emits a tunnelConnectFailed event:

server.on('tunnelConnectFailed', ({ proxyChainId, response, socket, head, customTag }) => {
    console.log(`CONNECT response failed with status code: ${response.statusCode}`);
});

Helper functions

The package also provides several utility functions.

anonymizeProxy({ url, port }, callback)

Parses and validates an HTTP proxy URL. If the proxy requires authentication, the function starts a local open proxy server that forwards to the upstream proxy. The local port is chosen randomly unless specified.

The function takes an optional callback that receives the anonymized proxy URL. If no callback is supplied, the function returns a promise that resolves to a String with the anonymized proxy URL, or the original URL if it was already anonymous.

The following example shows how you can use a proxy with authentication from headless Chrome and Puppeteer. For details, read this blog post.

const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');

(async() => {
    const oldProxyUrl = 'http://bob:password123@proxy.example.com:8000';
    const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);

    // Prints something like "http://127.0.0.1:45678"
    console.log(newProxyUrl);

    const browser = await puppeteer.launch({
        args: [`--proxy-server=${newProxyUrl}`],
    });

    // Do your magic here...
    const page = await browser.newPage();
    await page.goto('https://www.example.com');
    await page.screenshot({ path: 'example.png' });
    await browser.close();

    // Clean up
    await proxyChain.closeAnonymizedProxy(newProxyUrl, true);
})();

closeAnonymizedProxy(anonymizedProxyUrl, closeConnections, callback)

Closes an anonymized proxy previously started by anonymizeProxy(). If the proxy was not found or was already closed, the function has no effect and its result is false. Otherwise the result is true.

The closeConnections parameter indicates whether pending proxy connections are forcibly closed. If it's false, the function will wait until all connections are closed, which can take a long time.

The function takes an optional callback that receives the resulting Boolean. If no callback is provided, the function returns a promise instead.

createTunnel(proxyUrl, targetHost, options, callback)

Creates a TCP tunnel to targetHost that goes through an HTTP proxy server specified by the proxyUrl parameter.

The optional options parameter is an object with the following properties:

  • port: Number - Enables specifying the local port to listen on. By default 0, which means a random port will be selected.
  • hostname: String - Local hostname to listen on. By default localhost.
  • verbose: Boolean - If true, the function produces verbose logging. By default false.

The result of the function is a local endpoint in the form hostname:port. All TCP connections made to the local endpoint will be tunneled through the proxy to the target host and port. For example, this is useful if you want to access a certain service from a specific IP address.

The tunnel should be eventually closed by calling the closeTunnel() function.

The createTunnel() function accepts an optional Node.js-style callback that receives the path to the local endpoint. If no callback is supplied, the function returns a promise that resolves to a String with the path to the local endpoint.

For more information, read this blog post.

Example:

const host = await createTunnel('http://bob:password123@proxy.example.com:8000', 'service.example.com:356');
// Prints something like "localhost:56836"
console.log(host);

closeTunnel(tunnelString, closeConnections, callback)

Closes a tunnel previously started by createTunnel(). The result value is false if the tunnel was not found or was already closed, otherwise it is true.

The closeConnections parameter indicates whether pending connections are forcibly closed. If it's false, the function will wait until all connections are closed, which can take a long time.

The function takes an optional callback that receives the result of the function. If the callback is not provided, the function returns a promise instead.

listenConnectAnonymizedProxy(anonymizedProxyUrl, tunnelConnectRespondedCallback)

Allows you to register a callback on the anonymized proxy URL to receive the CONNECT response headers. See the section Accessing the CONNECT response headers for proxy tunneling above for details.

redactUrl(url, passwordReplacement)

Takes a URL and hides the password in it. For example:

// Prints 'http://bob:<redacted>@example.com'
console.log(redactUrl('http://bob:password@example.com'));
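Conceptually, the function behaves like the following sketch. redactUrlSketch is illustrative only; the real implementation is exported by proxy-chain and handles more edge cases:

```javascript
// Illustrative sketch of redactUrl's behavior, using the WHATWG URL API
// to locate the password component. Not the library's implementation.
function redactUrlSketch(url, passwordReplacement = '<redacted>') {
    const parsed = new URL(url);
    if (!parsed.password) return url; // nothing to hide
    return url.replace(`:${parsed.password}@`, `:${passwordReplacement}@`);
}
```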
