• This repository has been archived on 01/Aug/2024

phridge

A bridge between node and PhantomJS.

Working with PhantomJS in node is a bit cumbersome since you need to spawn a new PhantomJS process for every single task. However, spawning a new process is quite expensive and thus can slow down your application significantly.

phridge provides an API to easily

  • spawn new PhantomJS processes
  • run functions with arguments inside PhantomJS
  • return results from PhantomJS to node
  • manage long-running PhantomJS instances

Unlike other node-PhantomJS bridges, phridge provides a way to run code directly inside PhantomJS instead of turning every call and assignment into an async operation.

phridge uses PhantomJS' stdin and stdout for inter-process communication. It stringifies the given function, passes it to PhantomJS via stdin, executes it in the PhantomJS environment and passes back the results via stdout. Thus you can write your PhantomJS scripts inside your node modules in a clean and synchronous way.

Instead of ...

phantom.addCookie("cookie_name", "cookie_value", "localhost", function () {
    phantom.createPage(function (page) {
        page.set("customHeaders.Referer", "http://google.com", function () {
            page.set(
                "settings.userAgent",
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5)",
                function () {
                    page.open("http://localhost:9901/cookie", function (status) {
                        page.evaluate(function (selector) {
                            return document.querySelector(selector).innerText;
                        }, function (text) {
                            console.log("The element contains the following text: " + text);
                        }, "h1");
                    });
                }
            );
        });
    });
});

... you can write ...

// node
phantom.run("h1", function (selector, resolve) {
    // this code runs inside PhantomJS

    phantom.addCookie("cookie_name", "cookie_value", "localhost");

    var page = webpage.create();
    page.customHeaders = {
        Referer: "http://google.com"
    };
    page.settings = {
        userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5)"
    };
    page.open("http://www.google.com", function () {
        var text = page.evaluate(function (selector) {
            return document.querySelector(selector).innerText;
        }, selector);

        // resolve the promise and pass 'text' back to node 
        resolve(text);
    });
}).then(function (text) {
    // inside node again
    console.log("The element contains the following text: " + text);
});

Please note that the phantom-object provided by phridge is completely different from the phantom-object inside PhantomJS. The same goes for the page-object. Check out the API for further information.


Installation

npm install phridge


Examples

Spawn a new PhantomJS process

phridge.spawn({
    proxyAuth: "john:1234",
    loadImages: false,
    // passing CLI-style options also works
    "--remote-debugger-port": 8888
}).then(function (phantom) {
    // phantom is now a reference to a specific PhantomJS process
});

phridge.spawn() takes an object which will be passed as config to PhantomJS. Check out their documentation for a detailed overview of options. CLI-style options are added as they are, so be sure to escape the space character.

Please note: Due to known issues in PhantomJS, some config options are only supported in CLI style.

Run any function inside PhantomJS

phantom.run(function () {
    console.log("Hi from PhantomJS");
});

phridge stringifies the given function, sends it to PhantomJS and evals it again. Hence you can't use scope variables:

var someVar = "hi";

phantom.run(function () {
    console.log(someVar); // throws a ReferenceError
});
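
The effect can be reproduced in plain node without PhantomJS. The following is a minimal sketch, not phridge's actual implementation: `revive` is a hypothetical helper that turns a function's source text back into a function in a fresh scope, roughly what happens when the source is evaluated in another process. It uses `typeof` so that the missing variable shows up as "undefined" instead of throwing.

```javascript
// Hypothetical helper: rebuild a function from its source text.
// new Function() evaluates the code in the global scope, so the closure
// of the original function is not available afterwards.
function revive(src) {
    return new Function("return (" + src + ");")();
}

function demo() {
    var someVar = "hi"; // local variable, only reachable via closure

    var original = function () {
        return typeof someVar;
    };
    var revived = revive(original.toString());

    return {
        original: original(), // closure intact
        revived: revived()    // closure lost after the round trip
    };
}

var result = demo();

console.log(result.original); // "string"
console.log(result.revived);  // "undefined"
```

Anything the function needs therefore has to be passed in as an argument.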

Passing arguments

You can also pass arguments to the PhantomJS process:

phantom.run("hi", 2, {}, function (string, number, object) {
    console.log(string, number, object); // 'hi', 2, [object Object]
});

Arguments are stringified by JSON.stringify(), so be sure to use JSON-valid objects.
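
What "JSON-valid" means in practice can be checked in plain node. A small sketch (the `roundTrip` helper is hypothetical and only mimics the serialization step, it is not part of phridge):

```javascript
// Hypothetical helper that mimics sending a value through JSON.
function roundTrip(value) {
    return JSON.parse(JSON.stringify(value));
}

var sent = {
    text: "hi",
    count: 2,
    when: new Date(0),
    callback: function () {},
    missing: undefined
};
var received = roundTrip(sent);

console.log(received.text);          // "hi"
console.log(received.count);         // 2
console.log(typeof received.when);   // "string" (Dates become ISO strings)
console.log("callback" in received); // false (functions are dropped)
console.log("missing" in received);  // false (undefined is dropped)
```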

Returning results

The given function can be synchronous or asynchronous. However, the run() method itself is always asynchronous because it needs to wait for the process to respond.

Sync

phantom.run(function () {
    return Math.PI;
}).then(function (pi) {
    console.log(pi === Math.PI); // true
});

Async

phantom.run(function (resolve) {
    setTimeout(function () {
        resolve("after 500 ms");
    }, 500);
}).then(function (msg) {
    console.log(msg); // 'after 500 ms'
});

Results are also stringified by JSON.stringify(), so returning application objects with functions won't work.

phantom.run(function () {
    ...
    // doesn't work because page is not a JSON-valid object
    return page;
});

Returning errors

Errors can be returned by using the throw keyword or by calling the reject function. Both ways will reject the promise returned by run().

Sync

phantom.run(function () {
    throw new Error("An unknown error occurred");
}).catch(function (err) {
    console.log(err); // 'An unknown error occurred'
});

Async

phantom.run(function (resolve, reject) {
    setTimeout(function () {
        reject(new Error("An unknown error occurred"));
    }, 500);
}).catch(function (err) {
    console.log(err); // 'An unknown error occurred'
});

Async methods with arguments

resolve and reject are just appended to the regular arguments:

phantom.run(1, 2, 3, function (one, two, three, resolve, reject) {

});

Persisting states inside PhantomJS

Since the function passed to phantom.run() can't declare variables in the global scope, it would otherwise be impossible to maintain state in PhantomJS. That's why phantom.run() calls all functions on the same context object, so you can store state variables on this.

phantom.run(function () {
    this.message = "Hello from the first call";
}).then(function () {
    phantom.run(function () {
        console.log(this.message); // 'Hello from the first call'
    });
});
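
The idea of a shared context object can be illustrated in plain node. This is only a sketch under the assumption that one object serves as `this` for every function that is run; `context` and `runOnContext` are hypothetical names and not part of phridge:

```javascript
// Assumption: a single object is used as `this` for every call.
var context = {};

// Hypothetical helper: call fn with the shared context as `this`.
function runOnContext(fn, args) {
    return fn.apply(context, args || []);
}

runOnContext(function () {
    this.message = "Hello from the first call";
});

var message = runOnContext(function () {
    return this.message;
});

console.log(message); // "Hello from the first call"
```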

For further convenience all PhantomJS modules are already available in the global scope.

phantom.run(function () {
    console.log(webpage);           // [object Object]
    console.log(system);            // [object Object]
    console.log(fs);                // [object Object]
    console.log(webserver);         // [object Object]
    console.log(child_process);     // [object Object]
});

Working in a page context

Most of the time it's more useful to work in a specific webpage context. This is done by creating a Page via phantom.createPage(), which internally calls require("webpage").create(). The returned page wrapper will then execute all functions bound to a PhantomJS webpage instance.

var page = phantom.createPage();

page.run(function (resolve, reject) {
    // `this` is now a webpage instance
    this.open("http://example.com", function (status) {
        if (status !== "success") {
            return reject(new Error("Cannot load " + this.url));
        }
        resolve();
    });
});

And for the busy ones: You can just call phantom.openPage(url) which is basically the same as above:

phantom.openPage("http://example.com").then(function (page) {
    console.log("Example loaded");
});

Cleaning up

If you don't need a particular page anymore, just call:

page.dispose().then(function () {
    console.log("page disposed");
});

This will clean up all page references inside PhantomJS.

If you don't need the whole process anymore call

phantom.dispose().then(function () {
    console.log("process terminated");
});

which will terminate the process cleanly by calling phantom.exit(0) internally. You don't need to dispose all pages manually when you call phantom.dispose().

However, calling

phridge.disposeAll().then(function () {
    console.log("All processes created by phridge.spawn() have been terminated");
});

will terminate all processes.

I strongly recommend calling phridge.disposeAll() when the node process exits, as this is the only way to ensure that all child processes terminate as well. Since disposeAll() is async, it is not safe to call it on process.on("exit"). It is better to call it on SIGINT, SIGTERM and within your regular exit flow.
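
One possible way to wire this up is sketched below. `disposeOnSignals` is a hypothetical helper, not part of phridge. It takes the dispose function as a parameter so it can be tried without PhantomJS, and the exit codes 0 and 1 are an arbitrary choice for this example.

```javascript
// Hypothetical helper. `disposeAll` is any function returning a promise,
// e.g. function () { return phridge.disposeAll(); }.
// `proc` defaults to the real process and can be replaced for testing.
function disposeOnSignals(disposeAll, proc) {
    proc = proc || process;

    ["SIGINT", "SIGTERM"].forEach(function (signal) {
        // Registering a handler disables node's default exit on that
        // signal, so the process has to be exited explicitly afterwards.
        proc.once(signal, function () {
            disposeAll().then(
                function () { proc.exit(0); },
                function () { proc.exit(1); }
            );
        });
    });
}
```

With phridge loaded, this would be called once at startup as `disposeOnSignals(function () { return phridge.disposeAll(); });`.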


API

phridge

.spawn(config?): Promise → Phantom

Spawns a new PhantomJS process with the given config. Read the PhantomJS documentation for all available config options. Use camelCase style for option names. The promise will be fulfilled with an instance of Phantom.

.disposeAll(): Promise

Terminates all PhantomJS processes that have been spawned. The promise will be fulfilled when all child processes have emitted an exit-event.

.config.stdout: Stream = process.stdout

Destination stream where PhantomJS' clean stdout will be piped to. Set it to null if you don't want it. Changing the value does not affect processes that have already been spawned.

.config.stderr: Stream = process.stderr

Destination stream where PhantomJS' stderr will be piped to. Set it to null if you don't want it. Changing the value does not affect processes that have already been spawned.


Phantom.prototype

.childProcess: ChildProcess

A reference to the ChildProcess-instance.

.childProcess.cleanStdout: ReadableStream

phridge extends the ChildProcess-instance with a new stream called cleanStdout. This stream is piped to process.stdout by default. It provides all data not dedicated to phridge. Streaming data is considered to be dedicated to phridge when the new line is preceded by the classifier string "message to node: ".
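
The classification rule can be sketched as a plain function. This is an illustration only, not phridge's stream implementation: `classifyLines` is a hypothetical helper, it works on complete lines instead of a stream, it skips empty lines for brevity, and the `{"id":1}` payload is made up.

```javascript
var CLASSIFIER = "message to node: ";

// Hypothetical helper: split a chunk of complete lines into messages
// dedicated to phridge and everything else.
function classifyLines(text) {
    var result = { messages: [], clean: [] };

    text.split("\n").forEach(function (line) {
        if (line === "") {
            return; // skip empty lines, e.g. after a trailing newline
        }
        if (line.indexOf(CLASSIFIER) === 0) {
            result.messages.push(line.slice(CLASSIFIER.length));
        } else {
            result.clean.push(line);
        }
    });

    return result;
}

var out = classifyLines('Hi from PhantomJS\nmessage to node: {"id":1}\n');

console.log(out.clean);    // [ 'Hi from PhantomJS' ]
console.log(out.messages); // [ '{"id":1}' ]
```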

.run(args..., fn): Promise → *

Stringifies fn, sends it to PhantomJS and executes it there again. args... are stringified using JSON.stringify() and passed to fn again. fn may simply return a result or throw an error or call resolve() or reject() respectively if it is asynchronous. phridge compares fn.length with the given number of arguments to determine whether fn is sync or async. The returned promise will be resolved with the result or rejected with the error.
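
The fn.length comparison could look roughly like this. The sketch assumes that fn counts as async when it declares more parameters than regular arguments were passed, which leaves room for resolve and reject. `isAsync` is a hypothetical name and the exact rule inside phridge may differ.

```javascript
// Assumption: fn is async when it declares more parameters than there are
// regular arguments, i.e. it has room for resolve (and reject).
function isAsync(fn, args) {
    return fn.length > args.length;
}

console.log(isAsync(function () {}, []));                          // false
console.log(isAsync(function (resolve) {}, []));                   // true
console.log(isAsync(function (a, b) {}, [1, 2]));                  // false
console.log(isAsync(function (a, b, resolve, reject) {}, [1, 2])); // true
```

A consequence of such a rule is that a sync function must not declare more parameters than arguments are passed to run().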

.createPage(): Page

Creates a wrapper to execute code in the context of a specific PhantomJS webpage.

.openPage(url): Promise → Page

Calls phantom.createPage(), then page.open(url, cb) inside PhantomJS and resolves when cb is called. If the returned status is not "success" the promise will be rejected.

.dispose(): Promise

Calls phantom.exit(0) inside PhantomJS and resolves when the child process emits an exit-event.

Events

unexpectedExit

Will be emitted when PhantomJS exited without a call to phantom.dispose() or one of its std streams emitted an error event. This event may be fired on some OS when the process group receives a SIGINT or SIGTERM (see #35).

When an unexpectedExit event is encountered, the phantom instance will be unusable and therefore automatically disposed. Usually you don't need to listen for this event.


Page.prototype

.phantom: Phantom

A reference to the parent Phantom instance.

.run(args..., fn): Promise → *

Calls fn on the context of a PhantomJS page object. See phantom.run() for further information.

.dispose(): Promise

Cleans up this page instance by calling page.close().


Contributing

From opening a bug report to creating a pull request: every contribution is appreciated and welcome. If you're planning to implement a new feature or change the API, please create an issue first. This way we can ensure that your precious work is not in vain.

All pull requests should have 100% test coverage (with notable exceptions) and need to pass all tests.

  • Call npm test to run the unit tests
  • Call npm run coverage to check the test coverage (using istanbul)

License

Unlicense
