  • Stars: 725
  • Rank: 62,504 (Top 2 %)
  • Language: JavaScript
  • Created about 14 years ago
  • Updated over 5 years ago

Repository Details

Programmable spidering of web sites with node.js and jQuery

Spider -- Programmable spidering of web sites with node.js and jQuery

Install

From source:

  git clone https://github.com/mikeal/spider.git
  cd spider
  npm link ../spider

(How to use the) API

Creating a Spider

  var spider = require('spider');
  var s = spider();

spider(options)

The options object can have the following fields:

  • maxSockets - Integer giving the maximum number of sockets in the pool. Defaults to 4.
  • userAgent - The User-Agent string sent to the remote server with each request. Defaults to Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41 Safari/534.7 (the string identifies itself as Chrome).
  • cache - The cache object to use. Defaults to NoCache; see the code for the details needed to implement a new cache object.
  • pool - A hash object containing the agents for the requests. If omitted, requests use the global pool, which is set to maxSockets.
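
As a rough sketch of how these fields might be passed, the snippet below builds an options object and hands it to spider(). The values and the createSpider helper are placeholders chosen for illustration; only the field names come from the list above.

```javascript
// Hypothetical values; only the field names are taken from the options list.
const options = {
  maxSockets: 8, // raise the pool limit from the documented default of 4
  userAgent: 'my-crawler/0.1 (+https://example.com/bot)', // placeholder string
};

// Wrapped in a function (an illustrative helper, not part of spider) so that
// the snippet still loads where the spider module is not installed.
function createSpider() {
  const spider = require('spider');
  return spider(options);
}
```

Fields left out of the object (here cache and pool) fall back to the defaults described above.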

Adding a Route Handler

spider.route(hosts, pattern, cb)

The parameters are:

  • hosts - A string, or an array of strings, giving the host part of the targeted URL(s).
  • pattern - The pattern that spider matches against the rest of the URL (pathname + search + hash).
  • cb - A function of the form function(window, $), where:
    • this - References the object returned by Routes.match, with some additions from spider. For more info see https://github.com/aaronblohowiak/routes.js
    • window - References the document's window.
    • $ - References the jQuery object.
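
A minimal sketch of a handler in the function(window, $) form described above. The host example.com, the pattern /articles/:id, and the names collectLinks and register are assumptions made for illustration. The handler returns the links only so the result is easy to inspect; whether spider uses a handler's return value is not stated here.

```javascript
// Declared with `function` rather than as an arrow function, so that spider
// can bind `this` to the route match object when it calls the handler.
function collectLinks(window, $) {
  const hrefs = [];
  $('a').each(function () {
    // Inside jQuery's each(), `this` is the current DOM element.
    const href = $(this).attr('href');
    if (href) hrefs.push(href);
  });
  return hrefs;
}

// `s` is assumed to be an instance created with spider().
function register(s) {
  // Hypothetical host and pattern; ':id' is a named parameter in routes.js.
  s.route('example.com', '/articles/:id', collectLinks);
}
```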

Queuing a URL for spider to fetch

spider.get(url), where url is the URL to fetch.
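
To queue several starting points, get can simply be called once per URL. The sketch below assumes `s` is an instance created with spider(); the seed URLs and the queueAll helper are placeholders, not part of the library.

```javascript
// Hypothetical seed URLs.
const seeds = ['http://example.com/', 'http://example.com/about'];

// Calls s.get(url) once per URL and reports how many were queued.
// No assumption is made about what get() itself returns.
function queueAll(s, urls) {
  urls.forEach((url) => s.get(url));
  return urls.length;
}
```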

Extending / Replacing the MemoryCache

Currently, a replacement for the MemoryCache must provide the following methods:

  • get(url, cb) - Hands url's body field to the cb callback/continuation if it exists, null otherwise.
    • cb - Must be of the form function(retval) {...}
  • getHeaders(url, cb) - Hands url's headers field to the cb callback/continuation if it exists, null otherwise.
    • cb - Must be of the form function(retval) {...}
  • set(url, headers, body) - Saves url's headers and body in the cache.
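
A minimal sketch of a cache object with the three methods listed above, backed by a Map. The class name MapCache is made up for illustration, and the sketch assumes that a missing entry is reported by calling cb with null.

```javascript
class MapCache {
  constructor() {
    // url -> { headers, body }
    this.entries = new Map();
  }

  // Passes the cached body to cb, or null when nothing is stored for url.
  get(url, cb) {
    const entry = this.entries.get(url);
    cb(entry ? entry.body : null);
  }

  // Passes the cached headers to cb, or null when nothing is stored for url.
  getHeaders(url, cb) {
    const entry = this.entries.get(url);
    cb(entry ? entry.headers : null);
  }

  // Stores (or overwrites) the headers and body for url.
  set(url, headers, body) {
    this.entries.set(url, { headers, body });
  }
}
```

An instance would then be supplied through the cache option, for example spider({ cache: new MapCache() }).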

Setting the verbose/log level

spider.log(level), where level is one of the strings "debug", "info", or "error".

More Repositories

1

r2

HTTP client. Spiritual successor to request.
JavaScript
4,473
star
2

bent

Functional JS HTTP client (Node.js & Fetch) w/ async await
JavaScript
2,198
star
3

roll-call

📞 Free and reliable audio calls for everyone w/ browser p2p.
JavaScript
1,566
star
4

watch

Utilities for watching file trees in node.js
JavaScript
1,273
star
5

webtorrent-element

WebTorrent HTML element.
JavaScript
526
star
6

merge-release

Automatically release all merges to master on npm.
JavaScript
468
star
7

node.couchapp.js

Utility for writing couchapps.
JavaScript
407
star
8

tako

Functional web framework.
JavaScript
320
star
9

sequest

Simplified API for SSH and SFTP similar to request.
JavaScript
283
star
10

filed

Simplified file library.
JavaScript
282
star
11

daily

What’s happening in Open Source. Everyday 🤓
JavaScript
262
star
12

dropub

P2P publishing for everyone.
Vue
222
star
13

IPSQL

InterPlanetary SQL
JavaScript
208
star
14

znode

Bi-directional RPC through any stream.
JavaScript
205
star
15

reg

Native ESM Package Manager
JavaScript
175
star
16

node-utils

A collection of small, simple, and useful node packages.
JavaScript
173
star
17

markdown-element

HTML Element that renders markdown content.
JavaScript
137
star
18

dagdb

Syncable database built on IPLD
JavaScript
135
star
19

publish-to-github-action

A GitHub Action to push any local file changes, including new files, back to master
Dockerfile
109
star
20

browsercouch

CouchDB in the browser
JavaScript
98
star
21

rza

Create simple HTML elements
JavaScript
92
star
22

prolly-trees

Hash consistent search trees.
JavaScript
89
star
23

gza

Functional custom HTML elements.
JavaScript
88
star
24

self-care

Discussion repo for developers to share their self-care routines
88
star
25

shaolin

The easiest way to build Web Components.
JavaScript
87
star
26

snapkit

Capture screenshots of websites on command line or REST API.
JavaScript
83
star
27

jaws

Build HTTP applications as a cache.
JavaScript
77
star
28

killa-beez

🐝 We on a WebRTC Swarm!
JavaScript
73
star
29

response

🏄🏻 Streaming and mutation API for HTTP responses.
JavaScript
73
star
30

node.couch.js

CouchDB + node.js == Crazy Delicious
JavaScript
70
star
31

couchup

A CouchDB implementation on top of levelup.
JavaScript
66
star
32

jspp

JavaScript Pre-Processor
JavaScript
60
star
33

vuejs-electron-demo

Simple Vue.js Electron demo
Vue
59
star
34

zcomponent

DEPRECATED: Use rza for a class based approach or gza for a functional approach.
JavaScript
57
star
35

vanilla

Compile-to-JavaScript language for people that write JavaScript
JavaScript
55
star
36

dkv

Decentralized key-value store running on IPFS
JavaScript
53
star
37

replicate

A customizable CouchDB replicator in node.js.
JavaScript
52
star
38

cappadonna

Headless browser testing for tap with coverage reporting.
JavaScript
48
star
39

tweetstream

node.js stream API for the twitter streaming HTTP API
JavaScript
47
star
40

way-of-code

Organizing my thoughts about the process and states of mind of programming
45
star
41

distjs

Distribute standalone WebComponents w/ npm.
JavaScript
41
star
42

ipjs

Universal JavaScript Build and Packaging
JavaScript
41
star
43

planet

A node.js planet (blog aggregator)
CSS
40
star
44

morestreams

Collection of useful stream objects.
JavaScript
40
star
45

funky

💪🏿 Front-end view system using basic functional programming and template literals.
JavaScript
40
star
46

couchdb-pythonviews

Python view server for CouchDB.
JavaScript
40
star
47

bytesish

Cross-Platform Binary API
JavaScript
35
star
48

stud-proxy

Round Robin proxy/balancer for the stud TLS terminator
JavaScript
33
star
49

dbemitter

EventEmitters for remote database events
JavaScript
32
star
50

couchdb-wsgi

WSGI compliant handler for CouchDB external processes.
Python
32
star
51

getport

Find an open port to listen on.
JavaScript
32
star
52

redcouch

A client that stores data in both CouchDB and Redis.
JavaScript
32
star
53

sustainable-oss

Sustainable Open Source: The Book (Maybe)
31
star
54

nodeconf2013

NodeConf 2013 Planning and Sessions
JavaScript
31
star
55

SLEEP

Implementation of the SLEEP protocol.
JavaScript
29
star
56

bundle-size-action

Calculate the bundle size of your module. Useful for GitHub Actions.
JavaScript
28
star
57

occupy

Deployment for the 99%
JavaScript
28
star
58

ipfs-elements

HTML Elements for IPFS.
JavaScript
27
star
59

estest

ESM native testing across all JS platforms.
JavaScript
24
star
60

couchcache

CouchCache is a finely tuned caching HTTP proxy for CouchDB written in node.js
JavaScript
24
star
61

ZDAG

JSON/CBOR style format as a compressor
JavaScript
24
star
62

couch

Stupid simple Couch wrapper based on Request
JavaScript
24
star
63

framework

A framework for node.js (inspired by vapor.js)
JavaScript
23
star
64

sst

Super Simple Test Format
23
star
65

raindrop

git checkout of the labs.mozilla.com/raindrop hg repo
JavaScript
22
star
66

hostproxy

HTTP Proxy that searches for Host header and avoids any parsing
JavaScript
22
star
67

relaximation

Some relaxed automation
JavaScript
21
star
68

learnjs

Workshopper for learning JavaScript.
JavaScript
21
star
69

compretend

Web application building blocks powered by ML.
JavaScript
20
star
70

buddhism.js

Buddhist concepts as JavaScript
20
star
71

webtouch

Validate that a web site and all its required resources are available.
JavaScript
20
star
72

couchie

Minimalist localStorage database API. Works well as a cache for CouchDB documents.
JavaScript
20
star
73

lucass

Lightweight Universal Content Addressable Storage Spec
JavaScript
18
star
74

peer-room

Ephemeral and secure peer-to-peer chat rooms.
JavaScript
17
star
75

requirein

A require() that works in a specified directory.
JavaScript
17
star
76

jsonfiles

Simple database as flat JSON files.
JavaScript
17
star
77

bong-bong

Open public chat service built for the web.
JavaScript
17
star
78

siofile

Stream a file to a socket.io client.
JavaScript
16
star
79

signal-exchange

WebRTC signal exchange using public keys and socket.io
JavaScript
16
star
80

mikeal.js

My blog code, node-couchapp and sammy.js code.
JavaScript
16
star
81

brrp

ESM bundle npm modules for browsers and nodejs
JavaScript
15
star
82

tasked

Background task state machines on top of CouchDB.
JavaScript
15
star
83

stoopid

Loggers are stupid and I'm resentful that I had to write this.
JavaScript
15
star
84

deferred

Deferred objects without Twisted
Python
15
star
85

methodman

Bidirectional rpc and streams for WebSockets and WebRTC.
JavaScript
14
star
86

iterp

Controlled parallelism w/ async iterables.
JavaScript
14
star
87

block-box

Universal hash addressed block container.
JavaScript
14
star
88

node.proxy.js

HTTP Proxy for node.js
JavaScript
14
star
89

go-stats

Crunching some data on the size of the Go ecosystem.
JavaScript
14
star
90

pushdb

A programmable database with document storage and unique indexing capabilities.
JavaScript
13
star
91

car-transaction

IPLD transaction as CAR buffer [for use in databases]
JavaScript
13
star
92

matrika

Next Generation Decentralized Database
JavaScript
13
star
93

level-mutex

Mutex read/write lock for levelup.
JavaScript
13
star
94

php-analytics

Scripts to pull down dependency analytics for PHP packages.
JavaScript
13
star
95

githubarchive

Streaming parsers for the github archive.
JavaScript
12
star
96

requestdb

A request wrapper that stores and retrieves responses from a leveldb cache.
JavaScript
12
star
97

brasstacks

A large scale results and graphing server using CouchDB.
JavaScript
12
star
98

waudio

Web Audio made sane.
JavaScript
12
star
99

logref

Logging for node.js
JavaScript
12
star
100

libp2p-simple

Pre-configured libp2p for browser and node.js.
JavaScript
12
star