  • Stars: 126
  • Rank 283,524 (Top 6 %)
  • Language: JavaScript
  • Created about 12 years ago
  • Updated over 9 years ago

Repository Details

Uses jQuery to return a structured JSON representation of a Wikipedia article.

WikiFetch

Author: @benjamincoe

Problem

For some NLP research I'm currently doing, I was interested in parsing structured information from Wikipedia articles.

I did not want to use a full-featured MediaWiki parser:

  • this would be heavy-handed; all I really wanted was the text contents of articles, images, and links to other articles.
  • I wanted to be able to extend the approach to other websites, e.g., news sites.
  • I wanted to use a crawler-based approach, rather than downloading a massive dataset.

The Solution

WikiFetch crawls a Wikipedia article using Node.js and jQuery. It returns a structured JSON representation of the page:

	{
		"title": "Foobar Article",
		"links": {
			"Link_to_another_article": {
				"text": "Another article.", // the text that was linked.
				"title": "Another_article.", // title attribute of the <a/> tag.
				"occurrences": 1 // number of times this article was linked.
			}
		},
		"sections": {
			"Section Heading": {
				"text": "text contents of section.",
				"images": ["http://foobar.jpg"] // images occurring within this section.
			}
		}
	}
  • Links within sections are replaced with [[article name]], which will have a corresponding entry in links.
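
A minimal sketch of how the [[article name]] placeholders could be resolved against links once the JSON is in hand. The expandLinks helper and the sample data are hypothetical and not part of WikiFetch; the sketch assumes each placeholder matches a key in links exactly.

```javascript
// Hypothetical helper, not part of WikiFetch: swaps each [[name]]
// placeholder in a section's text for the linked text stored in links.
// Assumption: the placeholder name is identical to the key in links.
function expandLinks(article) {
  const expanded = {};
  for (const [heading, section] of Object.entries(article.sections)) {
    expanded[heading] = section.text.replace(/\[\[(.+?)\]\]/g, (match, name) => {
      const known = Object.prototype.hasOwnProperty.call(article.links, name);
      // Leave unknown placeholders untouched rather than guessing.
      return known ? article.links[name].text : match;
    });
  }
  return expanded;
}

// Sample data shaped like the structure described above.
const article = {
  title: 'Foobar Article',
  links: {
    Link_to_another_article: {
      text: 'Another article.',
      title: 'Another_article.',
      occurrences: 1
    }
  },
  sections: {
    'Section Heading': {
      text: 'See [[Link_to_another_article]] and [[Missing]].',
      images: []
    }
  }
};

console.log(expandLinks(article));
```

Keeping placeholders in the text and the details in links means each linked article is described once, however many times it is referenced.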

Usage

npm install wikifetch -g
wikifetch --article=Dog

More Repositories

1

c8

output coverage reports using Node.js' built-in coverage
JavaScript
1,899
star
2

awesome-cross-platform-nodejs

👬 A curated list of awesome developer tools for writing cross-platform Node.js code
1,101
star
3

thumbd

Node.js/AWS/ImageMagick-based image thumbnailing service.
JavaScript
442
star
4

conventional-release-labels

Apply labels for automatically generated release notes, based on conventionalcommits.org
JavaScript
316
star
5

crapify

a proxy for simulating slow, spotty, HTTP connections
JavaScript
249
star
6

sandcastle

A simple and powerful sandbox for running untrusted JavaScript.
JavaScript
220
star
7

librarian-ansible

Port of librarian-chef, providing bundler functionality for Ansible roles.
Ruby
185
star
8

DoloresLabsTechTalk

Code for dolores labs tech talk.
JavaScript
176
star
9

which-cloud

given an ip address, return which cloud provider it belongs to (AWS, GCE, etc)
JavaScript
138
star
10

smtproutes

A simple, Sinatra inspired, SMTP routing server.
Python
137
star
11

secure-smtpd

Fork of Python's standard SMTP server. Adding support for various extensions to the protocol.
Python
128
star
12

karait

A ridiculously simple queuing system, with clients in various languages, built on top of MongoDB.
JavaScript
97
star
13

endtable

A ridiculously simple Object Mapper for Node running on top of CouchDB.
JavaScript
68
star
14

npm-tweets

Publishes tweets when libraries are updated on NPM.
JavaScript
64
star
15

routers-news

A crawler for various popular tech news sources. Read technology news from the comfort of your CLI.
JavaScript
56
star
16

unitgen

An API for building audio-unit-generators in Node.js
JavaScript
56
star
17

jDistiller

A page scraping DSL for extracting structured information from unstructured XHTML, built on Node.js and jQuery
JavaScript
49
star
18

top-npm-users

⭐ Generate a list of top npm users based on monthly downloads.
JavaScript
46
star
19

travis-deploy-example

example of using Travis deployment to publish an npm module
JavaScript
40
star
20

onigurumajs

👺 a pure JavaScript port of the oniguruma regex engine
JavaScript
38
star
21

hl

👖 fancy pants syntax highlighting for the CLI
JavaScript
36
star
22

record-crate

index, organize, and search your music collection, DJ sick sets.
JavaScript
25
star
23

hide-secrets

for when you want to log an object but hide certain restricted fields, e.g., password
JavaScript
23
star
24

http2spy

test helpers for working with Node.js' http2 module
TypeScript
19
star
25

node-mocha-skeleton

A skeleton Node.js project with Mocha for testing
JavaScript
18
star
26

puppeteer-to-istanbul-example

example of using puppeteer-to-istanbul to output istanbul reports from puppeteer coverage
JavaScript
17
star
27

nodemailer-mock-transport

mock-mailer for putting tests around services that use node-mailer
JavaScript
16
star
28

groundhogday

GroundhogDay lets you repeat an operation until you get it right.
Python
16
star
29

Adventures-in-Document-Thumbnailing

Some thoughts I have about creating thumbnails of common document types.
14
star
30

node-sexy-args

A sexy DSL for parsing the arguments passed into functions.
JavaScript
13
star
31

node-elasticsearch-proxy

A proxy to handle, and react to common error conditions raised by the elasticsearch http module.
JavaScript
10
star
32

mobius-js

MVC Web-Framework for JavaScript using Node.js and Express
JavaScript
10
star
33

renv

a command line interface for managing remote configuration, powered by etcd
JavaScript
10
star
34

toml-to-env

give me a toml configuration file, I'll give you export MY_ENV=foo
JavaScript
9
star
35

apidoc-md

Generate API documentation for your README from comments in your source-code
JavaScript
9
star
36

mongate

Client for Sleepy Mongoose that provides the same interface as Pymongo
Python
9
star
37

es2015-coverage

examples of minimal ES2015 projects, with testing and test coverage.
JavaScript
9
star
38

imaplib2

Fork of Piers Lauder's imaplib2 library for Python.
Python
9
star
39

dotgitignore

find the closest .gitignore file, parse it, and apply ignore rules
JavaScript
8
star
40

v8-coverage-merge

merges together two inspector-format coverage reports
JavaScript
7
star
41

elasticmapper

A damn simple mixin for integrating ActiveModel with ElasticSearch.
Ruby
7
star
42

AFHTTPRequestOperationManager-Timeout

Add timeout functionality to AFHTTPRequestOperationManager with Category
Objective-C
6
star
43

em-stretcher

EventMachine for Stretcher; a Fast, Elegant, ElasticSearch client
Ruby
6
star
44

any-path

:rage2: make the keys on an object path.sep agnostic.
JavaScript
5
star
45

gce-ips

fetch a list of Google Compute Engine's IP addresses using DNS lookup
JavaScript
5
star
46

webhooks

Elasticsearch WebHooks plugin.
Java
5
star
47

optional-dev-dependency

😎 try to install an optional development dependency, YOLO if you can't.
JavaScript
5
star
48

enronsearch

Perform searches on the Enron Email Dataset using ElasticSearch.
JavaScript
5
star
49

monkey-proxy

fork of tootallnate's proxy, adding hooks for modifying requests
JavaScript
4
star
50

istheshipstillstuck

is the ship still stuck?
JavaScript
4
star
51

nostrabot

The end of the world is nigh.
JavaScript
4
star
52

npm-typeahead

A tiny web-app that exposes typeahead search functionality for packages on http://www.npmjs.org.
JavaScript
4
star
53

path-buffer

Node.js' path module, but for buffers
JavaScript
3
star
54

mocoverage

a mocha reporter with coverage.
JavaScript
3
star
55

assertassert

:trollface: for when you can't decide on an assertion library
JavaScript
3
star
56

nanoleaf-travis

display Travis build statuses using nanoleaf aurora programmable lights: https://nanoleaf.me/en/
JavaScript
3
star
57

grpc-stress

stress tests for gRPC
JavaScript
3
star
58

private-module-heroku

example of using private modules on Heroku
JavaScript
3
star
59

node-coverage-debug

demonstrate issues we're bumping into instrumenting Node.js' test suite for coverage
HTML
3
star
60

leapiano

A midi piano built on top of the Leap Motion device.
JavaScript
3
star
61

example-redwood-blog

redwoodjs JAMstack framework on GCP
JavaScript
3
star
62

gh-commit-scan

scan commits in git repository
JavaScript
3
star
63

npme-auth-foo

Example of a foo npmE auth strategy.
JavaScript
3
star
64

yearincommits

GitHub 2014 commits leader board
JavaScript
3
star
65

soundcloud-backup

backup the meta information in your SoundCloud account (DJs you follow, tracks you've liked).
JavaScript
2
star
66

hackillinois-bot

experimenting with a yargs chat-bot for Hack Illinois
JavaScript
2
star
67

npm-coverage

An always-out-of-date ™️ coverage report for npm/npm.
HTML
2
star
68

plane-rpg

RPG I made en route from SFO to YYZ
JavaScript
2
star
69

jsdoc-region-tag

replace region tags with code samples
JavaScript
2
star
70

flaky.dev

track flaky tests over time, open issues on GitHub
JavaScript
2
star
71

citgm-harness

harness for running Canary in the Gold Mine builds
2
star
72

top-npm-users-server

server for top-npm-users addon
JavaScript
2
star
73

conventional-hud

Generate a CHANGELOG for GitHub repositories that follow conventionalcommits.org
JavaScript
2
star
74

node-micro-test

An asynchronous unit testing framework in under 40 lines of code.
1
star
75

leapdraw

Leapdraw allows you to Scribble on any website using the Leap Motion.
JavaScript
1
star
76

google-cloud-python

Testing changelog.json generation
Python
1
star
77

cambi.org

website in memorial to Cambi Evers-Everette (Coe)
CSS
1
star
78

test-node-path

application used to test require.main.filename on various platforms
JavaScript
1
star
79

cambiandben.com

the website for Cambi and Ben's wedding.
CSS
1
star
80

mtgdeck

build your Magic: The Gathering deck from the command-line
JavaScript
1
star
81

parseargs-yargs-parser

Shim that allows @pkgjs/parseargs to be used as drop in replacement for yargs-parser.
TypeScript
1
star
82

code-to-signal

map exit codes on Linux to named signals.
JavaScript
1
star
83

terminal-quiz

given a choices.txt, in a known location, prompts user for answer and writes to answer.txt
JavaScript
1
star
84

node-27566-bug

demonstration of wonky coverage reporting.
JavaScript
1
star
85

npm-tonic-app

tonic npm package addon
JavaScript
1
star
86

guesstimator

Estimates the performance of a distributed system based on a sample set of data.
Python
1
star
87

minimal-source-map

minimal implementation of SourceMap Revision 3
JavaScript
1
star
88

external-link-repo

Repo for debugging issues with linkinator when there are many external links in action
TypeScript
1
star
89

codecovorg

wrapper for codecov.io that fetches repo tokens based on API key
JavaScript
1
star
90

example-todo-cloud-run

redwood example TODO app running on Cloud Run
JavaScript
1
star
91

hack-illinois-chat

chat app built during Hack Illinois 2019
JavaScript
1
star