• This repository has been archived on 01/Dec/2023
  • Stars: 253
  • Rank: 160,776 (Top 4 %)
  • Language: Java
  • Created almost 13 years ago
  • Updated over 6 years ago


Creates a Neo4j graph of Wikipedia links.

Graphipedia

A tool for creating a Neo4j graph database of Wikipedia pages and the links between them.

Building

This is a Java project built with Maven.

Check the neo4j.version property in the top-level pom.xml file and make sure it matches the Neo4j version you intend to use to open the database. Then build with

mvn package

This generates a single JAR, including all dependencies, at graphipedia-dataimport/target/graphipedia-dataimport.jar.

Importing Data

The graphipedia-dataimport module lets you create a Neo4j database from a Wikipedia database dump.

See Wikipedia:Database_download for instructions on getting a Wikipedia database dump.

Assuming you downloaded pages-articles.xml.bz2, follow these steps:

  1. Run ExtractLinks to create a smaller intermediate XML file containing only page titles and links. The best way to do this is to decompress the bzip2 file and pipe the output directly to ExtractLinks:

    bzip2 -dc pages-articles.xml.bz2 | java -classpath graphipedia-dataimport.jar org.graphipedia.dataimport.ExtractLinks - enwiki-links.xml

  2. Run ImportGraph to create a Neo4j database of page nodes and link relationships in the graphdb directory:

    java -Xmx3G -classpath graphipedia-dataimport.jar org.graphipedia.dataimport.neo4j.ImportGraph enwiki-links.xml graphdb
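Step 1 boils down to reading each page's wikitext and keeping only the targets of internal links. The following is a minimal, hypothetical sketch of that idea in plain Java, assuming links use the usual MediaWiki [[Target]], [[Target|label]] and [[Target#Section|label]] forms. The class and method names are illustrative; this is not the actual ExtractLinks code, which also has to read the dump as a stream from standard input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of extracting internal link targets from MediaWiki markup.
 * Illustrates the idea behind step 1 only; it is NOT the ExtractLinks implementation.
 */
public class WikiLinkSketch {

    // Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
    // Group 1 captures the target title up to the first '|', '#' or ']'.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([^\\]|#]+)(?:#[^\\]|]*)?(?:\\|[^\\]]*)?\\]\\]");

    public static List<String> extractLinks(String wikitext) {
        List<String> targets = new ArrayList<>();
        Matcher m = LINK.matcher(wikitext);
        while (m.find()) {
            String title = m.group(1).trim();
            // Skip degenerate links such as "[[ |label]]".
            if (!title.isEmpty()) {
                targets.add(title);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        String text = "See [[Graph database]] and [[Neo4j|the Neo4j DBMS]].";
        System.out.println(extractLinks(text)); // prints [Graph database, Neo4j]
    }
}
```

A regex is enough to show the idea; real wikitext (templates, nested links in image captions) may need more care than this sketch takes.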

To give an idea of scale, enwiki-20130204-pages-articles.xml.bz2 is 9.1 GB and contains almost 10M pages, from which over 92M links are extracted.

On my laptop with an SSD, decompressing and running ExtractLinks takes about 30 minutes (roughly the same time as decompressing alone), and ImportGraph takes another 10 minutes.

(Note that disk I/O is the critical factor here: the same import can easily take several hours on an old 5400 RPM drive.)

Querying

The Neo4j browser can be used to query and visualise the imported graph. Here are some sample Cypher queries.

Show all pages linked to or from a given starting page, e.g. "Neo4j":

MATCH (p0:Page {title:'Neo4j'})-[:Link]-(p:Page)
RETURN p0, p
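The query above uses an undirected pattern, so it matches both the pages that "Neo4j" links to and the pages that link to it. A small in-memory illustration of those semantics, using made-up page data and hypothetical names (this is not a Neo4j API):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical in-memory illustration of the undirected pattern
 * (p0:Page)-[:Link]-(p:Page): collects pages that `start` links to
 * and pages that link to `start`. Not a Neo4j API.
 */
public class NeighborSketch {

    public static Set<String> linkedPages(Map<String, List<String>> outLinks, String start) {
        // TreeSet keeps the result sorted and free of duplicates.
        Set<String> result = new TreeSet<>();
        // Outgoing links: pages the start page links to.
        result.addAll(outLinks.getOrDefault(start, List.of()));
        // Incoming links: pages whose link list mentions the start page.
        for (Map.Entry<String, List<String>> e : outLinks.entrySet()) {
            if (e.getValue().contains(start)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data: page title -> titles it links to.
        Map<String, List<String>> links = Map.of(
                "Neo4j", List.of("Graph database", "Java (programming language)"),
                "Graph database", List.of("Neo4j"),
                "Cypher (query language)", List.of("Neo4j"));
        // prints [Cypher (query language), Graph database, Java (programming language)]
        System.out.println(linkedPages(links, "Neo4j"));
    }
}
```

Cypher returns one row per matching relationship, so a page linked in both directions (like "Graph database" here) appears twice in the query results; the sketch's set collapses such duplicates, much as RETURN DISTINCT would.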

Find how two pages, e.g. "Neo4j" and "Kevin Bacon", are connected through at most 6 links in either direction:

MATCH (p0:Page {title:'Neo4j'}), (p1:Page {title:'Kevin Bacon'}),
  p = shortestPath((p0)-[*..6]-(p1))
RETURN p
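shortestPath((p0)-[*..6]-(p1)) asks for one path of at most 6 relationships between the two pages, ignoring link direction. For intuition, here is a plain-Java breadth-first search with the same hop limit over a toy, made-up link graph. It illustrates the semantics only; it is not how Neo4j evaluates the query, and all names and data are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: hop-limited, undirected shortest path via breadth-first search. */
public class ShortestPathSketch {

    /** Builds an undirected adjacency map from directed "page -> linked pages" lists. */
    static Map<String, Set<String>> undirected(Map<String, List<String>> outLinks) {
        Map<String, Set<String>> adj = new HashMap<>();
        for (Map.Entry<String, List<String>> e : outLinks.entrySet()) {
            for (String target : e.getValue()) {
                adj.computeIfAbsent(e.getKey(), k -> new HashSet<>()).add(target);
                adj.computeIfAbsent(target, k -> new HashSet<>()).add(e.getKey());
            }
        }
        return adj;
    }

    /** Returns a shortest path from start to goal with at most maxHops links, or an empty list. */
    public static List<String> shortestPath(Map<String, List<String>> outLinks,
                                            String start, String goal, int maxHops) {
        Map<String, Set<String>> adj = undirected(outLinks);
        Map<String, String> parent = new HashMap<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            if (page.equals(goal)) {
                // Walk parent pointers back to the start, then reverse into start -> goal order.
                List<String> path = new ArrayList<>();
                for (String p = goal; p != null; p = parent.get(p)) {
                    path.add(p);
                }
                Collections.reverse(path);
                return path;
            }
            int d = depth.get(page);
            if (d == maxHops) {
                continue; // do not expand beyond the hop limit
            }
            for (String next : adj.getOrDefault(page, Set.of())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, d + 1);
                    parent.put(next, page);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        // Made-up toy graph: page title -> titles it links to.
        Map<String, List<String>> links = Map.of(
                "Neo4j", List.of("Graph database"),
                "Graph database", List.of("Network theory"),
                "Six degrees of Kevin Bacon", List.of("Network theory", "Kevin Bacon"));
        // prints [Neo4j, Graph database, Network theory, Six degrees of Kevin Bacon, Kevin Bacon]
        System.out.println(shortestPath(links, "Neo4j", "Kevin Bacon", 6));
    }
}
```

When several shortest paths exist, this sketch returns whichever one the hash-based iteration order reaches first; Cypher's shortestPath likewise returns a single path (allShortestPaths returns all of them).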
