  • Stars: 1
  • Language: Java
  • License: Apache License 2.0
  • Created over 12 years ago
  • Updated over 2 years ago

Repository Details

Core Java libraries for Memento clients.

More Repositories

1. webarchive-discovery (Java, 113 stars)
   WARC and ARC indexing and discovery tools.
2. shine (JavaScript, 42 stars)
   Prototype SOLR-powered web archive exploration UI.
3. webarchive-explorer (Java, 39 stars)
   Tools for exploring the contents of web archive files.
4. docker-pdf2htmlex (Python, 23 stars)
   Run pdf2htmlEX in a Docker container.
5. w3act (Java, 19 stars)
   w3act is an annotation and curation tool for building web archive collections
6. opendata (HTML, 14 stars)
   Repository of documentation about the open datasets published by the UK Web Archive.
7. monitrix (Java, 12 stars)
   A monitoring system for Heritrix 3.
8. ukwa-pywb (JavaScript, 11 stars)
9. qaop (Java, 10 stars)
   Qaop – ZX Spectrum emulator
10. ukwa-manage (Jupyter Notebook, 10 stars)
    Shepherding our web archives from crawl to access.
11. ukwa-heritrix (Java, 9 stars)
    The UKWA Heritrix3 custom modules and Docker builder.
12. webarchive-test-suite (Arc, 8 stars)
    A set of test files for web archiving.
13. docker-brozzler (Shell, 7 stars)
    Brozzler in a Docker container
14. crawl-analysis (Jupyter Notebook, 7 stars)
    Web Archiving Domain Crawl Analysis Scripts
15. webarchiving-notebooks (Jupyter Notebook, 7 stars)
    A collection of Jupyter notebooks for working with web archive data, tools and APIs
16. ukwa-gsheets-utils (JavaScript, 6 stars)
    Add-On for Google Sheets to help those working with web archives.
17. webrender-phantomjs (Python, 6 stars)
    A RESTful API for rendering web pages in PhantomJS
18. flashfreeze (Python, 6 stars)
    A rapid web page analyser and archiver.
19. halflife (Jupyter Notebook, 5 stars)
    Tracking the fortunes of our archived URLs.
20. wren (PHP, 5 stars)
    Experiments in testable, scaleable crawler architectures
21. aho-corasick (Java, 4 stars)
    Aho-Corasick in Java
22. ukwa-services (Python, 4 stars)
    Deployment configuration for all UKWA services stacks.
23. mementoweb-webclient (Java, 4 stars)
    A simple web-based interface to Memento holdings.
24. acid-crawl (PHP, 4 stars)
    An acid test suite for crawlers.
25. ukwa-documentation (Jupyter Notebook, 4 stars)
    Public documentation about the technical architecture of the UK Web Archive
26. webarchive-wat-mining (Shell, 3 stars)
    WAT (web archive transform) metadata mining
27. docker-warcprox (Python, 3 stars)
    Run warcprox inside Docker
28. solr-proxy (Dockerfile, 3 stars)
    An NGINX proxy to control access to the Solr API.
29. python-warcwriterpool (Python, 2 stars)
    Hopefully off-setting some of the difficulties writing to WARCs (multiple open files, size limits, etc.).
30. ukwa-warc-server (Python, 2 stars)
    Serves our WARC files for playback, wherever they may lie.
31. ukwa (Java, 2 stars)
    UKWA
32. waybacks (Java, 2 stars)
    This module builds our Waybacks in the various different configurations we require.
33. webrender-puppeteer (JavaScript, 2 stars)
    Web page rendering service based on Google's Puppeteer
34. webarchive-fuse (Java, 2 stars)
    Use FUSE-J to mount web archive files as filesystems.
35. javaswf (Java, 2 stars)
    Mavenised version of the JavaSWF codebase, in order to resolve the dependencies for Heritrix3.
36. glean (Python, 2 stars)
    Using web scrapers to extract data from the archived web
37. ukwa-player (JavaScript, 2 stars)
    Highly experimental sketch of a hi-fidelity web archive 'player' for proxy-based access
38. docker-airflow (Dockerfile, 1 star)
    Apache Airflow with a few additional dependencies
39. docker-hadoop (Dockerfile, 1 star)
    Hadoop running in a container.
40. ukwa.github.com (CSS, 1 star)
    UK Web Archive GitHub Homepage
41. docker-grobid (1 star)
    GROBID (GeneRation Of BIbliographic Data) in a Docker container.
42. httpfs (Java, 1 star)
    Apache Hadoop HttpFS for cdh3
43. ukwa-blacklight (Ruby, 1 star)
    Experimenting with Blacklight
44. ukwa-access-api (Python, 1 star)
    An application to wrap up APIs for accessing UKWA content.
45. python-w3act (Python, 1 star)
    Python clients for W3ACT and Heritrix3
46. crawl-test-site (CSS, 1 star)
    A simple site that uses GitHub pages to host resources for testing crawlers.
47. language-detection (PHP, 1 star)
    Experimenting with https://code.google.com/p/language-detection/
48. file-archive-recordreader (Java, 1 star)
    File Archive RecordReader
49. python-webhdfs (Python, 1 star)
    Python wrapper around Hadoop's WebHDFS interface.
50. docker-hypercored (Dockerfile, 1 star)
    A containerised Dat server for experimental dataset hosting.
51. crawl-db (Python, 1 star)
    A standalone database for crawl events.
52. ukwa-tasks (Python, 1 star)
    Luigi tasks for running Hadoop jobs and managing material held on HDFS
53. katacoda-scenarios (Shell, 1 star)
    Katacoda Scenarios
54. ukwa-ingest-services (Shell, 1 star)
    The dockerized ensemble of services that run most of the UKWA crawl and ingest processes.
55. hdfs-exporter (Python, 1 star)
    Scrapes the Hadoop status pages for Prometheus