webarchive-discovery
WARC and ARC indexing and discovery tools.

shine
Prototype SOLR-powered web archive exploration UI.

webarchive-explorer
Tools for exploring the contents of web archive files.

docker-pdf2htmlex
Run pdf2htmlEX in a Docker container.

w3act
w3act is an annotation and curation tool for building web archive collections

opendata
Repository of documentation about the open datasets published by the UK Web Archive.

monitrix
A monitoring system for Heritrix 3.

ukwa-pywb

qaop
Qaop – ZX Spectrum emulator

ukwa-manage
Shepherding our web archives from crawl to access.

ukwa-heritrix
The UKWA Heritrix3 custom modules and Docker builder.

webarchive-test-suite
A set of test files for web archiving.

docker-brozzler
Brozzler in a Docker container

crawl-analysis
Web Archiving Domain Crawl Analysis Scripts

webarchiving-notebooks
A collection of Jupyter notebooks for working with web archive data, tools and APIs

ukwa-gsheets-utils
Add-On for Google Sheets to help those working with web archives.

webrender-phantomjs
A RESTful API for rendering web pages in PhantomJS

flashfreeze
A rapid web page analyser and archiver.

halflife
Tracking the fortunes of our archived URLs.

wren
Experiments in testable, scaleable crawler architectures

aho-corasick
Aho-Corasick in Java

ukwa-services
Deployment configuration for all UKWA services stacks.

mementoweb-webclient
A simple web-based interface to Memento holdings.

acid-crawl
An acid test suite for crawlers.

ukwa-documentation
Public documentation about the technical architecture of the UK Web Archive

webarchive-wat-mining
WAT (web archive transform) metadata mining

docker-warcprox
Run warcprox inside Docker

solr-proxy
An NGINX proxy to control access to the Solr API.

python-warcwriterpool
Hopefully off-setting some of the difficulties writing to WARCs (multiple open files, size limits, etc.).

ukwa-warc-server
Serves our WARC files for playback, wherever they may lie.

ukwa
UKWA

waybacks
This module builds our Waybacks in the various different configurations we require.

webrender-puppeteer
Web page rendering service based on Google's Puppeteer

webarchive-fuse
Use FUSE-J to mount web archive files as filesystems.

javaswf
Mavenised version of the JavaSWF codebase, in order to resolve the dependencies for Heritrix3.

glean
Using web scrapers to extract data from the archived web

ukwa-player
Highly experimental sketch of a hi-fidelity web archive 'player' for proxy-based access

docker-airflow
Apache Airflow with a few additional dependencies

docker-hadoop
Hadoop running in a container.

ukwa.github.com
UK Web Archive GitHub Homepage

docker-grobid
GROBID (GeneRation Of BIbliographic Data) in a Docker container.

httpfs
Apache Hadoop HttpFS for cdh3

ukwa-blacklight
Experimenting with Blacklight

python-w3act
Python clients for W3ACT and Heritrix3

mementoweb-client-java
Core Java libraries for Memento clients.

crawl-test-site
A simple site that uses GitHub pages to host resources for testing crawlers.

language-detection
Experimenting with https://code.google.com/p/language-detection/

file-archive-recordreader
File Archive RecordReader

python-webhdfs
Python wrapper around Hadoop's WebHDFS interface.

docker-hypercored
A containerised Dat server for experimental dataset hosting.

crawl-db
A standalone database for crawl events.

ukwa-tasks
Luigi tasks for running Hadoop jobs and managing material held on HDFS

katacoda-scenarios
Katacoda Scenarios

ukwa-ingest-services
The dockerized ensemble of services that run most of the UKWA crawl and ingest processes.

hdfs-exporter
Scrapes the Hadoop status pages for Prometheus