webarchive-discovery
WARC and ARC indexing and discovery tools.

shine
Prototype SOLR-powered web archive exploration UI.

webarchive-explorer
Tools for exploring the contents of web archive files.

docker-pdf2htmlex
Run pdf2htmlEX in a Docker container.

w3act
w3act is an annotation and curation tool for building web archive collections

opendata
Repository of documentation about the open datasets published by the UK Web Archive.

monitrix
A monitoring system for Heritrix 3.

ukwa-pywb

qaop
Qaop – ZX Spectrum emulator

ukwa-manage
Shepherding our web archives from crawl to access.

ukwa-heritrix
The UKWA Heritrix3 custom modules and Docker builder.

webarchive-test-suite
A set of test files for web archiving.

docker-brozzler
Brozzler in a Docker container

crawl-analysis
Web Archiving Domain Crawl Analysis Scripts

webarchiving-notebooks
A collection of Jupyter notebooks for working with web archive data, tools and APIs

ukwa-gsheets-utils
Add-On for Google Sheets to help those working with web archives.

webrender-phantomjs
A RESTful API for rendering web pages in PhantomJS

flashfreeze
A rapid web page analyser and archiver.

halflife
Tracking the fortunes of our archived URLs.

wren
Experiments in testable, scalable crawler architectures

aho-corasick
Aho-Corasick in Java

ukwa-services
Deployment configuration for all UKWA services stacks.

mementoweb-webclient
A simple web-based interface to Memento holdings.

acid-crawl
An acid test suite for crawlers.

ukwa-documentation
Public documentation about the technical architecture of the UK Web Archive

webarchive-wat-mining
WAT (web archive transform) metadata mining

docker-warcprox
Run warcprox inside Docker

solr-proxy
An NGINX proxy to control access to the Solr API.

python-warcwriterpool
Hopefully off-setting some of the difficulties writing to WARCs (multiple open files, size limits, etc.).

ukwa-warc-server
Serves our WARC files for playback, wherever they may lie.

ukwa
UKWA

waybacks
This module builds our Waybacks in the various different configurations we require.

webrender-puppeteer
Web page rendering service based on Google's Puppeteer

javaswf
Mavenised version of the JavaSWF codebase, in order to resolve the dependencies for Heritrix3.

glean
Using web scrapers to extract data from the archived web

ukwa-player
Highly experimental sketch of a hi-fidelity web archive 'player' for proxy-based access

docker-airflow
Apache Airflow with a few additional dependencies

docker-hadoop
Hadoop running in a container.

ukwa.github.com
UK Web Archive GitHub Homepage

docker-grobid
GROBID (GeneRation Of BIbliographic Data) in a Docker container.

httpfs
Apache Hadoop HttpFS for cdh3

ukwa-blacklight
Experimenting with Blacklight

ukwa-access-api
An application to wrap up APIs for accessing UKWA content.

python-w3act
Python clients for W3ACT and Heritrix3

mementoweb-client-java
Core Java libraries for Memento clients.

crawl-test-site
A simple site that uses GitHub pages to host resources for testing crawlers.

language-detection
Experimenting with https://code.google.com/p/language-detection/

file-archive-recordreader
File Archive RecordReader

python-webhdfs
Python wrapper around Hadoop's WebHDFS interface.

docker-hypercored
A containerised Dat server for experimental dataset hosting.

crawl-db
A standalone database for crawl events.

ukwa-tasks
Luigi tasks for running Hadoop jobs and managing material held on HDFS

katacoda-scenarios
Katacoda Scenarios

ukwa-ingest-services
The dockerized ensemble of services that run most of the UKWA crawl and ingest processes.

hdfs-exporter
Scrapes the Hadoop status pages for Prometheus