ArchiveSpark
An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at the Internet Archive.

Web2Warc
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)

HadoopConcatGz
A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz

internetarchive-transfer-scripts
Scripts to transfer archive.org collections, using https://github.com/jjjake/internetarchive

HadoopWebGraph
A Hadoop input format to use graphs in WebGraph's BV format with Hadoop and Spark.

Exspec
Don't write specs anymore; just save 'em while testing your code interactively. Specs will become a byproduct.

IABooksOnArchiveSpark
Analyze digitized books from the Internet Archive remotely with ArchiveSpark

Micrawler
Create and cite micro Web archives with semantics as temporal representations of objects / entities / concepts on the Web

ArchiveSpark-server
A server application that provides a Web service API for ArchiveSpark, enabling third-party applications to integrate temporal Web archive data through a flexible, easy-to-use interface.

FEL4ArchiveSpark
Yahoo's Fast Entity Linker for ArchiveSpark

MHLonArchiveSpark
Work with Medical Heritage Library collections using ArchiveSpark

ArchiveSpark-Zeppelin-Docker
ArchiveSpark with Zeppelin as a ready-to-use Docker image

ArchiveSpark-AUT-bridge
The compatibility layer between ArchiveSpark and The Archives Unleashed Toolkit (AUT)

WarcPartitioner
Partition (W)ARC Files by MIME Type and Year

ArchiveSpark-docker
ArchiveSpark on Docker

ArchiveSpark2Triples
Convert web archives to RDF triples with ArchiveSpark

ArchivePig
An Apache Pig framework that facilitates access to Web archives and enables easy data extraction as well as derivation.