openlibrary
One webpage for every book ever published!
heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
bookreader
The Internet Archive BookReader
wayback-machine-webextension
A web browser extension for Chrome, Firefox, Edge, and Safari 14.
brozzler
brozzler - distributed browser-based web crawler
warcprox
WARC writing MITM HTTP/S proxy
openlibrary-client
Python Client Library for the Archive.org OpenLibrary API
warc
Python library for reading and writing warc files
dweb-mirror
Offline Internet Archive project
warctools
Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
internetarchivebot
bookserver
Archive.org OPDS Bookserver - A standard for digital book distribution
fatcat
Perpetual Access To The Scholarly Record
archive-pdf-tools
Fast PDF generation and compression. Deals with millions of pages daily.
fatcat-scholar
search interface for scholarly works
Zeno
State-of-the-art web crawler
iaux
Monorepo for Archive.org UX development and prototyping.
openlibrary-bots
A repository of cleanup bots implementing the openlibrary-client
umbra
A queue-controlled browser automation tool for improving web crawl quality
dweb-archive
hind
Hashistack-IN-Docker (single container with nomad + consul + caddy)
wayback-machine-firefox
Reduce annoying 404 pages by automatically checking for an archived copy in the Wayback Machine. Learn more about this Test Pilot experiment at https://testpilot.firefox.com/
cdx-summary
Summarize web archive capture index (CDX) files.
internet-archive-voice-apps
Voice Apps (Actions on Google, Alexa Skill) of Internet Archive. Just say: "Ok Google, Ask Internet Archive to Play Jazz" or "Alexa, Ask Internet Archive to play Instrumental Music"
liveweb
Liveweb proxy of the Wayback Machine project
epub
For code related to making ePub files
surt
Sort-friendly URI Reordering Transform (SURT) python module
archive-hocr-tools
Efficient hOCR tooling
trough
Trough: Big data, small databases.
dweb-transport
Internet Archive Decentralized Web Common API
wayback-diff
React components to render differences between captures at the Wayback Machine
dweb-transports
sandcrawler
Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki
iiif
The official Internet Archive IIIF service
crawling-for-nomore404
snakebite-py3
Pure python HDFS client: python3.x version
newsum
Daily TV News Summary using GPT
ia-hadoop-tools
arklet
ARK minter, binder, resolver
dweb-gateway
Decentralized web Gateway for Internet Archive
xfetch
Cache stampede test harness. Code accompanies the presentation made at RedisConf 2017, 30 May to 1 June, 2017, in San Francisco.
openlibrary-librarians
Coordination between the OpenLibrary.org Librarian community
arch
Web application for distributed compute analysis of Archive-It web archive collections.
cicd
build & test using github registry; deploy to nomad clusters
scrapy-warcio
Support for writing WARC files with Scrapy
iacopilot
Summarize and ask questions about items in the Internet Archive
iari
Import workflows for the Wikipedia Citations Database
doublethink
rethinkdb python library
s3_loader
Watch for local files to appear and move them into S3
Sparkling
Internet Archive's Sparkling Data Processing Library
wayback-machine-android
archive-commons
draintasker
a tool for continuously ingesting w/arc files into the archive
ias3
Internet Archive S3-like connector
wayback-radial-tree
chocula
journal-level metadata munging. part of fatcat project
read_api_extras
Demo code for the Open Library Read API
wikibase-patcher
Python library for interacting with the Wikibase REST API
dweb-archivecontroller
web_collection_search
An API wrapper to the Elasticsearch index of web archival collections and a web UI to explore those indexes.
epub-labs
epub-labs
iaux-typescript-wc-template
IAUX Typescript WebComponent Template
ia
A JS interface to archive.org
archive-ocr-tools
offlinesolr
Tool to build solr index offline
ia-bin-tools
Internet Archive Command-line Utilities
dweb-objects
iare
An interactive IARI JSON viewer
iaux-collection-browser
wayback-machine-safari
collections-cleaners
trendmachine
A mathematical model to calculate a normalized score to quantify the temporal resilience of a web page as a time-series data based on the historical observations of the page in web archives.
acs4_py
Python interface to ACS4
esbuild_es5
minify JS/TS files using `esbuild` and `swc` down to ES5 (uses `deno`)
iaux-search-service
map-of-the-web
eventer
Eventer is a simple event dispatching library in Python
iaux-donation-form
The Internet Archive Donation Form
internetarchive.github.com
Internet Archive Open Source Blog
isodos
Go module to interact with Internet Archive's Isodos API
strainer
Heritrix frontier files manipulation tool.
internet-archive-alexa-skill
btget
Command line retrieval of torrents using transmission-daemon (via transmission-remote)
mediawiki-extension-archive-leaf
A MediaWiki extension that supports importing of Archive.org palm leaf items
hashitalksdemo
openlibrary-api
API documentation for https://github.com/internetarchive/openlibrary
httpd
Fast and easy-to-use web server, using the Deno native http server (hyper in rust). It serves static files & dirs, with arbitrary handling using an optional `handler` argument.
wbm_ai_kg
Google Summer of Code (GSoC) 2024 Wayback Machine GenAI Knowledge Graph project
file_server_plus
`deno` static file webserver, clone of `file_server.ts`, PLUS an additional final "404 handler" to run arbitrary JS/TS
dyno
archiveorg-e2e-playwright
tarb_insights
A Streamlit application to visualize Wikipedia IABot statistics
rulesengine-client
Python client package for the playback rules engine
coderunr
deploy saved changes to website unique hostnames instantly -- can skip commits, pushes & full CI/CD
deferred
Redis promises & futures library for Predis / PHP
hello-js
an example of full CI/CD from GitHub to a nomad cluster
wiki-references-db
Data models and scripts to build a database of references (broadly defined) appearing on Wikipedia and other wikis
kohacon2011-presentation
Presentation for KohaCon 2011
rulesengine
model and front-end for rules for managing wayback playback
deploy
GitHub Action to deploy to a nomad cluster