A set of reusable Java components that implement functionality common to any web crawler.
HtmlUnit is a "GUI-Less browser for Java programs".
Apache Nutch is an extensible and scalable web crawler.
A browser automation framework and ecosystem.
A collection of reusable, loosely Selenium-related code; see https://github.com/sergueik/selenium_tests for strictly Selenium test code examples.
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Yet Another UserAgent Analyzer