trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
German-NLP
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
simplemma
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
htmldate
Fast and robust date extraction from web pages, with Python or on the command-line
courlan
Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters
geokelone
Integrates spatial and textual data processing tools into a modular software package which features preprocessing, geocoding, disambiguation and visualization
tweets-tools
Diverse tools used with Twitter data
flux-toolchain
Filtering and Language-identification for URL Crawling Seeds (FLUCS), a.k.a. FLUX-Toolchain
jlcl-style
Experiments to modernize the LaTeX class of the JLCL
trafilatura_gui
toponyms
Old prototype for toponym extraction in historical texts written in German
zeitcrawler
Automatically exported from code.google.com/p/zeitcrawler
url-compressor
A fast pattern-based URL compression for lists of links
coronakorpus
Material for building a German-language web corpus dedicated to COVID-19
vardial-experiments
Experiments conducted on the occasion of the VarDial shared tasks
microblog-explorer
Perform crawls of social networks (identi.ca, reddit, friendfeed) to gather internal and external links and identify their language