• Stars
    star
    5
  • Rank 2,861,937 (Top 57 %)
  • Language
    Python
  • License
    GNU General Publi...
  • Created about 7 years ago
  • Updated over 5 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

integrates spatial and textual data processing tools into a modular software package which features preprocessing, geocoding, disambiguation and visualization

More Repositories

1

trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Python
3,298
star
2

German-NLP

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
449
star
3

simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Python
136
star
4

htmldate

Fast and robust date extraction from web pages, with Python or on the command-line
Python
117
star
5

courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
Python
109
star
6

german-reddit

Extraction of a German Reddit Corpus
Python
3
star
7

tweets-tools

Diverse tools used with Twitter data
Python
2
star
8

flux-toolchain

Filtering and Language-identification for URL Crawling Seeds (FLUCS) a.k.a. FLUX-Toolchain
Perl
2
star
9

jlcl-style

Experiments to modernize the LaTeX class of the JLCL
TeX
1
star
10

trafilatura_gui

Python
1
star
11

toponyms

Old prototype for toponym extraction in historical texts written in German
1
star
12

url-compressor

A fast pattern-based URL compression for lists of links
Pascal
1
star
13

zeitcrawler

Automatically exported from code.google.com/p/zeitcrawler
Java
1
star
14

vardial-experiments

Experiments conducted on the occasion of the VarDial shared tasks
Python
1
star
15

microblog-explorer

Perform crawls of social networks (identi.ca, reddit, friendfeed) to gather internal and external links and identify their language
Python
1
star
16

coronakorpus

Material zum Aufbau eines deutschsprachigen COVID-19-Webkorpus / Building a corpus in German dedicated to coronavirus
1
star