• This repository has been archived on 05/Aug/2024
  • Stars: 1
  • License: Other
  • Created over 4 years ago
  • Updated almost 3 years ago


Repository Details

Material for building a German-language web corpus dedicated to COVID-19 (coronavirus)

More Repositories

1. trafilatura (Python, 3,298 stars)
   Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, or XML

2. German-NLP (449 stars)
   Curated list of open-access, open-source, and off-the-shelf resources and tools developed with a particular focus on German

3. simplemma (Python, 136 stars)
   Simple multilingual lemmatizer for Python, especially useful when speed and efficiency matter

4. htmldate (Python, 117 stars)
   Fast and robust date extraction from web pages, with Python or on the command line

5. courlan (Python, 109 stars)
   Clean, filter, and sample URLs to optimize data collection, in Python or on the command line, with deduplication, spam, content, and language filters

6. geokelone (Python, 5 stars)
   Integrates spatial and textual data processing tools into a modular software package featuring preprocessing, geocoding, disambiguation, and visualization

7. german-reddit (Python, 3 stars)
   Extraction of a German Reddit corpus

8. tweets-tools (Python, 2 stars)
   Various tools for working with Twitter data

9. flux-toolchain (Perl, 2 stars)
   Filtering and Language-identification for URL Crawling Seeds (FLUCS), a.k.a. FLUX-Toolchain

10. jlcl-style (TeX, 1 star)
    Experiments to modernize the LaTeX class of the JLCL

11. trafilatura_gui (Python, 1 star)

12. toponyms (1 star)
    Old prototype for toponym extraction in historical texts written in German

13. url-compressor (Pascal, 1 star)
    A fast pattern-based URL compression for lists of links

14. zeitcrawler (Java, 1 star)
    Automatically exported from code.google.com/p/zeitcrawler

15. vardial-experiments (Python, 1 star)
    Experiments conducted for the VarDial shared tasks

16. microblog-explorer (Python, 1 star)
    Performs crawls of social networks (identi.ca, reddit, friendfeed) to gather internal and external links and identify their language