Adrien Barbaresi (@adbar)
  • Stars 4,164
  • Global Rank 6,759 (Top 0.3 %)
  • Followers 368
  • Following 283
  • Registered over 12 years ago
  • Most used languages
    Python 71.4 %
    TeX 7.1 %
    Perl 7.1 %
    Java 7.1 %
    Pascal 7.1 %
  • Location 🇩🇪 Germany
  • Country Total Rank 567
  • Country Ranking
    Python 56
    Pascal 258
    Perl 524
    TeX 1,503

Top repositories

1

trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Python
3,298 stars
2

German-NLP

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
449 stars
3

simplemma

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Python
136 stars
4

htmldate

Fast and robust date extraction from web pages, with Python or on the command-line
Python
117 stars
5

courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
Python
109 stars
6

geokelone

Integrates spatial and textual data processing tools into a modular software package that features preprocessing, geocoding, disambiguation and visualization
Python
5 stars
7

german-reddit

Extraction of a German Reddit Corpus
Python
3 stars
8

tweets-tools

Diverse tools used with Twitter data
Python
2 stars
9

flux-toolchain

Filtering and Language-identification for URL Crawling Seeds (FLUCS) a.k.a. FLUX-Toolchain
Perl
2 stars
10

jlcl-style

Experiments to modernize the LaTeX class of the JLCL
TeX
1 star
11

trafilatura_gui

Python
1 star
12

toponyms

Old prototype for toponym extraction in historical texts written in German
1 star
13

url-compressor

A fast pattern-based URL compression for lists of links
Pascal
1 star
14

zeitcrawler

Automatically exported from code.google.com/p/zeitcrawler
Java
1 star
15

vardial-experiments

Experiments conducted on the occasion of the VarDial shared tasks
Python
1 star
16

microblog-explorer

Perform crawls of social networks (identi.ca, reddit, friendfeed) to gather internal and external links and identify their language
Python
1 star
17

coronakorpus

Material for building a German-language COVID-19 web corpus
1 star