Florian Leitner (@fnl)

Top repositories

1

syntok

Text tokenization and sentence segmentation (segtok v2)
Python
201
star
2

segtok

Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic features.
Python
170
star
3

pymonad

"fork" of PyMonad on BitBucket to change the ``*`` functor/composition operator to ``<<``
Python
31
star
4

patricia-trie

a pure-Python PATRICIA trie implementation.
Python
31
star
5

medic

a Python 3 command-line tool to maintain a DB mirror of MEDLINE (https://pypi.python.org/pypi/medic) - ALERT: As I have moved out of science and am working as a consultant now, this project might need a new maintainer once PubMed changes its XML format. Heroes?
Python
25
star
6

progress_bar

an informative progress bar for Python 2+3 command-line tools
Python
13
star
7

asdm-tm-class

Course material for the Madrid ASDM class on text mining (C09)
Jupyter Notebook
12
star
8

libfnl

Python 3 tools for data mining in molecular biology
Python
12
star
9

classipy

A command-line tool to develop advanced text classifiers using SciKit-Learn.
Python
9
star
10

sentence_splitter

check my new splitter - segtok
Python
8
star
11

tokenizer

a concurrent, deterministic finite state tokenizer (for letter-based scripts)
Go
4
star
12

txtfnnl

a UIMA-based text mining pipeline
Java
3
star
13

SPECIES

a modified version of the SPECIES tagger
C++
2
star
14

otplc

A tool to convert corpus annotations between the brat annotation and OTPL formats.
Python
2
star
15

vimrc

my (Vim-centric) POSIX environment
Vim Script
2
star
16

cpp-project-template

A very basic C++ project structure using CMake, Catch2, and cxxopts.
C++
2
star
17

go

Golang source code collection
Go
2
star
18

bceval

BioCreative Evaluation Scripts and Library
Python
2
star
19

bootstrap

jump-start a simple GNU C project
C
2
star
20

lexikos

a minimal acyclic deterministic finite state automaton (MADFA)
Scala
1
star
21

gnamed

a tool to manage a unified repository of gene and protein names, symbols, keywords, literature references, and species associations
Python
1
star
22

word2numpy

A Python 3.0 port of word2vec.py, itself a Python 2.7 port of word2vec
Python
1
star
23

segmenter

scripts to pre-process plain text: sentence segmentation, tokenization, and stemming
Perl
1
star
24

OnlineTaggerFramework

an online tagger wrapper for GATE that only spawns one global sub-process per processing resource
Java
1
star
25

fnl.github.io

my blog (http://fnl.es)
HTML
1
star
26

ibecs-to-omtd-transformer

A transformer that converts an IBECS XML file into an OMTD-SHARE corpus
Python
1
star
27

chemcheck

a syntax checker for BioCreative IV CHEMDNER task annotations
C
1
star
28

couchpy

a python3 library to programmatically access CouchDB (written when there was none, "long ago"...)
Python
1
star
29

libfsmg

A finite state machine library for pattern matching on generic types in Java sequence containers.
Java
1
star