• Stars: 1
• Language: Java
• Created over 9 years ago
• Updated over 9 years ago

Repository Details

An online tagger wrapper for GATE that spawns only one global sub-process per processing resource.
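The description above says the wrapper keeps a single global sub-process per processing resource instead of launching the external tagger for every document. The plain-Java sketch below illustrates that general pattern under stated assumptions. It is not code from this repository and does not use the GATE API. The class name `SharedTaggerProcess`, the string resource key, and the one-line-in/one-line-out protocol with the tagger are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the repository's implementation; it does not use the GATE API.
// Assumption: the external tagger reads one line on stdin and answers with one line on stdout.
final class SharedTaggerProcess {
    // One shared sub-process per processing-resource key (the key is a hypothetical string id).
    private static final Map<String, SharedTaggerProcess> INSTANCES = new ConcurrentHashMap<>();

    private final Process process;
    private final BufferedWriter toTagger;
    private final BufferedReader fromTagger;

    private SharedTaggerProcess(List<String> command) throws IOException {
        process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // keep stderr out of the line protocol
                .start();
        toTagger = new BufferedWriter(
                new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8));
        fromTagger = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8));
    }

    /** Returns the sub-process for this key, starting it only on first use. */
    static SharedTaggerProcess forResource(String key, List<String> command) {
        return INSTANCES.computeIfAbsent(key, k -> {
            try {
                return new SharedTaggerProcess(command);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Sends one line and reads one reply; synchronized so concurrent callers cannot interleave. */
    synchronized String tag(String line) {
        try {
            toTagger.write(line);
            toTagger.newLine();
            toTagger.flush();
            return fromTagger.readLine(); // null if the tagger has exited
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stops and forgets the shared sub-process for this key, if one was started. */
    static void shutdown(String key) {
        SharedTaggerProcess shared = INSTANCES.remove(key);
        if (shared != null) {
            shared.process.destroy();
        }
    }
}
```

Every call to `forResource` with the same key returns the same instance, so documents processed through that resource reuse one running tagger. The `synchronized` `tag` method is a simple way to keep concurrent requests from interleaving on the shared pipes. A real wrapper would also need a tagger-specific protocol and error handling.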

More Repositories

1. syntok

Text tokenization and sentence segmentation (segtok v2).
• Language: Python
• Stars: 201

2. segtok

A rule-based sentence segmenter (splitter) and word tokenizer using orthographic features. Segtok v2 is here: https://github.com/fnl/syntok
• Language: Python
• Stars: 170

3. pymonad

A "fork" of PyMonad on Bitbucket that changes the `*` functor/composition operator to `<<`.
• Language: Python
• Stars: 31

4. patricia-trie

A pure-Python PATRICIA trie implementation.
• Language: Python
• Stars: 31

5. medic

A Python 3 command-line tool to maintain a DB mirror of MEDLINE (https://pypi.python.org/pypi/medic). ALERT: As I have moved out of science and now work as a consultant, this project might need a new maintainer once PubMed changes its XML format. Heroes?
• Language: Python
• Stars: 25

6. progress_bar

An informative progress bar for Python 2+3 command-line tools.
• Language: Python
• Stars: 13

7. asdm-tm-class

Course material for the Madrid ASDM class on text mining (C09).
• Language: Jupyter Notebook
• Stars: 12

8. libfnl

Python 3 tools for data mining in molecular biology.
• Language: Python
• Stars: 12

9. classipy

A command-line tool to develop advanced text classifiers using scikit-learn.
• Language: Python
• Stars: 9

10. sentence_splitter

Check out my new splitter: segtok.
• Language: Python
• Stars: 8

11. tokenizer

A concurrent, deterministic finite state tokenizer (for letter-based scripts).
• Language: Go
• Stars: 4

12. txtfnnl

A UIMA-based text mining pipeline.
• Language: Java
• Stars: 3

13. SPECIES

A modified version of the SPECIES tagger.
• Language: C++
• Stars: 2

14. otplc

A tool to convert corpus annotations between the brat annotation and OTPL formats.
• Language: Python
• Stars: 2

15. vimrc

My (Vim-centric) POSIX environment.
• Language: Vim Script
• Stars: 2

16. cpp-project-template

A very basic C++ project structure using CMake, Catch2, and cxxopts.
• Language: C++
• Stars: 2

17. go

Golang source code collection.
• Language: Go
• Stars: 2

18. bceval

BioCreative Evaluation Scripts and Library.
• Language: Python
• Stars: 2

19. bootstrap

Jump-start a simple GNU C project.
• Language: C
• Stars: 2

20. lexikos

A minimal acyclic deterministic finite state automaton (MADFA).
• Language: Scala
• Stars: 1

21. gnamed

A tool to manage a unified repository of gene and protein names, symbols, keywords, literature references, and species associations.
• Language: Python
• Stars: 1

22. word2numpy

A Python 3.0 port of word2vec.py, itself a Python 2.7 port of word2vec.
• Language: Python
• Stars: 1

23. segmenter

Scripts to pre-process plain text: sentence segmentation, tokenization, and stemming.
• Language: Perl
• Stars: 1

24. fnl.github.io

My blog (http://fnl.es).
• Language: HTML
• Stars: 1

25. ibecs-to-omtd-transformer

A transformer that converts an IBECS XML file into an OMTD-SHARE corpus.
• Language: Python
• Stars: 1

26. chemcheck

A syntax checker for BioCreative IV CHEMDNER task annotations.
• Language: C
• Stars: 1

27. couchpy

A Python 3 library to programmatically access CouchDB (written "long ago", when there was none).
• Language: Python
• Stars: 1

28. libfsmg

A finite state machine library for pattern matching on generic types in Java sequence containers.
• Language: Java
• Stars: 1