JHU Human Language Technology Center of Excellence (@hltcoe)

Top repositories

1

golden-horse

Named Entity Recognition for Chinese social media (Weibo). From the EMNLP 2015 paper.
Python
534
star
2

turkle

Django-based clone of Amazon's Mechanical Turk service running in your local environment.
Python
142
star
3

PredPatt

PredPatt: Predicate-Argument Extraction from Universal Dependencies
Python
112
star
4

ColBERT-X

CLIR version of ColBERT
Python
63
star
5

mingpipe

A Chinese name matcher written in Python. Described in: Nanyun Peng, Mo Yu, Mark Dredze. An Empirical Study of Chinese Name Matching and Applications. Association for Computational Linguistics (ACL) (short paper), 2015.
Python
37
star
6

EventMiner

Event extraction pipeline.
Python
35
star
7

concrete-python

Python modules and scripts for working with Concrete, a data serialization format for NLP
Python
20
star
8

patapsco

Cross-language information retrieval pipeline
Python
18
star
9

concrete

Thrift definitions, making HLT data specifications concrete
Thrift
16
star
10

clir-tutorial

SIGIR 2023 tutorial on cross-language information retrieval.
Jupyter Notebook
13
star
11

gazetteer-collection

Jupyter Notebook
12
star
12

xvectors

Python
7
star
13

HC4

HLTCOE CLIR Common-Crawl Collection
Python
7
star
14

parma

A Predicate Argument Linker
Scala
7
star
15

parma2

A predicate argument alignment tool
Scala
7
star
16

sandle

Run a large language modeling SANDbox in your Local Environment
Python
7
star
17

quicklime

Visualization tool for Concrete, a data serialization format for NLP
JavaScript
7
star
18

concrete-java

Java library for Concrete, a data serialization format for NLP
Java
6
star
19

concrete-deprecated

OLD project for Concrete-thrift
Java
5
star
20

prototurk

Simple server for rapidly prototyping Mechanical Turk interfaces
Python
5
star
21

cadet

CADET is a system for rapid discovery, annotation, and extraction on text
JavaScript
4
star
22

concrete-js

JavaScript library for working with Concrete, a data serialization format for NLP
JavaScript
3
star
23

docker-nltk

A very simple example pipeline for named entity recognition using off-the-shelf NLTK.
Python
3
star
24

vivisect

A framework for exploring the internals of DNN models
Python
3
star
25

vaporengine

VaporEngine
JavaScript
3
star
26

concrete-stanford

Concrete-Stanford: Wraps Stanford NLP with utilities to fit it into a Concrete-compliant workflow
Java
3
star
27

concrete-gigaword

Tools for mapping English Gigaword v5 to Concrete
Java
2
star
28

tift

Tift is for tokenization
Java
2
star
29

peer_measure

Implementation of the measure Probability of Equal Expected Rank
Python
2
star
30

tasa

TASA - Translation And Structural Alignment
JavaScript
2
star
31

fetch-wikiqa-corpus

Concrete FetchCommunicationService bundled with "WikiQA corpus"
1
star
32

probe

Scala
1
star
33

stretcher

Concrete file server
Java
1
star
34

concrete-stanford-deprecated2

Concrete-Stanford: Wraps Stanford NLP with utilities to fit it into a Concrete-compliant workflow
Java
1
star
35

annotated-nyt

Java wrappers and utilities for reading the Annotated NYT corpus
Java
1
star
36

lid

Python
1
star
37

simple-search-demo

JavaScript
1
star
38

concrete-ontology

Concrete ontology
Java
1
star
39

concrete-agiga

Tools to map between Concrete and agiga representations
Java
1
star
40

styleguides

HLTCOE recommended style guidelines for importing into IDEs
1
star
41

BLADE

Python
1
star
42

rebar

Java
1
star
43

cmn-renmin-ocr-ner-dataset

NER annotations of the Chinese Newspaper Renmin
Python
1
star
44

goncrete

Go bindings for Concrete
Go
1
star
45

cadet-search-lucene

A search implementation for Concrete
Java
1
star