  • Stars: 197
  • Rank: 197,722 (Top 4%)
  • Language: C++
  • License: Other
  • Created almost 14 years ago
  • Updated over 4 years ago

Repository Details

The Kyoto Text Analysis Toolkit for word segmentation and pronunciation estimation, etc.

KyTea

KyTea is a general text analysis toolkit, with a focus on Japanese and other languages requiring word or morpheme segmentation.

Detailed usage information can be found at http://www.phontron.com/kytea

To build KyTea, run

 $ ./configure
 $ make

If there is no configure file in the directory (for example, if you checked the source out from GitHub), you must first generate it using the following command, and then run the two build commands above:

 $ autoreconf -i
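
The build steps above can be combined into one small shell function. This is only a sketch, not part of the KyTea distribution: the function name build_kytea and the DRY_RUN variable are hypothetical names introduced for illustration. The only real commands it issues are the ones already listed above (autoreconf -i, ./configure, make).

```shell
#!/bin/sh
# Sketch: build a KyTea source tree, generating configure first if needed.
# build_kytea and DRY_RUN are hypothetical names, not part of KyTea.
build_kytea() {
    src_dir="${1:-.}"

    # With DRY_RUN=1, print each command instead of executing it.
    run() {
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$*"
        else
            "$@"
        fi
    }

    (
        cd "$src_dir" || exit 1
        # A fresh GitHub checkout has no generated configure file.
        if [ ! -f configure ]; then
            run autoreconf -i || exit 1
        fi
        run ./configure || exit 1
        run make
    )
}
```

After sourcing the function, calling it with DRY_RUN=1 and the path to the source directory prints the commands it would run, which is a cheap way to check whether the autoreconf step will be needed before starting a real build.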

More Repositories

1

nn4nlp-code

Code Samples from Neural Networks for NLP
Python
1,303
star
2

lowresource-nlp-bootcamp-2020

The website for the CMU Language Technologies Institute low resource NLP bootcamp 2020
Jupyter Notebook
598
star
3

nlptutorial

A Tutorial about Programming for Natural Language Processing
Perl
423
star
4

nmt-tips

A tutorial about neural machine translation including tips on building practical systems
Perl
368
star
5

nlp-from-scratch-assignment-2022

An assignment for CMU CS11-711 Advanced NLP, building NLP systems from scratch
Python
168
star
6

lamtram

lamtram: A toolkit for neural language and translation modeling
C++
138
star
7

anlp-code

Jupyter Notebook
130
star
8

research-career-tools

Python
128
star
9

naacl18tutorial

NAACL 2018 Tutorial: Modelling Natural Language, Programs, and their Intersection
TeX
102
star
10

minbert-assignment

Minimalist BERT implementation assignment for CS11-711
Python
70
star
11

minnn-assignment

An assignment on creating a minimalist neural network toolkit for CS11-747
Python
64
star
12

yrsnlp-2016

Structured Neural Networks for NLP: From Idea to Code
Jupyter Notebook
59
star
13

minllama-assignment

Python
48
star
14

util-scripts

Various utility scripts useful for natural language processing, machine translation, etc.
Perl
46
star
15

latticelm

Software for unsupervised word segmentation and language model learning using lattices
C++
45
star
16

coderx

A highly sophisticated sequence-to-sequence model for code generation
Python
40
star
17

rapid-adaptation

Reproduction instructions for "Rapid Adaptation of Neural Machine Translation to New Languages"
Shell
39
star
18

mtandseq2seq-code

Code examples for CMU CS11-731, Machine Translation and Sequence-to-sequence Models
Python
33
star
19

travatar

This is a repository for the Travatar forest-to-string translation decoder
C++
28
star
20

lxmls-2017

Slides/code for the Lisbon machine learning school 2017
Python
28
star
21

modlm

modlm: A toolkit for mixture of distributions language models
C++
27
star
22

kylm

The Kyoto Language Modeling Toolkit
Java
27
star
23

pialign

pialign - A Phrasal ITG Aligner
C++
23
star
24

pgibbs

An implementation of parallel Gibbs sampling for word segmentation and POS tagging.
C++
16
star
25

nlp-from-scratch-assignment-spring2024

An assignment for building an NLP system from scratch.
16
star
26

lader

A reordering tool for machine translation.
C++
15
star
27

howtocode-2017

An example of DyNet autobatching for the NIPS "how to code a paper" workshop
Jupyter Notebook
13
star
28

kyfd

A decoder for finite state models for text processing.
C++
12
star
29

egret

A fork of the Egret parser that fixes a few bugs
C++
10
star
30

latticelm-v2

Second version of latticelm, a tool for learning language models from lattices
C++
7
star
31

globalutility

TeX
6
star
32

nafil

A program for performing bilingual corpus filtering
C++
4
star
33

prontron

A discriminative pronunciation estimator using the structured perceptron algorithm.
Perl
4
star
34

wat2014

Scripts for creating a system similar to the NAIST submission to WAT2014
Shell
3
star
35

multi-extract

A script for extracting multi-synchronous context-free grammars
Python
2
star
36

nile

A clone of the nile alignment toolkit
C++
1
star
37

webigator

A program to aggregate, rank, and search text information
Perl
1
star
38

ribes-c

A C++ implementation of the RIBES machine translation evaluation measure.
C++
1
star
39

swe-bench-zeno

Scripts for analyzing swe-bench with Zeno
Python
1
star