  • Stars: 139
  • Rank: 255,223 (Top 6 %)
  • Language: JavaScript
  • License: GNU Lesser Genera...
  • Created over 10 years ago
  • Updated almost 3 years ago

wordcut

Thai word breaker for Node.js

Installation

npm install wordcut

Usage

var wordcut = require("wordcut");

wordcut.init();
console.log(wordcut.cut("กากา"));
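
The result of `cut` can be post-processed like any other string. The sketch below assumes that `cut` returns the segmented text as one string with the words joined by a delimiter (`|` is assumed here, which is not stated above). `splitSegments` is a hypothetical helper for illustration and is not part of wordcut.

```javascript
// Hypothetical helper (not part of wordcut): split delimiter-joined
// segmenter output into an array of words.
// Assumption: words are joined by a single delimiter, "|" by default.
// Empty and whitespace-only pieces are dropped.
function splitSegments(segmented, delim = "|") {
  return segmented
    .split(delim)
    .map((word) => word.trim())
    .filter((word) => word.length > 0);
}

// Usage with wordcut (requires `npm install wordcut`):
//   var wordcut = require("wordcut");
//   wordcut.init();
//   console.log(splitSegments(wordcut.cut("กากา")));
```

Returning an array keeps later steps (counting, filtering, indexing) independent of whichever delimiter the segmenter emits.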

With additional custom dictionary

// see test/test_customdict.js for a complete example
wordcut.init(["customdict/*.txt"], true);

Command line interface

npm install -g wordcut
echo 'กากากา' | wordcut

Options

  • --delim
  • --dict
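
The same pipe can be driven from a Node.js script. This is a minimal sketch using the standard `child_process.spawnSync`; `pipeThrough` is a hypothetical helper name, and the usage comment assumes the global install above has put `wordcut` on the `PATH`. No option values are passed, because their formats are not described here.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper: run a command-line filter, feed `input` to its
// stdin, and return what it wrote to stdout as a UTF-8 string.
function pipeThrough(command, args, input) {
  const result = spawnSync(command, args, { input, encoding: "utf8" });
  if (result.error) {
    throw result.error; // e.g. the command is not on PATH
  }
  if (result.status !== 0) {
    throw new Error(
      `${command} exited with status ${result.status}: ${result.stderr}`
    );
  }
  return result.stdout;
}

// Usage, assuming `npm install -g wordcut` put `wordcut` on PATH:
//   console.log(pipeThrough("wordcut", [], "กากากา\n"));
```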

Web API

node server.js

Trying the Web API

curl -X POST --data-binary '{"line":"กากา"}' http://localhost:8882/segment
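
The same request can be sent from JavaScript. The sketch below mirrors the curl command using the global `fetch` available in Node.js 18 and later. `buildSegmentRequest` and `segment` are hypothetical names, no `Content-Type` header is set because the curl command does not set one explicitly, and the response is returned as raw text because its format is not described here.

```javascript
// Build the same request the curl command sends.
// Assumption: the server started with `node server.js` listens on
// localhost:8882, as in the curl example.
function buildSegmentRequest(line, baseUrl = "http://localhost:8882") {
  return {
    url: `${baseUrl}/segment`,
    options: {
      method: "POST",
      body: JSON.stringify({ line }),
    },
  };
}

// `fetchImpl` defaults to the global fetch (Node.js 18+); passing a
// different implementation makes the function easy to test.
async function segment(line, fetchImpl = fetch) {
  const { url, options } = buildSegmentRequest(line);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`segment request failed with HTTP ${response.status}`);
  }
  return response.text(); // raw body; the response format is not assumed
}

// Usage (with the server running):
//   segment("กากา").then(console.log).catch(console.error);
```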

Development

More Repositories

  1. PhlongTaIam (PHP, 34 stars): PHP Thai word breaker
  2. chamkho (Rust, 34 stars): Khmer, Lao, Myanmar, and Thai word segmentation/breaking library and command-line tool
  3. Yaitron (Python, 28 stars): Yaitron English-Thai and Thai-English dictionary
  4. mapkha (Go, 27 stars): Thai word segmentation program in Go
  5. thailang4r (Ruby, 25 stars): Thai language utility for Ruby
  6. wordcutpy (Python, 18 stars): A simple word breaker written in Python
  7. wordcut-engine (Rust, 9 stars): Word segmentation library in Rust
  8. thaiwordseg (C, 8 stars): Thai word segmenter written in C
  9. pdf2txt_th (Ruby, 7 stars): Thai PDF-to-text script
  10. bhasati (Ruby, 5 stars): Bhasati is a Mastodon client for desktop written in Ruby using GTK+
  11. corenlptut (Python, 4 stars)
  12. wordcut-clj (Clojure, 4 stars): A word segmentation tool for ASEAN languages written in Clojure
  13. cl-wordcut (Common Lisp, 3 stars): Word segmentation tools for ASEAN languages written in Common Lisp
  14. chamkho-pg (Rust, 3 stars)
  15. word-freq (Rust, 3 stars): Word frequency counter written in Rust
  16. lao-dictionary (3 stars): Automatically exported from code.google.com/p/lao-dictionary
  17. cl-rocksdb (Common Lisp, 3 stars): RocksDB binding for Common Lisp
  18. simple-compojure-api-buddy-example (Clojure, 2 stars): Simple compojure-api + buddy example
  19. thai-wordnet-db (Awk, 2 stars)
  20. learn_awk (2 stars)
  21. admichat (Rust, 2 stars): A simple web chat for talking to a web admin
  22. thaidix (Ruby, 2 stars): Free English-Thai dictionary for machine translation
  23. vrocket (Rust, 2 stars): A hello world example of Rocket.rs with a run.sh that auto-reloads the server
  24. khatson (Rust, 2 stars): Attacut port to Rust
  25. wordcut-json-rpc-server (Ruby, 2 stars): Wordcut JSON-RPC server
  26. tha-eng-wn (Ruby, 2 stars): tha-eng-wn is a Thai-English bidix generator from WordNet
  27. switch (JavaScript, 2 stars): NodeMCU-based switch controller server
  28. utf8-input-stream (Common Lisp, 2 stars): A UTF-8 string input stream over a binary stream for Common Lisp
  29. prolog-sheet (Prolog, 1 star)
  30. evbcorpus (1 star): Automatically exported from code.google.com/p/evbcorpus
  31. wordcut-server.js (JavaScript, 1 star): Wordcut server
  32. stream-par-procs (Common Lisp, 1 star): Stream parallel processors for Common Lisp
  33. disp_amphi (JavaScript, 1 star)
  34. mgawika (Rust, 1 star): mgawika is a PostgreSQL extension that enables full-text searching on almost every known human language.
  35. asean-word-freq (Ruby, 1 star): ASEAN word occurrence counter from HTML files
  36. prefixtree (Rust, 1 star): prefixtree is a simple prefix-tree-based HashMap
  37. rum3 (Clojure, 1 star): rum3 is an example of figwheel + rum + ring + bidi usage.
  38. mapkha-cli (Go, 1 star): mapkha-cli is a command line tool for Mapkha - Thai word segmentation (wordcut; word boundary identification; ตัดคำ) program in Go (golang)
  39. lemma_srv (Python, 1 star)
  40. libre-thai-chat-logs (1 star)
  41. parallel_corpus_tool (Rust, 1 star): A tool for loading a parallel corpus
  42. moses-smt-docker (Dockerfile, 1 star)
  43. thai_romanize (Ruby, 1 star)
  44. thai-pos (Clojure, 1 star): Thai word breaker and part-of-speech tagger
  45. wordcut-guile (E, 1 star): Word segmentation tool written in Scheme (GNU Guile)
  46. wordcut.rb (Ruby, 1 star): ASEAN word tokenizer written in Ruby
  47. wordcut-x (Java, 1 star): A word segmentation tool for ASEAN languages written in Java
  48. wordlist-collector (Shell, 1 star)
  49. reinarb (Ruby, 1 star): A toolset for Apertium written in Ruby
  50. wordcutw (Rust, 1 star): A C-interface wrapper for Wordcut - a Lao/Thai word segmentation/breaking library
  51. entity-gen (Emacs Lisp, 1 star): An Emacs Lisp script for generating simple JPA entity/class code from a PostgreSQL table