• Stars
    926
  • Rank 49,328 (Top 1.0 %)
  • Language
    Rust
  • License
    MIT License
  • Created about 8 years ago
  • Updated 11 months ago

Repository Details

Natural language detection library for Rust. Try demo online: https://whatlang.org/

Whatlang

Natural language detection for Rust with focus on simplicity and performance.

Try the online demo: https://whatlang.org/

Features

  • Supports 69 languages
  • 100% written in Rust
  • Lightweight, fast and simple
  • Recognizes not only a language, but also a script (Latin, Cyrillic, etc.)
  • Provides reliability information

Get started

Example:

use whatlang::{detect, Lang, Script};

fn main() {
    let text = "Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!";

    let info = detect(text).unwrap();
    assert_eq!(info.lang(), Lang::Epo);
    assert_eq!(info.script(), Script::Latin);
    assert_eq!(info.confidence(), 1.0);
    assert!(info.is_reliable());
}

For more details (e.g. how to blacklist some languages), please see the documentation.

Who uses Whatlang?

Whatlang is used for language recognition, as a direct or indirect dependency, by the following large projects. You'll be in good company using Whatlang:

  • Sonic - fast, lightweight and schema-less search backend in Rust.
  • Meilisearch - an open-source, easy-to-use, blazingly fast, and hyper-relevant search engine built in Rust.

Feature toggles

Feature     Description
enum-map    Lang and Script implement the Enum trait from enum-map
arbitrary   Support for Arbitrary
dev         Enables the whatlang::dev module, which provides some internal API.
            It exists for profiling purposes; normal users are discouraged from relying on this API.

How does it work?

How does the language recognition work?

The algorithm is based on trigram language models, a particular case of n-gram models. To understand the idea, please see the original paper by Cavnar and Trenkle (1994), "N-Gram-Based Text Categorization".
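To make the idea concrete, here is a minimal, self-contained sketch of the Cavnar and Trenkle "out-of-place" measure using only the Rust standard library. It is not Whatlang's implementation: the tokenization (lowercased words padded with spaces), the profile cap and missing-trigram penalty `MAX_PROFILE_LEN`, and the one-sentence "eng"/"deu" training texts are hypothetical choices for illustration. Real language models are built from large corpora.

```rust
use std::collections::HashMap;

// Hypothetical value: cap on profile size, also used as the penalty
// for a text trigram that is missing from a language profile.
const MAX_PROFILE_LEN: usize = 300;

/// Counts character trigrams per word; each lowercased word is padded with
/// one space on both sides, so word starts and ends become trigrams too.
fn count_trigrams(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    let words = text.split(|c: char| !c.is_alphabetic()).filter(|w| !w.is_empty());
    for word in words {
        let padded: Vec<char> = format!(" {} ", word.to_lowercase()).chars().collect();
        for window in padded.windows(3) {
            let trigram: String = window.iter().collect();
            *counts.entry(trigram).or_insert(0) += 1;
        }
    }
    counts
}

/// Ranks trigrams by descending frequency (ties broken alphabetically so the
/// result is deterministic) and keeps at most MAX_PROFILE_LEN of them.
fn profile(text: &str) -> Vec<String> {
    let mut entries: Vec<(String, usize)> = count_trigrams(text).into_iter().collect();
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    entries.into_iter().take(MAX_PROFILE_LEN).map(|(t, _)| t).collect()
}

/// Out-of-place distance: sum of rank differences between the text profile
/// and the language profile, with a fixed penalty for missing trigrams.
fn distance(text_profile: &[String], lang_profile: &[String]) -> usize {
    let lang_ranks: HashMap<&str, usize> = lang_profile
        .iter()
        .enumerate()
        .map(|(rank, t)| (t.as_str(), rank))
        .collect();
    text_profile
        .iter()
        .enumerate()
        .map(|(rank, t)| match lang_ranks.get(t.as_str()) {
            Some(&lang_rank) => rank.abs_diff(lang_rank),
            None => MAX_PROFILE_LEN,
        })
        .sum()
}

/// Returns the language whose profile is closest to the text,
/// or None if the text contains no letters at all.
fn detect<'a>(text: &str, langs: &[(&'a str, Vec<String>)]) -> Option<&'a str> {
    let text_profile = profile(text);
    if text_profile.is_empty() {
        return None;
    }
    langs
        .iter()
        .min_by_key(|(_, lang_profile)| distance(&text_profile, lang_profile))
        .map(|(name, _)| *name)
}

fn main() {
    // Toy profiles built from a single sentence each (hypothetical training data).
    let langs = vec![
        ("eng", profile("the cat and the dog sat on the mat with the hat")),
        ("deu", profile("der hund und die katze sitzen auf der matte mit dem hut")),
    ];
    println!("{:?}", detect("the cat sat on the mat", &langs));
    println!("{:?}", detect("der hund und die katze", &langs));
}
```

Running it prints `Some("eng")` and then `Some("deu")`: almost every trigram of each test sentence is missing from the other language's profile, so the penalty dominates that distance.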

How is is_reliable calculated?

It is based on the following factors:

  • How many unique trigrams are in the given text
  • How large the difference is between the scores of the first and the second (not returned) detected languages. This metric is called rate in the code base.

Therefore, reliability can be represented as a 2D space with a threshold function that splits it into "Reliable" and "Not reliable" areas. This function is a hyperbola and looks like the following:

[Figure: hyperbolic threshold function splitting the space of unique trigrams and rate into "Reliable" and "Not reliable" areas]

For more details, please see the blog article "Introduction to Rust Whatlang Library and Natural Language Identification Algorithms".
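The reliability rule can be sketched in the same spirit. This only illustrates the idea described above and is not Whatlang's code: defining `rate` as the relative gap between the best and runner-up scores (assuming a higher score means a better match) is an assumption, and the hyperbola constants `A` and `B` are hypothetical placeholders, not values taken from the library.

```rust
use std::collections::HashSet;

// Hypothetical constants: the text only states that the threshold is a hyperbola.
const A: f64 = 12.0;
const B: f64 = 0.05;

/// Number of distinct character trigrams in the text
/// (lowercased words, each padded with a space on both sides).
fn unique_trigrams(text: &str) -> usize {
    let mut seen = HashSet::new();
    let words = text.split(|c: char| !c.is_alphabetic()).filter(|w| !w.is_empty());
    for word in words {
        let padded: Vec<char> = format!(" {} ", word.to_lowercase()).chars().collect();
        for window in padded.windows(3) {
            seen.insert(window.iter().collect::<String>());
        }
    }
    seen.len()
}

/// Assumed definition of `rate`: relative gap between the best and the
/// runner-up language scores, where a higher score means a better match.
fn rate(best: f64, second: f64) -> f64 {
    if best > 0.0 { (best - second) / best } else { 0.0 }
}

/// Minimum rate needed to be "Reliable": a hyperbola in the trigram count.
fn required_rate(unique: usize) -> f64 {
    A / unique as f64 + B
}

/// A result is reliable when its point lies above the threshold curve.
fn is_reliable(unique: usize, rate: f64) -> bool {
    unique > 0 && rate > required_rate(unique)
}

fn main() {
    let text = "Ĉu vi ne volas eklerni Esperanton?";
    let n = unique_trigrams(text);
    // Hypothetical scores for the two best candidate languages.
    let r = rate(0.9, 0.3);
    println!("unique trigrams: {n}, rate: {r:.3}, reliable: {}", is_reliable(n, r));
}
```

As the number of unique trigrams grows, `required_rate` approaches `B`, so a longer text needs only a small gap between the top two languages to count as reliable, while a very short text needs a large one.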

Make tasks

  • make bench - run performance benchmarks
  • make doc - generate and open the documentation
  • make test - run tests
  • make watch - watch changes and run tests

Comparison with alternatives

                          Whatlang   CLD2        CLD3
Implementation language   Rust       C++         C++
Languages                 69         83          107
Algorithm                 trigrams   quadgrams   neural network
Supported encoding        UTF-8      UTF-8       ?
HTML support              no         yes         ?

Donations

You can support the project by donating NEAR tokens.

Our NEAR wallet address is whatlang.near

Derivation

Whatlang is a derivative work from Franc (JavaScript, MIT) by Titus Wormer.

License

MIT © Sergey Potapov
