Natural language detection

franc

Detect the language of text.

What’s so cool about franc?

  1. franc can support more languages (†) than any other library
  2. franc is packaged with support for 82, 187, or 414 languages
  3. franc has a CLI

† - Based on the UDHR, the most translated copyright-free document in the world.

What’s not so cool about franc?

franc supports many languages, which means it’s easily confused on small samples. Make sure to pass it big documents to get reliable results.
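
One way to act on this caveat is to check how far the best guess is ahead of the runner-up and treat a narrow lead as undetermined. The sketch below assumes a list of [language, score] tuples in the shape that francAll returns (see API), ordered best first. The helper name confidentLanguage and the 0.2 threshold are hypothetical and not part of franc.

```javascript
// Hypothetical helper, not part of franc: accept the top guess only when it
// leads the runner-up by at least `minGap`.
// `tuples` is assumed to look like francAll output: [[code, score], ...],
// ordered from most to least probable.
function confidentLanguage(tuples, minGap = 0.2) {
  if (tuples.length === 0) return 'und'
  const [bestCode, bestScore] = tuples[0]
  // A single candidate has no runner-up to compare against.
  if (tuples.length === 1) return bestCode
  const runnerUpScore = tuples[1][1]
  return bestScore - runnerUpScore >= minGap ? bestCode : 'und'
}

// Scores copied from the francAll example under Use.
const guesses = [
  ['por', 1],
  ['glg', 0.771284519307895],
  ['spa', 0.6034146900423971]
]

confidentLanguage(guesses) //=> 'por' (the lead is about 0.23)
confidentLanguage(guesses, 0.3) //=> 'und' (the lead is below 0.3)
```

In real use the tuples would come from a call such as francAll(text). A suitable threshold depends on the languages and text lengths involved, so it is worth tuning on your own data.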

Install

👉 Note: this installs the franc package, which supports 187 languages (those with 1 million or more speakers). franc-min (82 languages, 8 million or more speakers) and franc-all (all 414 possible languages) are also available. To get the CLI, install franc-cli.

This package is ESM only. In Node.js (version 14.14+, 16.0+), install with npm:

npm install franc

In Deno with esm.sh:

import {franc, francAll} from 'https://esm.sh/franc@6'

In browsers with esm.sh:

<script type="module">
  import {franc, francAll} from 'https://esm.sh/franc@6?bundle'
</script>

Use

import {franc, francAll} from 'franc'

franc('Alle menslike wesens word vry') //=> 'afr'
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট') //=> 'ben'
franc('Alle menneske er fødde til fridom') //=> 'nno'

franc('') //=> 'und' (language code that stands for undetermined)

// You can change what’s too short (default: 10):
franc('the') //=> 'und'
franc('the', {minLength: 3}) //=> 'sco'

console.log(francAll('Considerando ser essencial que os direitos humanos'))
//=> [['por', 1], ['glg', 0.771284519307895], ['spa', 0.6034146900423971], …123 more items]

console.log(francAll('Considerando ser essencial que os direitos humanos', {only: ['por', 'spa']}))
//=> [['por', 1], ['spa', 0.6034146900423971]]

console.log(francAll('Considerando ser essencial que os direitos humanos', {ignore: ['spa', 'glg']}))
//=> [['por', 1], ['cat', 0.5367251059928957], ['src', 0.47461899851037015], …121 more items]

API

This package exports the identifiers franc and francAll. There is no default export.

franc(value[, options])

Get the most probable language for the given value.

Parameters
  • value (string) — value to test
  • options (Options, optional) — configuration
Returns

The most probable language (string).

francAll(value[, options])

Get a list of probable languages for the given value, each paired with a score.

Parameters
  • value (string) — value to test
  • options (Options, optional) — configuration
Returns

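Array of [language, score] tuples (Array<[string, number]>). As the examples under Use show, the list is ordered from most to least probable and the best guess scores 1.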

Options

Configuration (Object, optional) with the following fields:

options.only

Languages to allow (Array<string>, optional).

options.ignore

Languages to ignore (Array<string>, optional).

options.minLength

Minimum length to accept (number, default: 10).
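
To make the three fields concrete, here is a rough sketch of how they could be applied to a list of scored candidates. It illustrates the behaviour described above and is not franc's actual implementation. The function pickLanguage and its candidates argument are hypothetical, and treating an empty only list as "allow everything" is an assumption.

```javascript
// Hypothetical illustration of the option semantics, not franc's own code.
// `candidates` is assumed to be [[code, score], ...], ordered best first.
function pickLanguage(value, candidates, options = {}) {
  const {only = [], ignore = [], minLength = 10} = options

  // Values shorter than `minLength` are reported as undetermined.
  if (value.length < minLength) return 'und'

  const allowed = candidates.filter(([code]) => {
    // Assumption: an empty `only` list places no restriction.
    if (only.length > 0 && !only.includes(code)) return false
    return !ignore.includes(code)
  })

  return allowed.length > 0 ? allowed[0][0] : 'und'
}

// Scores rounded from the francAll example under Use.
const candidates = [['por', 1], ['glg', 0.77], ['spa', 0.6]]
const text = 'Considerando ser essencial que os direitos humanos'

pickLanguage(text, candidates) //=> 'por'
pickLanguage(text, candidates, {ignore: ['por', 'glg']}) //=> 'spa'
pickLanguage(text, candidates, {only: ['glg', 'spa']}) //=> 'glg'
pickLanguage('the', candidates) //=> 'und'
```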

CLI

Install:

npm install franc-cli --global

Use:

CLI to detect the language of text

Usage: franc [options] <string>

Options:

  -h, --help                    output usage information
  -v, --version                 output version number
  -m, --min-length <number>     minimum length to accept
  -o, --only <string>           allow languages
  -i, --ignore <string>         disallow languages
  -a, --all                     display all guesses

Usage:

# output language
$ franc "Alle menslike wesens word vry"
# afr

# output language from stdin (expects utf8)
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | franc
# ben

# ignore certain languages
$ franc --ignore por,glg "O Brasil caiu 26 posições"
# src

# output language from stdin with only
$ echo "Alle mennesker er født frie og" | franc --only nob,dan
# nob

Data

Supported languages

Package    Languages  Speakers
franc-min  82         8M or more
franc      187        1M or more
franc-all  414        -

Language code

👉 Note: franc returns ISO 639-3 codes (three-letter codes), not ISO 639-1 or ISO 639-2 codes. See also GH-10 and GH-30.

To get more info about the languages represented by ISO 639-3, use iso-639-3. There is also an index available to map ISO 639-3 to ISO 639-1 codes, iso-639-3/to-1.json, but note that not all 639-3 codes can be represented in 639-1.
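
As a small illustration of that caveat, the sketch below maps a few ISO 639-3 codes to ISO 639-1 and returns undefined when there is no two-letter equivalent. The table is a tiny hand-written excerpt made up for this example, and toIso6391 is a hypothetical helper. In practice the iso-639-3/to-1.json index mentioned above would take the table's place.

```javascript
// Hand-written excerpt for illustration only; a real project would load the
// full index (iso-639-3/to-1.json) instead.
const iso6393To1 = new Map([
  ['afr', 'af'],
  ['ben', 'bn'],
  ['por', 'pt'],
  ['spa', 'es']
])

// Hypothetical helper: returns the ISO 639-1 code, or undefined when the
// table has no entry (for example 'und', or languages without a two-letter
// code such as 'sco').
function toIso6391(code) {
  return iso6393To1.get(code)
}

toIso6391('por') //=> 'pt'
toIso6391('sco') //=> undefined
toIso6391('und') //=> undefined
```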

Types

These packages are fully typed with TypeScript. They export the additional types TrigramTuple and Options.

Compatibility

These packages are at least compatible with all maintained versions of Node.js. As of now, that is Node.js 14.14+ and 16.0+. They also work in Deno and modern browsers.

Ports

Franc has been ported to several other programming languages.

The works franc is derived from have themselves also been ported to other languages.

Derivation

Franc is a derivative work from guess-language (Python, LGPL), guesslanguage (C++, LGPL), and Language::Guess (Perl, GPL). Their creators granted me the rights to distribute franc under the MIT license: respectively, Kent S. Johnson, Jacob R. Rideout, and Maciej Ceglowski.

Contribute

Yes please! See How to Contribute to Open Source.

Security

This package is safe.

License

MIT © Titus Wormer
