  • Stars: 122
  • Rank: 290,332 (Top 6 %)
  • Language: Python
  • License: Other
  • Created over 11 years ago
  • Updated over 5 years ago

Repository Details

Snowball stemming library collection for Python

This document pertains to the Python version of the stemmer library distribution, available for download from:

The original program is maintained at the following location:

The original Snowball product was created by Dr Martin Porter and Richard Boulton (Java port). The original Snowball and this Python port are released under the BSD license.

How to use library

The snowballstemmer module has two functions.

The snowballstemmer.algorithms function returns a list of the available algorithm names as strings.

The snowballstemmer.stemmer function accepts an algorithm name and returns a Stemmer object.

Stemmer objects have a Stemmer.stemWord(word) method and a Stemmer.stemWords(word[]) method.

import snowballstemmer

# Create an English stemmer and stem each word of a sentence.
stemmer = snowballstemmer.stemmer('english')
print(stemmer.stemWords("We are the world".split()))
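
As a further sketch of the two module-level functions described above, the following lists the available algorithms and stems a single word with stemWord. The helper name stem_tokens is hypothetical and not part of the library. The import is done lazily and the demo is guarded, so the snippet loads even where snowballstemmer is not installed.

```python
import importlib.util


def stem_tokens(text, language="english"):
    """Hypothetical helper: split text on whitespace and stem each token."""
    import snowballstemmer  # lazy import; requires the package to be installed

    stemmer = snowballstemmer.stemmer(language)
    return stemmer.stemWords(text.split())


# Run the demo only when the package can actually be imported.
if importlib.util.find_spec("snowballstemmer") is not None:
    import snowballstemmer

    print(snowballstemmer.algorithms())                 # list of algorithm names
    english = snowballstemmer.stemmer("english")
    print(english.stemWord("running"))                  # stem a single word
    print(stem_tokens("We are the world"))              # stem a list of words
```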

Stemmer objects have a Stemmer.maxCacheSize property and cache up to that many results. The default is 10000.
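
To illustrate the idea of a size-bounded result cache, here is a minimal sketch. It is not the library's implementation: the class CachedStemmer, its max_cache_size parameter, the oldest-first eviction policy, and the toy strip_plural function are all assumptions made for this example.

```python
from collections import OrderedDict


def strip_plural(word):
    """Toy stand-in for a real stemmer: drop one trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 1 else word


class CachedStemmer:
    """Hypothetical wrapper that memoizes results up to max_cache_size entries."""

    def __init__(self, stem_func, max_cache_size=10000):
        self.stem_func = stem_func
        self.max_cache_size = max_cache_size
        self._cache = OrderedDict()

    def stem_word(self, word):
        if word in self._cache:
            return self._cache[word]          # cache hit: skip recomputation
        result = self.stem_func(word)
        self._cache[word] = result
        if len(self._cache) > self.max_cache_size:
            self._cache.popitem(last=False)   # evict the oldest entry
        return result

    def stem_words(self, words):
        return [self.stem_word(w) for w in words]


stemmer = CachedStemmer(strip_plural, max_cache_size=2)
print(stemmer.stem_words(["cats", "dogs", "birds", "cats"]))
```

With a repetitive vocabulary most lookups are served from the cache, which is why a workload with a 0% hit rate is a worst case.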

Accelerating Stemming

If PyStemmer is installed, snowballstemmer.stemmer returns PyStemmer's Stemmer objects instead. These objects have the same methods (Stemmer.stemWord() and Stemmer.stemWords()).

PyStemmer is a wrapper module for Snowball's libstemmer_c, and its results are 100% compatible with those of snowballstemmer.

PyStemmer is faster because it uses a C extension module, while snowballstemmer is more convenient to use because it is a pure Python module.
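
The fallback described above can be pictured with the following sketch. It only checks which backend is importable; PyStemmer is imported under the module name Stemmer. The function name available_backend is hypothetical, and this illustrates the idea rather than the library's actual selection code.

```python
import importlib.util


def available_backend():
    """Return the name of the fastest importable backend, or None."""
    # PyStemmer (the C extension) is imported as the module `Stemmer`.
    if importlib.util.find_spec("Stemmer") is not None:
        return "PyStemmer"
    if importlib.util.find_spec("snowballstemmer") is not None:
        return "snowballstemmer"
    return None


print(available_backend())
```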

Benchmark

Test case: Snowball stemmer check data (16 algorithms, 582,560 words in total, cache hit rate 0%)

Computer: MacBook Pro 3rd Gen, Core i7 2.3 GHz

  • Python 2.7 + snowballstemmer : 2m 30s
  • PyPy 1.9 + snowballstemmer : 45s
  • Python 2.7 + PyStemmer : 5s

This test case is much harder than typical use cases!
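
A rough way to take a comparable measurement is sketched below. The helper time_stemming is hypothetical, and the lambda is only a stand-in so the snippet runs anywhere. For a real measurement, pass the stemWords method of a Stemmer object and the check data word list instead. Using unique words keeps the cache hit rate at 0%, as in the test case above.

```python
import time


def time_stemming(stem_words, words):
    """Return (elapsed_seconds, results) for one call of stem_words(words)."""
    start = time.perf_counter()
    results = stem_words(words)
    return time.perf_counter() - start, results


# Unique words, so a result cache never gets a hit.
words = ["word%d" % i for i in range(1000)]

# Stand-in callable; replace with e.g. snowballstemmer.stemmer('english').stemWords.
elapsed, results = time_stemming(lambda ws: [w.lower() for w in ws], words)
print("%d words in %.6f s" % (len(results), elapsed))
```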

The TestApp example

The testapp.py example program allows you to run any of the stemmers on a sample vocabulary.

Usage:

testapp.py <algorithm> "sentences ... "
$ python testapp.py English "sentences... "

Thanks

  • Original Snowball authors
  • Emil Stenström

License

It is a BSD licensed library.


Copyright (c) 2013, Yoshiki Shibukawa

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
