  • Stars: 190
  • Rank 196,530 (Top 4 %)
  • Language: Go
  • License: MIT License
  • Created almost 11 years ago
  • Updated almost 3 years ago

Go Porter Stemmer

A native Go clean room implementation of the Porter Stemming Algorithm.

This algorithm is of interest to people doing Machine Learning or Natural Language Processing (NLP).

This is NOT a port. This is a native Go implementation from the human-readable description of the algorithm.

I've tried to make it (more) efficient by NOT using strings internally, but instead using []rune slices and reusing the same backing array of the []rune slice (and its sub-slices) at all steps of the algorithm.
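To illustrate the buffer-sharing idea described above, here is a minimal sketch of how re-slicing a []rune avoids allocation. The helpers trimSuffixInPlace and sharesBuffer are hypothetical names invented for this illustration; they are not part of go-porterstemmer, and the library's internals may differ.

```go
package main

import "fmt"

// trimSuffixInPlace is a hypothetical helper (not part of go-porterstemmer).
// It drops n trailing runes by re-slicing, so no new array is allocated.
func trimSuffixInPlace(word []rune, n int) []rune {
	if n < 0 {
		n = 0
	}
	if n > len(word) {
		n = len(word)
	}
	return word[:len(word)-n]
}

// sharesBuffer reports whether two non-empty slices start at the same
// element of the same backing array.
func sharesBuffer(a, b []rune) bool {
	return len(a) > 0 && len(b) > 0 && &a[0] == &b[0]
}

func main() {
	word := []rune("waxes")
	stem := trimSuffixInPlace(word, 2)

	fmt.Println(string(stem))             // wax
	fmt.Println(sharesBuffer(word, stem)) // true

	// Writing through the sub-slice is visible through the original slice.
	stem[0] = 'W'
	fmt.Println(string(word)) // Waxes
}
```

Because the sub-slice and the original share one array, a write through either is visible through both. That is why the usage notes further down warn about side effects.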

For the Porter Stemmer algorithm, see:

http://tartarus.org/martin/PorterStemmer/def.txt (URL #1)

http://tartarus.org/martin/PorterStemmer/ (URL #2)

Departures

When I initially implemented it, it failed the tests at...

http://tartarus.org/martin/PorterStemmer/voc.txt (URL #3)

http://tartarus.org/martin/PorterStemmer/output.txt (URL #4)

... After reading the human-readable text over and over again to try to figure out what error I had made (and doing all sorts of things to debug it), I came to the conclusion that some of these tests were wrong according to the human-readable description of the algorithm.

This led me to wonder if maybe other people's code that was passing these tests had rules that were not in the human-readable description. Which led me to look at the source code here...

http://tartarus.org/martin/PorterStemmer/c.txt (URL #5)

... When I looked there I noticed that there are some items marked as a "DEPARTURE", which differ from the original algorithm. (There are 2 of these.)

I implemented these departures, and the tests at URL #3 and URL #4 all passed.

Usage

To use this Go library, write something like:

package main

import (
  "fmt"
  "github.com/reiver/go-porterstemmer"
)

func main() {
  
  word := "Waxes"
  
  stem := porterstemmer.StemString(word)
  
  fmt.Printf("The word [%s] has the stem [%s].\n", word, stem)
}

Alternatively, if you want to be a bit more efficient, use []rune slices instead, with code like:

package main

import (
  "fmt"
  "github.com/reiver/go-porterstemmer"
)

func main() {
  
  word := []rune("Waxes")
  
  stem := porterstemmer.Stem(word)
  
  fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
}

NOTE, however, that the above code may modify the original slice (named "word" in the example) as a side effect, for efficiency reasons, and that the slice named "stem" in the example above may be a sub-slice of the slice named "word".

Alternatively, if you already know that your word is lowercase (and you don't need this library to lowercase it for you), you can instead use code like:

package main

import (
  "fmt"
  "github.com/reiver/go-porterstemmer"
)

func main() {
  
  word := []rune("waxes")
  
  stem := porterstemmer.StemWithoutLowerCasing(word)
  
  fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))
}

Again NOTE (as with the previous example) that the above code may modify the original slice (named "word" in the example) as a side effect, for efficiency reasons, and that the slice named "stem" in the example above may be a sub-slice of the slice named "word".
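If you need the original slice to stay untouched, one option is to stem a private copy. The sketch below assumes a stand-in function, mutatingStem, with the same shape as porterstemmer.Stem (it takes a []rune and returns a []rune). Both mutatingStem and stemCopy are hypothetical names for this illustration, and mutatingStem is NOT the Porter algorithm.

```go
package main

import (
	"fmt"
	"unicode"
)

// mutatingStem is a hypothetical stand-in with the same shape as
// porterstemmer.Stem: it lowercases in place and returns a sub-slice.
// It is NOT the Porter algorithm; it only strips one trailing 's'.
func mutatingStem(word []rune) []rune {
	for i, r := range word {
		word[i] = unicode.ToLower(r)
	}
	if n := len(word); n > 1 && word[n-1] == 's' {
		return word[:n-1]
	}
	return word
}

// stemCopy runs stem on a private copy so the caller's slice is untouched.
func stemCopy(word []rune, stem func([]rune) []rune) []rune {
	buf := make([]rune, len(word))
	copy(buf, word)
	return stem(buf)
}

func main() {
	word := []rune("Cats")
	stem := stemCopy(word, mutatingStem)
	fmt.Println(string(word), string(stem)) // Cats cat

	// Without the copy, the original is lowercased as a side effect.
	_ = mutatingStem(word)
	fmt.Println(string(word)) // cats
}
```

The copy costs one allocation per word, which gives up part of the efficiency gain. It is only worth doing when the caller still needs the original runes.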

More Repositories

1

go-telnet

Package telnet provides TELNET and TELNETS client and server implementations, for the Go programming language, in a style similar to the "net/http" library that is part of the Go standard library, including support for "middleware"; TELNETS is secure TELNET, with the TELNET protocol over a secured TLS (or SSL) connection.
Go
235
star
2

greatape

Social audio & video app
Go
111
star
3

blockchain-reading-list

A reading list on blockchain and related technologies, targeted at technical people who want a deep understanding of those topics.
104
star
4

technology-executive-reading-list

A reading list for technology executives. Those that may find this useful include: chief technology officers (CTO), chief science officers (CSO), chief information officers (CIO), senior vice presidents (SVP) of technology, senior vice presidents (SVP) of engineering, senior vice presidents (SVP) of data science, vice presidents (VP) of technology, vice presidents (VP) of engineering, vice presidents (VP) of data science, directors of technology, directors of engineering, directors of data science, technical directors (TD), engineering managers, staff software engineers, staff data scientists, lead software engineers, lead data scientists, etc.
29
star
5

go-v4l2

Package v4l2 exposes the V4L2 (Video4Linux version 2) API to Golang
Go
27
star
6

golang-backend-training

A guide to help train someone with backend development using the Go programming language.
11
star
7

learn-golang-in-one-day

Learn Golang In One Day.
9
star
8

go-mjrty

Go library that implements the MJRTY algorithm for finding the majority in a sequence in a single pass, in O(n) time and O(1) space -- in linear time complexity and constant space complexity.
Go
9
star
9

go-shunt

Package shunt enables you to create "middleware" for the "database/sql" package.
Go
9
star
10

go-webui

Package webui enables an application written in the Go programming language (i.e., Golang) to create a user interface (UI) using Web technologies, such as HTML, CSS, JavaScript, WebAssembly, WebRTC, etc.
Go
8
star
11

go-oi

Package oi provides useful tools to be used with the Go programming language's standard "io" package.
Go
7
star
12

go-hashuri

Parses Hash URIs, for the Go programming language.
Go
6
star
13

telnets

User interface to the TELNETS protocol. TELNETS is the secure version of the TELNET protocol. TELNETS is the TELNET protocol over a secure TLS (or SSL) connection. (Also note that TELNETS and SSH are not the same thing.)
Go
6
star
14

go-stringcase

A Go library that makes it so you can convert strings to different casing styles … lower case, UPPER CASE, Title Case, camelCase, PascalCase, snake_case, CONST_CASE, property-case, Header-Case
Go
6
star
15

go-php

A helper library for Golang for those coming from PHP to Go. Implements functions and classes familiar to those with PHP.
Go
6
star
16

fediverse-0

A list of the people who created and continue to develop the Fediverse technology.
5
star
17

evolution-and-behavior-reading-list

A list of what was read and discussed, over the years, for the Evolution And Behavior reading group.
5
star
18

go-cast

Package cast provides tools for safely converting from one type to another, without information loss.
Go
5
star
19

go-iter

Package iter provides tools for creating iterators, for the Go programming language.
Go
5
star
20

go-cellularautomata

A simple Go library for creating Cellular Automata.
4
star
21

go-pathmatch

A library that provides pattern matching for paths for the Go programming language; these could be paths from a file system, or a path from a URL (such as an HTTP or HTTPS based URL).
Go
3
star
22

go-hg

Package hg provides ☿ Mercury Protocol client and server implementations, for the Go programming language.
Go
3
star
23

reiver

A Capsule for the Gemini Protocol
3
star
24

go-buffers

Package buffers provides tools for working with byte array, and byte slice buffers, for the Go programming language.
Go
3
star
25

go-whitespace

A small library for dealing with whitespace in Go.
Go
3
star
26

littleape

Frontend for a social audio & video app
TypeScript
3
star
27

go-digestfs

Package digestfs provides a content-addressable virtual file system (VFS) by providing a common interface to one or more content-addressable storage (CAS).
Go
3
star
28

awesome-finger

A collection of awesome things regarding the finger protocol ecosystem.
3
star
29

go-evmop

Package evmop provides tools for turning Ethereum Virtual Machine (EVM) OpCodes into bytecodes, for the Go programming language. This might be useful to someone writing an AOT or JIT compiler targeting the Ethereum Virtual Machine (EVM).
Go
3
star
30

go-fourcc

Go implementation of FOURCC (four character code) (4CC) identifiers for video codecs, compression formats, colors, and pixel formats used in media files.
Go
2
star
31

personality-lexical-hypothesis

Research on The Lexical Hypothesis of human personality.
2
star
32

web-api-guide

A guide on how to create a Web (i.e., HTTP or HTTPS) based API.
2
star
33

fingerverse

A list of many finger-sites (finger-holes?)
2
star
34

go-textnormalize

TextNormalize is a Golang library that provides ways of normalizing text.
Go
2
star
35

go-watchdog

A very experimental supervisor tree library for Go.
Go
2
star
36

go-tmpl

Package tmpl provides templating capabilities, for the Go programming language.
Go
2
star
37

lifeterm

A simple implementation of Conway's Game of Life in Go, that runs in the terminal.
Go
2
star
38

vagrant-golang

Vagrant environment for Golang, with support for both go and gb commands.
Shell
2
star
39

dummy-statsd

A Dummy StatsD server, useful for inspecting raw StatsD messages coming in, as they are coming in.
Go
2
star
40

fediverse-archetypes2

An analysis of the type of people seen on the Fediverse.
2
star
41

go-opt

Package opt implements an optional-type, for the Go programming language. In other programming languages, an optional-type might be known as an option type or a maybe type.
Go
2
star
42

go-dataurl

A library that provides tools to work with data URLs, as defined by RFC 2397, for the Go programming language.
Go
2
star
43

go-x11

Package x11 provides an X11 client implementation, for the Go programming language, in a style similar to the "net/http" library that is part of the Go standard library, including support for "middleware".
Go
2
star
44

securewebguide

A programmer's guide, set of tutorials, and comprehensive reference to the Secure Web; recommended for experienced programmers who want to learn how to create applications on the Secure Web, or to create an implementation of the Secure Web.
2
star
45

open-banking-guidebook

The Open Banking Guidebook is written for those looking to bring Open Banking to their Bank, Trust, Credit Union, Money Service Business (MSB), FinTech company, or any other finance related organization. It is a step-by-step guide that explains the Who, What, When, Where, Why, and How of Open Banking. And provides a checklist.
2
star
46

go-utf8

Package utf8 implements encoding and decoding of UTF-8, for the Go programming language. This package is meant to be a replacement for Go's built-in "unicode/utf8" package.
Go
2
star
47

cyber80

The cyber80 is a fantasy console computer geared towards video games that is inspired by arcade game machines, home video game consoles, handheld game consoles, and (other) computers from the 1980s.
Go
2
star
48

go-cli

Package cli provides a way of creating command line interface (CLI) programs, for the Go programming language, in a style similar to the "net/http" library that is part of the Go standard library, including support for "middleware".
Go
2
star
49

go-simplehttp

A library that provides a simple way of sending an HTTP (and HTTPS) response.
Go
2
star
50

go-container

A simple dependency injection library for the Go programming language.
Go
1
star
51

go-xim

Package xim provides quasi-monotonically-increasing unique identifiers.
Go
1
star
52

go-ascii

A library that provides tools for working with ASCII characters, for the Go programming language.
Go
1
star
53

go-bravo16

Package bravo16 provides a more (human) safe base-16 binary-to-text encoding, and decoding; bravo16 is an alternative to hexadecimal.
Go
1
star
54

go-eventchain

Package eventchain provides basic building blocks for doing Event Sourcing and Log-Driven Development, for the Go programming language.
Go
1
star
55

go-database

Package database provides an alternative API for dealing with databases in the Go programming language other than the built-in `"database/sql"` package. Package database makes heavy use of (what other programming languages call) option types.
Go
1
star
56

diy-face-mask

Information about DIY protective face mask respirators.
1
star
57

guide-uuid

A guide on UUIDs.
1
star
58

diy-practical-effects

In this text you will learn how to create Practical Effects yourself.
1
star
59

guide-influencer

A guide on being an influencer, and influencing. This relates to — journalism, reporting, and marketing.
1
star
60

go-c80

Package c80 provides a graphics library for the Go programming language.
Go
1
star
61

tell-a-story

An essay on stories and narratives.
1
star
62

go-fbdev

Tools for working with the Frame Buffer Device (fbdev) (that is common on Linux-based operating systems), for the Go programming language.
Go
1
star
63

probshell

A simple probability calculator with a command-line interface (CLI).
1
star
64

gogen-optiontype

Option types (also known as maybe types) for Go, via "go generate".
Go
1
star
65

go-money

Go library for dealing with money in a type-safe way, including parsing from strings.
Go
1
star
66

go-finger

Package finger implements the finger protocol, for the Go programming language.
Go
1
star
67

go-tabio

Package tabio provides tools for dealing with tabular I/O.
Go
1
star
68

guide-logging

A guide on logging. And in particular how to create a logger.
1
star
69

go-denary64

Package denary64 provides base-10 floating point numbers, which are safe to use to store money values, and are safe to do math calculations with; as opposed to the built-in Golang types float32, and float64 which are base-2 floating point number types (rather than base-10) and which are NOT safe to use for money.
Go
1
star
70

guide-dev-to-qa-hand-over

A guide on what to do and what to not do when software developers hand-over to QA.
1
star
71

tgfs-init

Create an empty local instance of The Great File System (TGFS).
Go
1
star
72

go-refs

refs provides a generic Virtual File System (VFS) like abstraction, for the Go programming language.
Go
1
star
73

learndatascience-www

Website for the 'Data Science LEARNING Group' ( http://www.meetup.com/LearnDataScience ).
1
star
74

gemini-comparison

1
star
75

go-strfs

strfs provides a virtual file-system, where a fs.File can be created from a Go string.
Go
1
star
76

go-loggers

Package loggers provides useful tools for dealing with loggers, for the Go programming language.
Go
1
star
77

go-indent

Package indent provides tools for indentation of UTF-8 text, for the Go programming language.
Go
1
star
78

go-modhandler

A library that provides a ("middleware") HTTP handler to deal with conditional GETs by sending out a "Last-Modified" HTTP response header, and properly dealing with a "If-Modified-Since" HTTP request header, for the Go programming language.
Go
1
star
79

docker-gopherjs

A Docker based development environment for doing frontend development with Golang using GopherJS.
Shell
1
star
80

fediverse-icons

Free icons for Fediverse
1
star
81

go-manyerrors

A Go library that provides an error type that can contain a list of errors.
Go
1
star
82

html-include

A web component for doing HTML includes: <html-include src="..."></html-include>
JavaScript
1
star
83

go-errhttp

Package errhttp provides error types that make dealing with HTTP response errors easier, for the Go programming language.
Go
1
star
84

SkinnyMVC

SkinnyMVC is a light-weight, easy to learn, "skinny" development framework for PHP that enables the developer to implement the MVC architectural pattern, while maintaining maximum flexibility and performance of the application.
PHP
1
star
85

golang-guide

A guide for programming in Golang.
1
star
86

go-epsilongreedy

A Golang library that implements the epsilon-Greedy bandit algorithm.
1
star
87

go-streng

Package streng provides a string option type, result type, and nullable type, for the Go programming language.
Go
1
star
88

finger

finger is a modern finger-protocol client.
Go
1
star
89

go-httpprd

Package httpprd provides tools for HTTP products, for the Go programming language.
Go
1
star
90

go-font8x8

Package font8x8 provides a set of 8×8 fonts for Unicode characters.
Go
1
star
91

spacemonkey-gui

SpaceMonkey GUI
Go
1
star
92

go-ui

Package ui provides user interface (UI) client and server implementations, for the Go programming language.
Go
1
star
93

go-mdl

Package mdl provides tools for doing Event Modeling in the Go programming language. One might use package mdl as part of a system that does Event Sourcing, and CQRS.
Go
1
star
94

go-numeric

A library that provides helper functions to deal with runes that represent numeric values, for the Go programming language.
Go
1
star
95

go-urires

Package urires — /uri-res/ — provides a trivial convention for using HTTP in URN resolution (as per RFC-2169), which is useful for creating a content-addressable web, which nowadays would be considered useful, and even necessary for a distributed web, or decentralized web.
1
star
96

go-parcels

A library that provides a convenient immutable way of passing around a piece of data, with easy ways of getting that data as []byte, io.Reader, []rune, or string types, for the Go programming language.
Go
1
star
97

go-pqerror

A helper library that provides constants for the Postgres Error Codes, to be used with the Golang Postgres driver https://godoc.org/github.com/lib/pq
Go
1
star
98

PHP-CoreErlang

PHP DSL (domain specific language) used for generating Core Erlang .core files, so you can write code that targets the Erlang VM from PHP.
PHP
1
star
99

go-conv

Tools for converting text into Go data types, such as converting a string to an integer, converting a string to a boolean, etc.
Go
1
star
100

vnmd-example

An example Visual Novel in Visual Novel Markdown (VNMD) format. This is useful for those who want to write parsers for VNMD files.
1
star