  • Stars: 1,343
  • Rank 34,982 (Top 0.7 %)
  • Language: Go
  • License: MIT License
  • Created almost 9 years ago
  • Updated 5 months ago


Repository Details

Correct commonly misspelled English words in source files


Correct commonly misspelled English words... quickly.

Install

If you just want a binary and to start using misspell:

curl -L -o ./install-misspell.sh https://git.io/misspell
sh ./install-misspell.sh

The script installs misspell as ./bin/misspell. You can adjust the download location using the -b flag. File a ticket if you want another platform supported.

If you use Go, the best way to run misspell is by using gometalinter. Otherwise, install misspell the old-fashioned way:

go get -u github.com/client9/misspell/cmd/misspell

and the misspell binary will be in $GOPATH/bin.

Also, if you like to live dangerously, you could run:

curl -L https://git.io/misspell | bash

Usage

$ misspell all.html your.txt important.md files.go
your.txt:42:10 found "langauge" a misspelling of "language"

# ^ file, line, column
$ misspell -help
Usage of misspell:
  -debug
    	Debug matching, very slow
  -error
    	Exit with 2 if misspelling found
  -f string
    	'csv', 'sqlite3' or custom Golang template for output
  -i string
    	ignore the following corrections, comma separated
  -j int
    	Number of workers, 0 = number of CPUs
  -legal
    	Show legal information and exit
  -locale string
    	Correct spellings using locale perferances for US or UK.  Default is to use a neutral variety of English.  Setting locale to US will correct the British spelling of 'colour' to 'color'
  -o string
    	output file or [stderr|stdout|] (default "stdout")
  -q	Do not emit misspelling output
  -source string
    	Source mode: auto=guess, go=golang source, text=plain or markdown-like text (default "auto")
  -w	Overwrite file with corrections (default is just to display)

FAQ

How can I make the corrections automatically?

Just add the -w flag!

$ misspell -w all.html your.txt important.md files.go
your.txt:9:21:corrected "langauge" to "language"

# ^ File is rewritten only if a misspelling is found

How do I convert British spellings to American (or vice-versa)?

Add the -locale US flag!

$ misspell -locale US important.txt
important.txt:10:20 found "colour" a misspelling of "color"

Going the other way? Add the -locale UK flag!

$ echo "My favorite color is blue" | misspell -locale UK
stdin:1:3:found "favorite color" a misspelling of "favourite colour"

Help is appreciated as I'm neither British nor an expert in the English language.

How do you check an entire folder recursively?

Just list the directories (or files) you'd like to check:

misspell .
misspell aDirectory anotherDirectory aFile

You can also run misspell recursively using the following shell tricks:

misspell directory/**/*

or

find . -type f | xargs misspell

You can select a type of file as well. The following example selects all .txt files that are not in the vendor directory:

find . -type f -name '*.txt' | grep -v vendor/ | xargs misspell -error

Can I use pipes or stdin for input?

Yes!

Print messages to stderr only:

$ echo "zeebra" | misspell
stdin:1:0:found "zeebra" a misspelling of "zebra"

Print messages to stderr, and corrected text to stdout:

$ echo "zeebra" | misspell -w
stdin:1:0:corrected "zeebra" to "zebra"
zebra

Only print the corrected text to stdout:

$ echo "zeebra" | misspell -w -q
zebra
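The messages above report positions as name:line:column. Judging from these examples (a word at the very start of the input is reported at column 0), lines appear to be 1-based and columns 0-based. The sketch below converts a byte offset into that form. position and report are hypothetical helpers, and the message format is modeled on the examples rather than taken from misspell's source.

```go
package main

import (
	"fmt"
	"strings"
)

// position converts a byte offset in text into a 1-based line number and a
// 0-based, byte-based column, matching what the stdin examples appear to use.
func position(text string, offset int) (line, col int) {
	before := text[:offset]
	line = strings.Count(before, "\n") + 1
	col = offset - (strings.LastIndex(before, "\n") + 1)
	return line, col
}

// report formats the first occurrence of typo in text as
// "name:line:col:found ..." (format assumed from the examples).
func report(name, text, typo, fix string) string {
	i := strings.Index(text, typo)
	if i < 0 {
		return ""
	}
	line, col := position(text, i)
	return fmt.Sprintf("%s:%d:%d:found %q a misspelling of %q", name, line, col, typo, fix)
}

func main() {
	fmt.Println(report("stdin", "zeebra\n", "zeebra", "zebra"))
	fmt.Println(report("stdin", "My favorite color is blue\n", "favorite color", "favourite colour"))
}
```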

Are there special rules for golang source files?

Yes! If the file ends in .go, then misspell will only check spelling in comments.

If you want to force a file to be checked as a golang source, use -source=go on the command line. Conversely, you can check a golang source as if it were pure text by using -source=text. You might want to do this since many variable names have misspellings in them!

Can I check only comments in other programming languages?

I'm told that using -source=go works well for Ruby, JavaScript, Java, C and C++.

It doesn't work well for Python and Bash.

Does this work with gometalinter?

gometalinter runs multiple golang linters. Since 2016-06-12, gometalinter has supported misspell natively, but it is disabled by default.

# update your copy of gometalinter
go get -u github.com/alecthomas/gometalinter

# install updates and misspell
gometalinter --install --update

To use it, just enable misspell:

gometalinter --enable misspell ./...

Note that gometalinter only checks golang files, and uses misspell's default options.

You may wish to run this on your plaintext (.txt) and/or markdown files too.

How can I get CSV output?

Using -f csv, the output is standard comma-separated values with headers in the first row.

misspell -f csv *
file,line,column,typo,corrected
"README.md",9,22,langauge,language
"README.md",47,25,langauge,language

How can I export to SQLite3?

Using -f sqlite, the output is a sqlite3 dump-file.

$ misspell -f sqlite * > /tmp/misspell.sql
$ cat /tmp/misspell.sql

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE misspell(
  "file" TEXT,
  "line" INTEGER,
  "column" INTEGER,
  "typo" TEXT,
  "corrected" TEXT
);
INSERT INTO misspell VALUES("install.txt",202,31,"immediatly","immediately");
# etc...
COMMIT;
$ sqlite3 -init /tmp/misspell.sql :memory: 'select count(*) from misspell'
1

With some tricks you can directly pipe output to sqlite3 by using -init /dev/stdin:

misspell -f sqlite * | sqlite3 -init /dev/stdin -column -cmd '.width 60 15' ':memory:' \
    'select substr(file,35),typo,count(*) as count from misspell group by file, typo order by count desc;'

How can I ignore rules?

Using the -i "comma,separated,rules" flag you can specify corrections to ignore.

For example, if you were to run misspell -w -error -source=text against a document that contains the string Guy Finkelshteyn Braswell, misspell would change the text to Guy Finkelstheyn Bras well. You can then determine the rules to ignore by reverting the change and running misspell again with the -debug flag. The debug output shows that the corrections were htey -> they and aswell -> as well. To ignore these two rules, add -i "htey,aswell" to your command. With debug mode on, you will still see the corrections printed, but misspell will no longer make them.
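The effect of -i can be pictured as dropping rules from the list before the matcher is built. This is a hypothetical sketch, not misspell's actual code: buildReplacer and the two-rule list are illustrative, and the standard strings.NewReplacer stands in for misspell's modified replacer.

```go
package main

import (
	"fmt"
	"strings"
)

// buildReplacer builds a replacer from typo/correction pairs, dropping any
// pair whose typo appears in the comma-separated ignore list (the same
// shape of value that -i accepts). Hypothetical helper.
func buildReplacer(pairs []string, ignore string) *strings.Replacer {
	skip := make(map[string]bool)
	for _, w := range strings.Split(ignore, ",") {
		if w = strings.TrimSpace(w); w != "" {
			skip[w] = true
		}
	}
	var kept []string
	for i := 0; i+1 < len(pairs); i += 2 {
		if !skip[pairs[i]] {
			kept = append(kept, pairs[i], pairs[i+1])
		}
	}
	return strings.NewReplacer(kept...)
}

func main() {
	// Illustrative rules from the example above; the real list is far larger.
	rules := []string{"htey", "they", "aswell", "as well"}
	text := "Guy Finkelshteyn Braswell"
	fmt.Println(buildReplacer(rules, "").Replace(text))            // both rules fire
	fmt.Println(buildReplacer(rules, "htey,aswell").Replace(text)) // nothing fires
}
```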

How can I change the output format?

Using the -f template flag you can pass in a golang text template to format the output.

One can use printf "%q" VALUE to safely quote a value.

The default template is compatible with gometalinter:

{{ .Filename }}:{{ .Line }}:{{ .Column }}:corrected {{ printf "%q" .Original }} to {{ printf "%q" .Corrected }}

To just print probable misspellings:

-f '{{ .Original }}'
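To preview a template before passing it to -f, you can render it with Go's text/template against a struct with the same field names. Diff and render are hypothetical; only the field names Filename, Line, Column, Original and Corrected come from the templates above, and the template string assumes the default prints both words with printf "%q" and no extra quotes.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Diff is a hypothetical struct exposing the fields the templates use;
// misspell's own type may differ.
type Diff struct {
	Filename  string
	Line      int
	Column    int
	Original  string
	Corrected string
}

// defaultTmpl is modeled on the default -w output format.
const defaultTmpl = `{{ .Filename }}:{{ .Line }}:{{ .Column }}:corrected {{ printf "%q" .Original }} to {{ printf "%q" .Corrected }}`

// render parses tmpl and executes it against d.
func render(tmpl string, d Diff) (string, error) {
	t, err := template.New("out").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, d); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	d := Diff{Filename: "your.txt", Line: 9, Column: 21, Original: "langauge", Corrected: "language"}
	for _, tmpl := range []string{defaultTmpl, `{{ .Original }}`} {
		out, err := render(tmpl, d)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```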

What problem does this solve?

This corrects commonly misspelled English words in computer source code, and other text-based formats (.txt, .md, etc).

It is designed to run quickly so it can be used as a pre-commit hook with minimal burden on the developer.

It does not work with binary formats (e.g. Word, etc).

It is not a complete spell-checking program nor a grammar checker.

What are other misspelling correctors and what's wrong with them?

Some other misspelling correctors:

They all work but had problems that prevented me from using them at scale:

  • slow, all of the above check one misspelling at a time (i.e. linear) using regexps
  • not MIT/Apache2 licensed (or equivalent)
  • have dependencies that don't work for me (python3, bash, linux sed, etc)
  • don't understand American vs. British English and sometimes make unwelcome "corrections"

That said, they might be perfect for you and many have more features than this project!

How fast is it?

Misspell is easily 100x to 1000x faster than other spelling correctors. You should be able to check and correct 1000 files in under 250ms.

This uses the mighty power of golang's strings.Replacer, which is an implementation or variation of the Aho–Corasick algorithm. This lets it match many substrings simultaneously, in a single pass over the text.
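A minimal sketch of the idea using the standard library's strings.Replacer (misspell ships a modified copy, so details may differ). The three typo pairs are illustrative, not misspell's word list.

```go
package main

import (
	"fmt"
	"strings"
)

// corrections holds a few illustrative typo -> correction pairs.
// strings.Replacer compiles them once and then scans the input left to
// right, replacing non-overlapping matches; replaced text is not rescanned.
var corrections = strings.NewReplacer(
	"teh", "the",
	"langauge", "language",
	"recieve", "receive",
)

func correct(s string) string {
	return corrections.Replace(s)
}

func main() {
	fmt.Println(correct("teh langauge we recieve"))
}
```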

In addition, it uses multiple CPU cores to work on multiple files.

What problems does it have?

Unlike the other projects, this doesn't know what a "word" is. There may be more false positives and false negatives due to this. On the other hand, it sometimes catches things others don't.

Either way, please file bugs and we'll fix them!

Since it makes corrections in parallel, it is not always obvious exactly which word was corrected.

It's making mistakes. How can I debug?

Run misspell with the -debug flag on the file in question. It should then print which word it is trying to correct. Then file a bug describing the problem. Thanks!

Why is it making mistakes or missing items in golang files?

The matching function is case-sensitive, so variable names made of multiple words written in all-upper or all-lower case can sometimes cause false positives. For instance, a variable named bodyreader could trigger a false positive, since the substring yrea in its middle could be "corrected" to year. Other problems happen if the variable name uses an English contraction that should contain an apostrophe. The best way to fix this is to follow the Effective Go naming conventions and use camelCase for variable names. You can check your code using golint.
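The bodyreader case is easy to reproduce with a single case-sensitive substring rule. The sketch assumes a rule yrea -> year as described above and uses strings.NewReplacer as a stand-in for misspell's matcher; fix is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// yearRule is a single hypothetical rule: "yrea" -> "year".
var yearRule = strings.NewReplacer("yrea", "year")

// fix applies the rule with no notion of word boundaries, like a plain
// case-sensitive substring match.
func fix(name string) string {
	return yearRule.Replace(name)
}

func main() {
	fmt.Println(fix("bodyreader")) // all lower case: "yrea" matches, false positive
	fmt.Println(fix("bodyReader")) // camelCase: "yRea" does not match, left alone
}
```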

What license is this?

The main code is MIT.

Misspell also makes use of the Golang standard library and contains a modified version of Golang's strings.Replacer, both of which are covered under a BSD License. Type misspell -legal for more details or see legal.go.

Where do the word lists come from?

It started with a word list from Wikipedia. Unfortunately, this list had to be heavily edited, as many of the words are obsolete or based on mistakes made on mechanical typewriters (I'm guessing).

Additional words were added based on actual mistakes seen in the wild (meaning self-generated).

Variations of UK and US spellings are based on many sources including:

American English is more accepting of spelling variations than is British English, so "what is American or not" is subject to opinion. Corrections and help welcome.

What are some other enhancements that could be done?

Here are some ideas for enhancements:

  • Capitalization of proper nouns: this could be done for weekday and month names, country names, language names, etc.
  • Opinionated US spellings: US English has a number of words with alternate spellings. Think adviser vs. advisor. While "advisor" is not wrong, the opinionated US locale would correct "advisor" to "adviser".
  • Versioning: some type of versioning is needed so reporting mistakes and errors is easier.
  • Feedback: mistakes would be sent to some server for aggregation and feedback review.
  • Contractions and apostrophes: this would optionally correct "isnt" to "isn't", etc.
