• Stars: 186
  • Rank: 207,278 (Top 5%)
  • Language: Go
  • License: MIT License
  • Created: about 4 years ago
  • Updated: over 1 year ago


Repository Details

Useful Go String methods

Go-string


Useful string utility functions for Go projects, included either because they are faster than the common Go version or because they do not exist in the standard library.

You can find all details here https://pkg.go.dev/github.com/boyter/go-string

Probably the most useful methods are IndexAll and IndexAllIgnoreCase. For string literal searches they should be drop-in replacements for regexp.FindAllIndex, and because they avoid the regular expression engine entirely they are much faster.
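To illustrate what "drop-in replacement" means here, the following is a minimal sketch using only the standard library. The function indexAllLiteral is a hypothetical stand-in written for this example and is not the library's implementation; it only shows that a literal search can return the same [start, end] byte offsets as regexp.FindAllStringIndex without compiling a regular expression.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// indexAllLiteral is a hypothetical stand-in, NOT the go-string implementation.
// It returns [start, end] byte offsets of non-overlapping matches of needle,
// in the same shape regexp.FindAllStringIndex uses. A negative limit means
// "no limit".
func indexAllLiteral(haystack, needle string, limit int) [][]int {
	if needle == "" || limit == 0 {
		return nil
	}
	var locs [][]int
	offset := 0
	for limit < 0 || len(locs) < limit {
		i := strings.Index(haystack[offset:], needle)
		if i < 0 {
			break
		}
		start := offset + i
		end := start + len(needle)
		locs = append(locs, []int{start, end})
		offset = end // continue after this match so results do not overlap
	}
	return locs
}

func main() {
	haystack := "a secret, another secret"
	needle := "secret"

	viaRegex := regexp.MustCompile(regexp.QuoteMeta(needle)).FindAllStringIndex(haystack, -1)
	viaLiteral := indexAllLiteral(haystack, needle, -1)

	// Both lines print [[2 8] [18 24]]
	fmt.Println(viaRegex)
	fmt.Println(viaLiteral)
}
```

Because the result shape is the same, code that consumes regexp match locations should not need to change when the search itself is swapped for a literal one.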

Some quick benchmarks using a simple program which opens a 550MB file and searches over it in memory. Each search is run three times, first using regexp.FindAllIndex and then using IndexAllIgnoreCase.

For this specific example the wall clock time is at least 10x lower, with the same matching results.

$ ./csperf ſecret 550MB
File length 576683100

FindAllIndex (regex ignore case)
Scan took 25.403231773s 16680
Scan took 25.39742299s 16680
Scan took 25.227218738s 16680

IndexAllIgnoreCase (custom)
Scan took 2.04013314s 16680
Scan took 2.019360935s 16680
Scan took 1.996732171s 16680

The above example in code for you to copy:

// Simple test comparison between various search methods
package main

import (
	"fmt"
	"os"
	"regexp"
	"time"

	str "github.com/boyter/go-string"
)

func main() {
	arg1 := os.Args[1]
	arg2 := os.Args[2]

	b, err := os.ReadFile(arg2)
	if err != nil {
		fmt.Print(err)
		return
	}

	fmt.Println("File length", len(b))

	haystack := string(b)

	var start time.Time
	var elapsed time.Duration

	fmt.Println("\nFindAllIndex (regex)")
	r := regexp.MustCompile(regexp.QuoteMeta(arg1))
	for i := 0; i < 3; i++ {
		start = time.Now()
		all := r.FindAllIndex(b, -1)
		elapsed = time.Since(start)
		fmt.Println("Scan took", elapsed, len(all))
	}

	fmt.Println("\nIndexAll (custom)")
	for i := 0; i < 3; i++ {
		start = time.Now()
		all := str.IndexAll(haystack, arg1, -1)
		elapsed = time.Since(start)
		fmt.Println("Scan took", elapsed, len(all))
	}

	r = regexp.MustCompile(`(?i)` + regexp.QuoteMeta(arg1))
	fmt.Println("\nFindAllIndex (regex ignore case)")
	for i := 0; i < 3; i++ {
		start = time.Now()
		all := r.FindAllIndex(b, -1)
		elapsed = time.Since(start)
		fmt.Println("Scan took", elapsed, len(all))
	}

	fmt.Println("\nIndexAllIgnoreCase (custom)")
	for i := 0; i < 3; i++ {
		start = time.Now()
		all := str.IndexAllIgnoreCase(haystack, arg1, -1)
		elapsed = time.Since(start)
		fmt.Println("Scan took", elapsed, len(all))
	}
}

Note that it performs best with real documents and worst when searching over random data. Depending on what you are searching you may see a similar speed-up or only a marginal one.

IndexAll has a similar speed-up over FindAllIndex:

// BenchmarkFindAllIndex-8                         2458844	       480.0 ns/op
// BenchmarkIndexAll-8                            14819680	        79.6 ns/op

See the benchmarks for the full results, which cover various edge cases.

The other most useful method is HighlightString. HighlightString takes in some content and locations and then inserts in/out strings which can be used for highlighting around matching terms. For example, you could pass in "test" with a location covering its first two characters and have it return "<strong>te</strong>st". The locations argument accepts output from regexp.FindAllIndex or the included IndexAllIgnoreCase or IndexAll.
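The idea of wrapping match locations with in/out strings can be sketched with the standard library alone. The highlight function below is a hypothetical illustration written for this example, not the library's HighlightString; it assumes the locations are sorted, non-overlapping [start, end] byte offsets, and it skips any that are not.

```go
package main

import (
	"fmt"
	"strings"
)

// highlight is a hypothetical sketch, NOT the go-string HighlightString.
// It wraps each [start, end] span of content with the in and out strings.
// Assumption: locations are sorted and non-overlapping; spans that are
// malformed, out of bounds, or overlap an earlier span are skipped.
func highlight(content string, locations [][]int, in, out string) string {
	var sb strings.Builder
	prev := 0
	for _, loc := range locations {
		if len(loc) != 2 || loc[0] < prev || loc[0] > loc[1] || loc[1] > len(content) {
			continue
		}
		sb.WriteString(content[prev:loc[0]]) // text before the match
		sb.WriteString(in)
		sb.WriteString(content[loc[0]:loc[1]]) // the match itself
		sb.WriteString(out)
		prev = loc[1]
	}
	sb.WriteString(content[prev:]) // remaining text after the last match
	return sb.String()
}

func main() {
	// Prints <strong>te</strong>st
	fmt.Println(highlight("test", [][]int{{0, 2}}, "<strong>", "</strong>"))
}
```

The locations here are written by hand, but the same [][]int shape is what regexp.FindAllStringIndex returns, so its output could be passed in directly.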

All code is dual-licenced as either MIT or Unlicence. Your choice when you use it.

Note that as an Australian I cannot put this into the public domain, hence the choice of the most liberal licences I can find.

More Repositories

1

scc

Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go
Go
6,574
star
2

cs

command line codespelunker or code search
Go
521
star
3

searchcode-server

The official home of searchcode-server where you can run searchcode locally. Note that master is generally unstable in the sense that it is not a release. Check releases for release versions https://github.com/boyter/searchcode-server/releases
Java
364
star
4

lc

licensechecker (lc) a command line application which scans directories and identifies what software license things are under producing reports as either SPDX, CSV, JSON, XLSX or CLI Tabular output. Dual-licensed under MIT or the UNLICENSE.
Go
124
star
5

go-http-template

CSS
84
star
6

Phindex

A modular search indexer similar to Lucene written in pure PHP
PHP
74
star
7

hashit

A cross platform tool to compute hashes of files quickly. Similar to hashdeep.
Go
59
star
8

gocodewalker

Library to help with walking of code directories in go
Go
57
star
9

activitypub

Sequence diagrams of how ActivityPub works
51
star
10

dcd

Duplicate Code Detector
Go
50
star
11

aws-s3-bucket-purger

A program that will purge any AWS S3 Bucket of objects and versions quickly
Go
26
star
12

indexer

Go
23
star
13

BATF

Web Based Big Arse Text File
PHP
21
star
14

searchcode

Official support channel for searchcode.com support issues and the like.
18
star
15

SingleBugs

A simple single person bug tracker
HTML
16
star
16

freemoz

A spiritual successor to dmoz.org
FreeMarker
16
star
17

really-cheap-chatbot

Really cheap chatbot
Python
14
star
18

python-license-checker

A license checker for source code written in python
Python
12
star
19

scc-data

Go
12
star
20

decodingcaptchas

Decoding CAPTCHA's in Python for Fun and Profit
JavaScript
9
star
21

php-excerpt

Generate search excerpts from text given search terms in PHP.
PHP
8
star
22

golangvectorspace

An implementation of the Vector Space model in GoLang
Go
8
star
23

boganipsum

Get it up ya!
HTML
7
star
24

java-spelling-corrector

A MIT Licensed Java Spelling Corrector
Java
7
star
25

searchcode-server-highlighter

Go
6
star
26

boyter.org

boyter.org
JavaScript
5
star
27

working-with-rust

Rust
4
star
28

cmuf

Completely Messed Up Filesystem
3
star
29

go-spelling-corrector

Go Spelling Corrector
Go
3
star
30

rss-feeds

Tagged lists of RSS feeds
Python
3
star
31

Mutator

Mutation tester which applies directly to source code.
Python
2
star
32

rcc

rcc
Rust
2
star
33

titfortat

Go
2
star
34

phpentitygenerator

Automatically exported from code.google.com/p/phpentitygenerator
PHP
2
star
35

gm-platformer

Learning Game Maker
Game Maker Language
2
star
36

sloc-cloc-code-presso

sloc cloc and code presso
JavaScript
2
star
37

hephaisteion

Can I have some money now?
2
star
38

KnowledgeTree-Exporter

Exporting Documents from KnowledgeTree 3.7.0.2
Python
2
star
39

wizard-duel

Lua
2
star
40

scc-lambda

Lambda for scc
Python
1
star
41

goignore

Go
1
star
42

spells

Trying to generate spell names based on the Harry Potter books
Python
1
star
43

CanvaQueueTest

Canva Queue Test
Java
1
star
44

codespelunker

Shell
1
star
45

hashit-rust

Hash all the things!
Rust
1
star
46

empire-building

Just playing around with generating names and families based on the world of Tsuranai by Feist and Wurts
Python
1
star
47

zig

Playing around with ziglang
1
star
48

tendersearch

Go
1
star
49

wsl-settings

Shell
1
star
50

portfold_old

Portfold.com
Go
1
star