  • Stars: 139
  • Rank 262,954 (Top 6%)
  • Language: Go
  • Created almost 10 years ago
  • Updated over 7 years ago

Repository Details

simhash storage and searching

go-simstore: store and search through simhashes

This package is an implementation of section 3 of "Detecting Near-Duplicates for Web Crawling" by Manku, Jain, and Sarma:

http://www2007.org/papers/paper215.pdf

  • simhash is a simple simhashing library.
  • simstore is the storage and searching logic.
  • simd is a small daemon that wraps simstore and exposes an HTTP /search endpoint.
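The fingerprints being stored are Charikar-style simhashes. Each feature is hashed to 64 bits, and every feature pushes a signed counter per bit position up or down. The fingerprint has a 1 wherever that counter ends up positive. The sketch below is not the simhash package's API. It assumes whitespace-separated tokens as features, a weight of 1 per token, and FNV-1a from `hash/fnv` as the per-feature hash. The function name `fingerprint` is hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
	"strings"
)

// fingerprint is a hypothetical helper: a 64-bit simhash of the
// whitespace-separated tokens in text, each token weighted 1 and
// hashed with FNV-1a.
func fingerprint(text string) uint64 {
	var v [64]int // one signed counter per bit position
	for _, tok := range strings.Fields(text) {
		h := fnv.New64a()
		h.Write([]byte(tok))
		x := h.Sum64()
		for i := 0; i < 64; i++ {
			if x&(1<<uint(i)) != 0 {
				v[i]++ // feature votes for a 1 in this position
			} else {
				v[i]-- // feature votes for a 0
			}
		}
	}
	var fp uint64
	for i, c := range v {
		if c > 0 {
			fp |= 1 << uint(i)
		}
	}
	return fp
}

func main() {
	a := fingerprint("the quick brown fox jumps over the lazy dog")
	b := fingerprint("the quick brown fox jumped over the lazy dog")
	c := fingerprint("lorem ipsum dolor sit amet consectetur adipiscing")
	// Hamming distances: near-duplicate pair, then unrelated pair.
	fmt.Println(bits.OnesCount64(a^b), bits.OnesCount64(a^c))
}
```

Texts that share most of their tokens produce mostly matching counters, so most of their fingerprint bits agree. The first printed distance is expected to be smaller than the second, but with so few tokens this is a tendency, not a guarantee.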

This code is licensed under the MIT license.

Copyright (c) 2014 Damian Gryski

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
