• Stars: 16
• Rank: 1,311,288 (Top 26%)
• Language: Python
• Created over 9 years ago
• Updated over 6 years ago

Repository Details

Vocabulary using n-grams

More Repositories

1. shovel (Python, 664 stars): Rake, for Python
2. simhash-py (Python, 377 stars): Simhash and near-duplicate detection
3. qless (Ruby, 292 stars): Queue / Pipeline Management
4. pyreBloom (Python, 286 stars): Fast Redis Bloom Filters in Python
5. interpol (HTML, 187 stars): A toolkit for working with API endpoint definition files, giving you a stub app, a schema validation middleware, and browsable documentation.
6. word2gauss (Python, 186 stars): Gaussian word embeddings
7. reppy (Python, 178 stars): Modern robots.txt Parser for Python
8. SEOmozAPISamples (Java, 158 stars): Mozscape API sample code
9. simhash-cpp (C++, 121 stars): Simhashing in C++
10. url-py (Python, 102 stars): URL Transformation, Sanitization
11. qless-core (Python, 83 stars): Core Lua Scripts for qless
12. simhash-db-py (Python, 63 stars): Python API for Various DB-Backed Simhash Clusters
13. qless-py (Python, 48 stars): Python Bindings for qless
14. qdr (Python, 43 stars): Query-Document Relevance
15. dragnet_data (Shell, 41 stars): Training/test data for Dragnet
16. publicsuffix-elixir (Elixir, 38 stars): Elixir library providing public suffix logic based on publicsuffix.org data
17. linkscape-gem (Ruby, 38 stars): Provides an interface to SEOmoz's suite of APIs, including the free and site intelligence APIs.
18. simhash-cluster (Python, 33 stars): A cluster implementation of simhash near-duplicate detection
19. Social-Authority-SDK (Ruby, 33 stars)
20. s3po (Python, 30 stars): Your Friendly Asynchronous S3 Upload Protocol Droid
21. GWT-keyword-analysis (Python, 25 stars): Analysis of Google Webmaster Tools search data
22. g-crawl-py (Python, 23 stars): Gevent Crawling in Python, with Utilities
23. mozsci (Python, 22 stars): Data science tools from Moz
24. url-cpp (C++, 19 stars): C++ bindings for url parsing and sanitization
25. uri_parser (C++, 13 stars): A fast URI parser that wraps Google's chromium URL canonicalization library
26. downpour (Python, 13 stars): Fetch urls quickly and asynchronously with Twisted, honoring politeness.
27. rep-cpp (C++, 12 stars): Robot exclusion protocol in C++
28. mltk (Python, 12 stars): mltk - Moz Language Tool Kit
29. plines (Ruby, 10 stars): Easily create job pipelines out of declared job dependencies using Qless.
30. awssh (Python, 9 stars): AWSSH Config
31. roger-mesos (Python, 8 stars): A complete mesos cluster setup with automatic load balancing
32. linkscape-py (Python, 5 stars): Python Bindings for Linkscape's API
33. qless-js (JavaScript, 5 stars): Node.js bindings for qless
34. roger-bamboo (Go, 5 stars): Roger's internal load balancer and frontend proxy. Based on https://github.com/QubitProducts/bamboo
35. gzippy (Python, 4 stars): Gzip files in python
36. asis (Python, 4 stars): Lightweight As-Is Server
37. awscpp (C++, 3 stars): AWS C++ Bindings
38. rack-authenticate (Ruby, 3 stars): Rack middleware that handles basic auth and HMAC auth
39. elasticsearch-utils (Python, 3 stars): Some Elasticsearch utilities I've put together and been using to investigate Elasticsearch performance
40. pyjudy (Python, 3 stars): Python bindings to libJudy
41. resque-unfairly (Ruby, 3 stars): A Resque plugin for processing queues from random jobs based on queue weightings. Inspired by resque-fairly.
42. roger-monitoring (Python, 3 stars): Monitoring stack for RogerOS
43. crawl-curio-cabinet (HTML, 3 stars): A Curio Cabinet of the Odd Behaviors We've Seen on the Internet
44. qless-docker (Ruby, 2 stars): Create a qless docker image!
45. irobot (Ruby, 2 stars): robots.txt file inspection
46. bloomfilter-py (Python, 2 stars): Simple and fast Bloom filter
47. docker-sortdb (Shell, 1 star): Docker setup for SortDB
48. qless-java (Java, 1 star): qless java binding
49. zendesk-search (JavaScript, 1 star): Search for tags and such in zendesk
50. deb-swift (1 star)
51. fiji (HTML, 1 star): Cell schemas and schema versioning for HBase
52. p5-Webservice-Followerwonk-SocialAuthority (Perl, 1 star): Perl Client for The Followerwonk Social Authority API
53. qless-util-py (Python, 1 star): Utilities for use with qless-py
54. process_tree_dictionary (Elixir, 1 star): Implements a dictionary that is scoped to a process tree for Erlang and Elixir.
55. moz_nav (Ruby, 1 star): DEPRECATED. Common navigation and layout across all SEOmoz applications
56. logtools (Python, 1 star): Stuff for reading crawler log files. Probably not of much interest to those outside of SeoMOZ.