• Stars: 25
• Rank: 939,209 (top 19%)
• Language: Clojure
• License: Eclipse Public Li...
• Created about 10 years ago
• Updated over 9 years ago


Repository Details

Subotai brings routines for extracting information from HTML documents to Clojure.

More Repositories

1. pegasus: 🐎✈️ Pegasus is a scalable, modular, polite web crawler for Clojure (Clojure, 258 stars)
2. Listener: Detect calls for attention in the surroundings (Python, 52 stars)
3. clj-lmdb: Clojure wrapper for LMDB (Clojure, 36 stars)
4. fort-knox: A disk-backed core.cache implementation based on LMDB (Clojure, 23 stars)
5. sleipnir: A simple, performant web crawler for Clojure (Clojure, 17 stars)
6. clojure-manifold: Manifold learning algorithms in Clojure (Clojure, 15 stars)
7. polyglot-toolbox: Polyglot skip-gram embeddings and their many health benefits (Python, 11 stars)
8. vad_python: A solid VAD in Python (Python, 9 stars)
9. VAD-py: WebRTC VAD in Python (C, 9 stars)
10. JPredict: Applying ML techniques to predict drawn Japanese characters; currently, hiragana is implemented (C#, 8 stars)
11. robust_pcp: Robust Principal Component Pursuit (Python, 7 stars)
12. clojure_scraping_overview: XPath and enlive (Clojure, 7 stars)
13. tinywm-rkt: TinyWM implementation in Racket (Racket, 6 stars)
14. tree-edit-distance: An implementation of a tree-edit-distance algorithm for structure-based clustering in Clojure (Clojure, 5 stars)
15. kublai: Truncated matrix decompositions for core.matrix (Clojure, 4 stars)
16. sutime-clojure: A wrapper around the time NER tagger in the Stanford CoreNLP suite (Clojure, 3 stars)
17. enlive-helper: A more powerful html-resource for use with enlive's functions (Clojure, 3 stars)
18. clj-heritrix: Clojure implementation of the Heritrix REST API (Clojure, 2 stars)
19. structural_similarity: Compare HTML documents for structural (or template) similarity (Clojure, 2 stars)
20. probabilistic-counting: Cardinality estimation algorithms in Clojure (Clojure, 2 stars)
21. crawler: Ephemeral content finder (Clojure, 2 stars)
22. clj-named-leveldb: Named databases for LevelDB using one simple hack they don't want you to know (Clojure, 1 star)
23. trec: TREC Federated Search Track (Python, 1 star)
24. satcharitra (Clojure, 1 star)
25. pegasus-examples: Pegasus examples (Clojure, 1 star)
26. racket-whistlepig: Racket bindings to the whistlepig engine (Racket, 1 star)
27. clj-spectral: Spectral algorithms in Clojure targeting core.matrix (Clojure, 1 star)
28. pgm-indian-buffet-process: Scribe notes for the CMU 10-708 lecture on the Indian Buffet Process (1 star)
29. clojure-kindle-highlights: Scrape the Kindle highlights webpage and download a book's highlights from it (Clojure, 1 star)
30. geojson3d: Render GeoJSON in 3D (JavaScript, 1 star)
31. heritrix-clojure: Heritrix API implementation in Clojure, a bit of a kludge at the moment (Clojure, 1 star)
32. index-page-crawler: Follow pagination and get pages (Clojure, 1 star)
33. web-corpus: ClueWeb web corpus pipeline (Clojure, 1 star)
34. india_in_data: Sources, datasets, and materials for India in data (1 star)
35. consistent-hashing: Consistent hashing implementation in Clojure (Java, 1 star)
36. clj-dimension: Algorithms to study and reduce the dimensions of datasets (Clojure, 1 star)
37. warc-clojure: Clojure wrapper around a Java library to read WARC files (Clojure, 1 star)