  • Stars: 3
  • Rank 3,963,521 (Top 79 %)
  • Created over 3 years ago
  • Updated over 3 years ago

Repository Details

Just a Collection of Datasets

More Repositories

1. scikit-lego (Python, 1,136 stars): Extra blocks for scikit-learn pipelines.
2. human-learn (Jupyter Notebook, 764 stars): Natural Intelligence is still a pretty good idea.
3. drawdata (Python, 579 stars): Draw datasets from within Jupyter.
4. doubtlab (Python, 485 stars): Doubt your data, find bad labels.
5. whatlies (Python, 468 stars): Toolkit to help understand "what lies" in word embeddings. Also benchmarking!
6. bulk (Python, 424 stars): A Simple Bulk Labelling Tool
7. embetter (Python, 381 stars): just a bunch of useful embeddings
8. cluestar (Jupyter Notebook, 289 stars): Gain clues from clustering!
9. calm-notebooks (Jupyter Notebook, 176 stars): notebooks that are used at calmcode.io
10. clumper (Python, 144 stars): A small python library that can clump lists of data together.
11. simsity (Python, 141 stars): Super Simple Similarities Service
12. memo (Python, 101 stars): Decorators that logs stats.
13. mktestdocs (Python, 99 stars): Run pytest against markdown files/docstrings.
14. spacy-youtube-material (Jupyter Notebook, 96 stars): Here are the notebooks used during the spacy youtube series.
15. tuilwindcss (CSS, 70 stars): Very much like Tailwind, but for TUI frameworks in Textual.
16. tokenwiser (Python, 67 stars): Bag of, not words, but tricks!
17. skedulord (Python, 65 stars): captures logs and makes cron more fun
18. pytest-duration-insights (Python, 57 stars): A mini dashboard to help find slow tests in pytest.
19. arxiv-frontpage (HTML, 46 stars): My personal frontpage app
20. scikit-partial (Python, 35 stars): Pipeline components that support partial_fit.
21. scikit-fairness (Python, 29 stars): this repo might get accepted
22. spacy-report (Python, 28 stars): Generate reports for spaCy models.
23. brent (Jupyter Notebook, 27 stars): bayesian graphical modelling and a bit of do-calculus for discrete data.
24. icepickle (Python, 26 stars): It's a cooler way to store simple linear models.
25. koaning (21 stars)
26. justcharts (HTML, 21 stars): Just charts. Really.
27. scikit-prune (Python, 19 stars): Prune your sklearn models
28. thismonth.rocks (CSS, 18 stars): motivational website to do something special this month
29. sentimany (Python, 17 stars): Just another sentiment wrapper.
30. kadro (Jupyter Notebook, 14 stars): A friendly pandas wrapper with a more composable grammar support.
31. prodigy-tui (CSS, 13 stars): A textual TUI for Prodigy
32. calmcode-feedback (12 stars): A repo to collect issues with calmcode.io
33. open_notebooks (Jupyter Notebook, 12 stars): Some notebooks that I've shared.
34. sentence-models (Python, 11 stars): A different, but useful, textcat approach.
35. paftdunk (Jupyter Notebook, 6 stars): Recommendin' all night to get lucky.
36. proglang-project (Python, 6 stars)
37. scikit-teach (Jupyter Notebook, 6 stars): Active Learning Benchmarks
38. texttoolz (5 stars): tools and tricks that are good to have around
39. makefile-demo (Makefile, 5 stars): just a demo of a makefile in action
40. gitlit (Python, 5 stars): Streamlit App on Github Actions
41. kolektor (Python, 5 stars): Let's give this git-scraping a try.
42. optimal-on-paper (Jupyter Notebook, 5 stars): broken in reality
43. liBERTy (Python, 5 stars): A benchmark to compare BERT against sklearn.
44. classycookie (Python, 5 stars): cookiecutter to run standard text classifiers
45. lazylines (Python, 4 stars): Pipelines for JSONL files
46. salary-bias (Jupyter Notebook, 4 stars): just another dangerous situation
47. dql101 (Jupyter Notebook, 4 stars): A 101 repo with some code for openai Deep Q Learning
48. boondoc (Python, 4 stars): lightweight Python API docs for markdown
49. subspacy (Python, 4 stars): BPEmb embeddings for spaCy
50. akin (Python, 4 stars): Some text similarity utilities
51. calm-stats (Python, 3 stars): Some GitScrapers
52. koaning-old.github.io (HTML, 3 stars): my personal blog
53. sushigo (Python, 3 stars): An OpenAi-like environment for the sushi go card game.
54. featherbed (Python, 3 stars): Very lightweight text vectors via tf/idf + SVD
55. onnx-demo (Jupyter Notebook, 3 stars): onnx seems interesting
56. benchmarks (Jupyter Notebook, 3 stars): Collection of benchmarks
57. baseliner (R, 3 stars): baseliner offers simple models that can act as a baseline to compare against
58. spacy-intent-example (Python, 3 stars): intent prediction example on spaCy v3
59. scikit-bloom (Python, 3 stars): Bloom tricks for text pipelines in scikit-learn.
60. github-slideshow (HTML, 2 stars): A robot powered training repository 🤖
61. wordlists (2 stars): Just a bunch of potentially useful wordlists.
62. gli (Python, 2 stars): my gleeful scripts for the cli
63. labeltable (Python, 2 stars): Things for bulk labelling.
64. fusebox (Jupyter Notebook, 2 stars): Finetune-able Universal Sentence Encoder
65. subsette (HTML, 2 stars): A dash-boarding environment for datasette.
66. manyterms (2 stars): Many terms for whatever purposes (weak labelling)
67. sentency (2 stars): Lightweight SpaCy pipeline to detect sentences.
68. pydata-slovenia-talk (Jupyter Notebook, 2 stars): Bag of NLP Tricks!
69. helloworld (R, 2 stars): a helloworld package that should just work
70. uvnb (Jupyter Notebook, 2 stars): Have UV deal with all your Jupyter deps.
71. blackjack (Python, 2 stars): a simple pytest demo
72. demopkg (R, 2 stars): a demo pkg in R with github actions
73. lamarl (Python, 2 stars): sushigo simulations on an aws backend
74. wow-avatar-datasets (2 stars): A place to host some parquet files.
75. python_data_intro (Jupyter Notebook, 2 stars): A beginner notebook for people who want to get started with python and data. Joy ensues!
76. buggingface (1 star): Let's see what we can learn from poking huggingface models.
77. digital-potato (HTML, 1 star)
78. gha-demo (Python, 1 star): Demo application for GitHub Actions tutorial.
79. fastfood-bot (1 star): a rasa demo that can find you a fast food location
80. ecosystem-watcher (Python, 1 star): Just keeping an eye on the ecosystem.
81. git-scrape-unravel (1 star): CLI to unravel git-scraped code.
82. scikit-prodigy (Python, 1 star): Helpers to leverage scikit-learn pipelines in Prodigy.
83. skooba (1 star): less weak supervision
84. rasa-nlu-deploy (Python, 1 star): A demo that can run Rasa NLU in a container.
85. datasette-parcoords (JavaScript, 1 star): Parallel coordinates chart for datasette
86. nlu-cluster-demo (Jupyter Notebook, 1 star): Upload your model file and talk to it!
87. tjek (Python, 1 star): tjek changes with the main branch
88. katacoda-scenarios (1 star): Katacoda Scenarios
89. bulk-datasets (1 star): Helpers for the download command.
90. there-are-no-bad-labels (Jupyter Notebook, 1 star): Repo for the PyData 2023 Workshop
91. tokenvolt (Python, 1 star): Populate an embedding cache quickly and get on with your day.
92. rusty (1 star): Learning how to Rst
93. uvtrick (Python, 1 star): I really outdid myself with this hack.
94. ollama-railway (Python, 1 star): Just to see if this might work out well.