  • Stars: 144
  • Rank 255,590 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created over 4 years ago
  • Updated almost 3 years ago

Repository Details

A small Python library that can clump lists of data together.

Clumper

A small Python library that can clump lists of nested data together.

Part of a video series on calmcode.io.

Base Example

Clumper allows you to quickly parse through a list of json-like data.

Here's an example of such a dataset.

pokemon = [
    {'name': 'Bulbasaur', 'type': ['Grass', 'Poison'], 'hp': 45, 'attack': 49},
    {'name': 'Charmander', 'type': ['Fire'], 'hp': 39, 'attack': 52},
    ...
]

Given this list of dictionaries, we can write the following query:

from clumper import Clumper

clump = Clumper.read_json('https://calmcode.io/datasets/pokemon.json')

(clump
  .keep(lambda d: len(d['type']) == 1)
  .mutate(type=lambda d: d['type'][0],
          ratio=lambda d: d['attack']/d['hp'])
  .select('name', 'type', 'ratio')
  .sort(lambda d: d['ratio'], reverse=True)
  .head(5)
  .collect())

Line by line, this code performs the following steps:
  1. It imports Clumper.
  2. It fetches a list of json-blobs about pokemon from the internet.
  3. It removes all the pokemon that have more than 1 type.
  4. The dictionaries that are left will have their type now as a string instead of a list of strings.
  5. The dictionaries that are left will also have a property called ratio, which is attack divided by hp.
  6. All the keys besides name, type and ratio are removed.
  7. The collection is sorted by ratio, from high to low.
  8. We grab the top 5 after sorting.
  9. The results are returned as a list of dictionaries.

This is what we get back:

[{'name': 'Diglett', 'type': 'Ground', 'ratio': 5.5},
 {'name': 'DeoxysAttack Forme', 'type': 'Psychic', 'ratio': 3.6},
 {'name': 'Krabby', 'type': 'Water', 'ratio': 3.5},
 {'name': 'DeoxysNormal Forme', 'type': 'Psychic', 'ratio': 3.0},
 {'name': 'BanetteMega Banette', 'type': 'Ghost', 'ratio': 2.578125}]
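
For comparison, the steps above can also be sketched in plain Python without Clumper. This is only an illustration of what the query does, not how the library is implemented. The three-row sample and the helper name top_ratio are assumptions made for this example; the Diglett values were chosen to match the ratio of 5.5 shown in the output.

```python
# A tiny hypothetical sample; the real dataset has many more entries.
pokemon = [
    {'name': 'Bulbasaur', 'type': ['Grass', 'Poison'], 'hp': 45, 'attack': 49},
    {'name': 'Charmander', 'type': ['Fire'], 'hp': 39, 'attack': 52},
    {'name': 'Diglett', 'type': ['Ground'], 'hp': 10, 'attack': 55},
]


def top_ratio(rows, n=5):
    """Plain-Python equivalent of the keep/mutate/select/sort/head chain."""
    # keep: only pokemon with exactly one type
    kept = [d for d in rows if len(d['type']) == 1]
    # mutate + select: unwrap the type, add the ratio, drop all other keys
    shaped = [
        {'name': d['name'], 'type': d['type'][0], 'ratio': d['attack'] / d['hp']}
        for d in kept
    ]
    # sort + head: highest ratio first, then take the first n
    return sorted(shaped, key=lambda d: d['ratio'], reverse=True)[:n]


result = top_ratio(pokemon)
print(result)
```

The Clumper version expresses the same logic as a chain of named verbs, so each step can be read, reordered or removed on its own line.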

Documentation

We've got a lovely documentation page that explains how the library works.

Features

  • This library has no dependencies besides a modern version of Python.
  • The library offers a pattern of verbs that are very expressive.
  • You can write code from top to bottom, left to right.
  • You can read in many json/yaml/csv files by using a wildcard *.
  • MIT License
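
To make the wildcard feature more concrete, here is a minimal standard-library sketch of the idea: every file that matches a pattern is read and the rows are combined into one list. This is not Clumper's implementation. The helper name read_json_glob and the temporary part0.json and part1.json files are hypothetical and exist only for this example.

```python
import glob
import json
import os
import tempfile


def read_json_glob(pattern):
    """Read every JSON file matching `pattern` and combine the rows."""
    rows = []
    # Sort the matches so the result does not depend on filesystem order.
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            data = json.load(f)
        # Assumption: each file holds a list of dicts or a single dict.
        rows.extend(data if isinstance(data, list) else [data])
    return rows


# Write two small files to a temporary folder and read them back in one go.
with tempfile.TemporaryDirectory() as tmp:
    parts = [[{'a': 1}], [{'a': 2}, {'a': 3}]]
    for i, part in enumerate(parts):
        with open(os.path.join(tmp, f'part{i}.json'), 'w') as f:
            json.dump(part, f)
    combined = read_json_glob(os.path.join(tmp, '*.json'))

print(combined)
```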

Installation

You can install this package via pip.

pip install clumper

It may be safer, however, to install via:

python -m pip install clumper

For details on why, check out this resource.

There are some optional dependencies that you might want to install as well.

python -m pip install "clumper[yaml]"
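
After installing, a quick way to check which packages the current interpreter can import is sketched below. The helper name is_installed is hypothetical, and the sketch assumes that the yaml extra provides a module importable as yaml.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return find_spec(module_name) is not None


# 'yaml' is an assumption about what the optional extra makes importable.
for name in ('clumper', 'yaml'):
    print(name, 'installed' if is_installed(name) else 'missing')
```

Running this with the same python executable that was used for the pip command confirms that the package landed in the intended environment.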

Contributing

Before you make a pull request, check the issue list to avoid duplicate work. To get started locally, clone the repo and use the Makefile:

git clone git@github.com:koaning/clumper.git
cd clumper
make install-dev

More Repositories

  1. scikit-lego (Python, 1,136 stars): Extra blocks for scikit-learn pipelines.
  2. human-learn (Jupyter Notebook, 764 stars): Natural Intelligence is still a pretty good idea.
  3. drawdata (Python, 579 stars): Draw datasets from within Jupyter.
  4. doubtlab (Python, 485 stars): Doubt your data, find bad labels.
  5. whatlies (Python, 468 stars): Toolkit to help understand "what lies" in word embeddings. Also benchmarking!
  6. bulk (Python, 424 stars): A Simple Bulk Labelling Tool
  7. embetter (Python, 381 stars): just a bunch of useful embeddings
  8. cluestar (Jupyter Notebook, 289 stars): Gain clues from clustering!
  9. calm-notebooks (Jupyter Notebook, 176 stars): notebooks that are used at calmcode.io
  10. simsity (Python, 141 stars): Super Simple Similarities Service
  11. memo (Python, 101 stars): Decorators that logs stats.
  12. mktestdocs (Python, 99 stars): Run pytest against markdown files/docstrings.
  13. spacy-youtube-material (Jupyter Notebook, 96 stars): Here are the notebooks used during the spacy youtube series.
  14. tuilwindcss (CSS, 70 stars): Very much like Tailwind, but for TUI frameworks in Textual.
  15. tokenwiser (Python, 67 stars): Bag of, not words, but tricks!
  16. skedulord (Python, 65 stars): captures logs and makes cron more fun
  17. pytest-duration-insights (Python, 57 stars): A mini dashboard to help find slow tests in pytest.
  18. arxiv-frontpage (HTML, 46 stars): My personal frontpage app
  19. scikit-partial (Python, 35 stars): Pipeline components that support partial_fit.
  20. scikit-fairness (Python, 29 stars): this repo might get accepted
  21. spacy-report (Python, 28 stars): Generate reports for spaCy models.
  22. brent (Jupyter Notebook, 27 stars): bayesian graphical modelling and a bit of do-calculus for discrete data.
  23. icepickle (Python, 26 stars): It's a cooler way to store simple linear models.
  24. koaning (21 stars)
  25. justcharts (HTML, 21 stars): Just charts. Really.
  26. scikit-prune (Python, 19 stars): Prune your sklearn models
  27. thismonth.rocks (CSS, 18 stars): motivational website to do something special this month
  28. sentimany (Python, 17 stars): Just another sentiment wrapper.
  29. kadro (Jupyter Notebook, 14 stars): A friendly pandas wrapper with a more composable grammar support.
  30. prodigy-tui (CSS, 13 stars): A textual TUI for Prodigy
  31. calmcode-feedback (12 stars): A repo to collect issues with calmcode.io
  32. open_notebooks (Jupyter Notebook, 12 stars): Some notebooks that I've shared.
  33. sentence-models (Python, 11 stars): A different, but useful, textcat approach.
  34. paftdunk (Jupyter Notebook, 6 stars): Recommendin' all night to get lucky.
  35. proglang-project (Python, 6 stars)
  36. scikit-teach (Jupyter Notebook, 6 stars): Active Learning Benchmarks
  37. texttoolz (5 stars): tools and tricks that are good to have around
  38. makefile-demo (Makefile, 5 stars): just a demo of a makefile in action
  39. gitlit (Python, 5 stars): Streamlit App on Github Actions
  40. kolektor (Python, 5 stars): Let's give this git-scraping a try.
  41. optimal-on-paper (Jupyter Notebook, 5 stars): broken in reality
  42. liBERTy (Python, 5 stars): A benchmark to compare BERT against sklearn.
  43. classycookie (Python, 5 stars): cookiecutter to run standard text classifiers
  44. lazylines (Python, 4 stars): Pipelines for JSONL files
  45. salary-bias (Jupyter Notebook, 4 stars): just another dangerous situation
  46. dql101 (Jupyter Notebook, 4 stars): A 101 repo with some code for openai Deep Q Learning
  47. boondoc (Python, 4 stars): lightweight Python API docs for markdown
  48. subspacy (Python, 4 stars): BPEmb embeddings for spaCy
  49. akin (Python, 4 stars): Some text similarity utilities
  50. calm-stats (Python, 3 stars): Some GitScrapers
  51. calmcode-datasets (3 stars): Just a Collection of Datasets
  52. koaning-old.github.io (HTML, 3 stars): my personal blog
  53. sushigo (Python, 3 stars): An OpenAi-like environment for the sushi go card game.
  54. featherbed (Python, 3 stars): Very lightweight text vectors via tf/idf + SVD
  55. onnx-demo (Jupyter Notebook, 3 stars): onnx seems interesting
  56. benchmarks (Jupyter Notebook, 3 stars): Collection of benchmarks
  57. baseliner (R, 3 stars): baseliner offers simple models that can act as a baseline to compare against
  58. spacy-intent-example (Python, 3 stars): intent prediction example on spaCy v3
  59. scikit-bloom (Python, 3 stars): Bloom tricks for text pipelines in scikit-learn.
  60. github-slideshow (HTML, 2 stars): A robot powered training repository 🤖
  61. wordlists (2 stars): Just a bunch of potentially useful wordlists.
  62. gli (Python, 2 stars): my gleeful scripts for the cli
  63. labeltable (Python, 2 stars): Things for bulk labelling.
  64. fusebox (Jupyter Notebook, 2 stars): Finetune-able Universal Sentence Encoder
  65. subsette (HTML, 2 stars): A dash-boarding environment for datasette.
  66. manyterms (2 stars): Many terms for whatever purposes (weak labelling)
  67. sentency (2 stars): Lightweight SpaCy pipeline to detect sentences.
  68. pydata-slovenia-talk (Jupyter Notebook, 2 stars): Bag of NLP Tricks!
  69. helloworld (R, 2 stars): a helloworld package that should just work
  70. uvnb (Jupyter Notebook, 2 stars): Have UV deal with all your Jupyter deps.
  71. blackjack (Python, 2 stars): a simple pytest demo
  72. demopkg (R, 2 stars): a demo pkg in R with github actions
  73. lamarl (Python, 2 stars): sushigo simulations on an aws backend
  74. wow-avatar-datasets (2 stars): A place to host some parquet files.
  75. python_data_intro (Jupyter Notebook, 2 stars): A beginner notebook for people who want to get started with python and data. Joy ensues!
  76. buggingface (1 star): Let's see what we can learn from poking huggingface models.
  77. digital-potato (HTML, 1 star)
  78. gha-demo (Python, 1 star): Demo application for GitHub Actions tutorial.
  79. fastfood-bot (1 star): a rasa demo that can find you a fast food location
  80. ecosystem-watcher (Python, 1 star): Just keeping an eye on the ecosystem.
  81. git-scrape-unravel (1 star): CLI to unravel git-scraped code.
  82. scikit-prodigy (Python, 1 star): Helpers to leverage scikit-learn pipelines in Prodigy.
  83. skooba (1 star): less weak supervision
  84. rasa-nlu-deploy (Python, 1 star): A demo that can run Rasa NLU in a container.
  85. datasette-parcoords (JavaScript, 1 star): Parallel coordinates chart for datasette
  86. nlu-cluster-demo (Jupyter Notebook, 1 star): Upload your model file and talk to it!
  87. tjek (Python, 1 star): tjek changes with the main branch
  88. katacoda-scenarios (1 star): Katacoda Scenarios
  89. bulk-datasets (1 star): Helpers for the download command.
  90. there-are-no-bad-labels (Jupyter Notebook, 1 star): Repo for the PyData 2023 Workshop
  91. tokenvolt (Python, 1 star): Populate an embedding cache quickly and get on with your day.
  92. rusty (1 star): Learning how to Rst
  93. uvtrick (Python, 1 star): I really outdid myself with this hack.
  94. ollama-railway (Python, 1 star): Just to see if this might work out well.