  • Stars: 102
  • Rank 324,474 (Top 7 %)
  • Language: Jupyter Notebook
  • License: BSD 3-Clause "New...
  • Created about 1 year ago
  • Updated about 2 months ago

Repository Details

Buckaroo - the data wrangling assistant for pandas. Quickly explore dataframes, and run pandas commands via a GUI. Works inside the jupyter notebook.

Buckaroo - The Data Table for Jupyter

Buckaroo is a modern data table for Jupyter that expedites the most common exploratory data analysis tasks. The most basic data analysis task, looking at the raw data, is cumbersome with the existing pandas tooling. Buckaroo starts with a modern, performant data table that displays up to 10k rows, is sortable, has value formatting, and scrolls. On top of the core table experience, extra features such as summary stats, histograms, smart sampling, auto-cleaning, and a low-code UI are added. All of the functionality has sensible defaults that can be overridden to customize the experience for your workflow.

Polars-Buckaroo

Try it today

Quick start

Run pip install buckaroo, then execute the following in a notebook to see Buckaroo:

import pandas as pd
import buckaroo
pd.DataFrame({'a': [1, 2, 10, 30, 50, 60, 50], 'b': ['foo', 'foo', 'bar', pd.NA, pd.NA, pd.NA, pd.NA]})

When you run import buckaroo in a Jupyter notebook, Buckaroo becomes the default display method for pandas and Polars DataFrames.

Compatibility

Buckaroo works in the following notebook environments

  • JupyterLab (version >=3.6.0)
  • Jupyter Notebook (version >=7.0)
  • VS Code notebooks (with extra install)
  • Google Colab (with special initialization code)

Buckaroo works with the following DataFrame libraries

  • pandas (version >=1.3.5)
  • polars optional
  • geopandas optional

Learn More

Buckaroo has extensive docs and tests. The best way to learn about the system is from the feature example videos on YouTube.

Videos

Example Notebooks

The following notebooks must be executed in an environment with Buckaroo installed.

Features

High performance table

The core data grid of Buckaroo is based on AG-Grid. It loads thousands of cells in less than a second, with highly customizable display, formatting, and scrolling. You no longer have to use df.head() to poke at portions of your data.

Fixed width formatting by default

By default, numeric columns are formatted with a fixed-width font and thousands separators (commas). This allows quick visual confirmation of magnitudes in a column.
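To illustrate why this formatting helps, here is a minimal plain-Python sketch of comma-separated, right-aligned number formatting. It is not Buckaroo's own formatter; the function name format_column and its decimals parameter are made up for this example.

```python
def format_column(values, decimals=0):
    """Format numbers with thousands separators, right-aligned to a common width."""
    # Hypothetical helper for illustration only; not part of Buckaroo.
    formatted = [f"{v:,.{decimals}f}" for v in values]
    width = max((len(s) for s in formatted), default=0)
    return [s.rjust(width) for s in formatted]


column = [1, 2500, 1234567, 42]
for cell in format_column(column):
    print(cell)
```

With every cell padded to the same width, the number of digit groups lines up vertically, so an order-of-magnitude difference is visible at a glance.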

Histograms

Histograms for every column give you a very quick overview of the distribution of values, including uniques and N/A.

Summary stats

The summary stats view can be toggled by clicking on the 0 below the Σ icon. Summary stats are similar to df.describe and are extensible.

Intelligent sampling

Buckaroo displays entire DataFrames of up to 10k rows. Displaying more than that would make the table too slow. When a DataFrame has more than 10k rows, Buckaroo samples a random set of 10k rows and also adds the rows with the 5 most extreme values for each column.
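The sampling strategy described above can be approximated in a few lines of pandas. This is a sketch of the idea, not Buckaroo's implementation: the function name sample_with_extremes and its parameters are hypothetical, "most extreme" is assumed to mean both the largest and the smallest values, only numeric columns are considered, and the index is assumed to be unique.

```python
import numpy as np
import pandas as pd


def sample_with_extremes(df, max_rows=10_000, n_extreme=5, seed=0):
    """Return df unchanged if small; otherwise a random sample plus extreme rows."""
    # Hypothetical helper for illustration only; assumes a unique index.
    if len(df) <= max_rows:
        return df

    # Random base sample of max_rows rows.
    keep = set(df.sample(n=max_rows, random_state=seed).index)

    # For each numeric column, add the rows holding the largest and smallest values.
    for col in df.select_dtypes(include="number").columns:
        keep.update(df[col].nlargest(n_extreme).index)
        keep.update(df[col].nsmallest(n_extreme).index)

    # Select the kept rows while preserving the original row order.
    return df.loc[df.index.isin(keep)]


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(size=50_000),
    "b": rng.integers(0, 100, size=50_000),
})
sampled = sample_with_extremes(df)
```

In this sketch the result can be slightly larger than max_rows, because the extreme rows are added on top of the random sample.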

Sorting

All of the data visible in the table (the rows shown) is sortable by clicking on a column name. Further clicks change the sort direction, then disable sorting for that column. Because extreme values are included with the sampled rows, you can see outlier values too.

Extensibility at the core

Buckaroo summary stats are built on the Pluggable Analysis Framework, which allows individual summary stats to be overridden and new summary stats to be built in terms of existing ones. Care is taken so that errors in summary stats do not prevent a dataframe from being displayed.
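The two ideas in this paragraph, stats defined in terms of other stats and errors that are contained rather than propagated, can be shown with a small generic sketch. It does not use or reproduce the Pluggable Analysis Framework's actual API; the STATS registry and compute_stats function are hypothetical names invented for this illustration.

```python
import statistics

# Hypothetical registry: stat name -> (names of stats it depends on, function).
# Each function receives the raw column values and the stats computed so far.
STATS = {
    "length": ((), lambda values, stats: len(values)),
    "null_count": ((), lambda values, stats: sum(v is None for v in values)),
    "null_fraction": (
        ("length", "null_count"),
        lambda values, stats: stats["null_count"] / stats["length"],
    ),
    "mean": (
        (),
        lambda values, stats: statistics.mean(v for v in values if v is not None),
    ),
}


def compute_stats(values, registry=STATS):
    """Compute stats in dependency order; failures are collected, not raised."""
    results, errors = {}, {}
    pending = dict(registry)
    while pending:
        # A stat is ready once each dependency has either succeeded or failed.
        ready = [
            name for name, (deps, _) in pending.items()
            if all(d in results or d in errors for d in deps)
        ]
        if not ready:
            # Remaining stats have missing or circular dependencies.
            for name in pending:
                errors[name] = ValueError("unresolvable dependencies")
            break
        for name in ready:
            _, func = pending.pop(name)
            try:
                results[name] = func(values, results)
            except Exception as exc:  # one bad stat must not block the others
                errors[name] = exc
    return results, errors


results, errors = compute_stats([1, 2, None, 3])
text_results, text_errors = compute_stats(["a", "b"])
```

Overriding a stat in this sketch amounts to passing a modified registry, for example {**STATS, "mean": ((), my_mean)}, where my_mean is your own function.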

Low-code UI (beta)

Buckaroo has a simple low-code UI with Python code generation. This view can be toggled by clicking on the 0 below the λ icon.

Auto cleaning (beta)

Buckaroo can automatically clean dataframes to remove common data errors (a single string in a column of ints, recognizing date times...). This feature is in beta. You can access it by invoking Buckaroo as BuckarooWidget(df, auto_clean=True).
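To make the "single string in a column of ints" case concrete, here is a minimal pandas sketch of that kind of cleaning. It only illustrates the general idea and is not how Buckaroo's auto-cleaning is implemented; the function name coerce_mostly_numeric and the 0.8 threshold are assumptions made for this example.

```python
import pandas as pd


def coerce_mostly_numeric(series, min_numeric_fraction=0.8):
    """Convert a column to numeric when nearly all of its values parse as numbers."""
    # Hypothetical helper for illustration only; not part of Buckaroo.
    converted = pd.to_numeric(series, errors="coerce")  # unparseable values become NaN

    non_null = series.notna().sum()
    if non_null == 0:
        return series

    # Fraction of the originally non-null values that parsed successfully.
    parsed_fraction = converted.notna().sum() / non_null
    return converted if parsed_fraction >= min_numeric_fraction else series


raw = pd.Series([1, 2, 3, "oops", 5, 6, 7, 8, 9, 10])
cleaned = coerce_mostly_numeric(raw)
```

Here the stray string becomes a missing value and the column gets a numeric dtype, while a column that is mostly text is returned unchanged.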

Development installation

For a development installation:

git clone https://github.com/paddymul/buckaroo.git
cd buckaroo
# We need to build against jupyterlab 3.6.5; JupyterLab 4.0 has different JS typing that conflicts.
# The installable still works in JupyterLab 4.
pip install build twine pytest sphinx polars mypy jupyterlab==3.6.5 pandas-stubs geopolars pyarrow
pip install -ve .

Enabling development install for Jupyter notebook:

Enabling development install for JupyterLab:

jupyter labextension develop . --overwrite

Note for developers: the --symlink argument on Linux or OS X allows one to modify the JavaScript code in-place. This feature is not available on Windows.

Developing the JS side

There is a series of examples of the components in examples/ex.

Instructions

npm install
npm run dev

Contributions

We ❤️ contributions.

Have you had a good experience with this project? Why not share some love and contribute code, or just let us know about any issues you had with it?

We welcome issue reports here; be sure to choose the proper issue template for your issue, so that we can be sure you're providing the necessary information.

More Repositories

  1. css-lite: a css grammar for lisp (Common Lisp, 77 stars)
  2. rxvt-js: A rewrite of the rxvt terminal emulator in javascript (C, 38 stars)
  3. wikipedia_solr: A sample installation of solr setup for indexing wikipedia (Java, 18 stars)
  4. TerminalcastRecord: Tools for recording terminalcasts (C, 17 stars)
  5. bokeh_tutorial (Python, 6 stars)
  6. osx_keyboard_play (C, 5 stars)
  7. citibike_data: A repo for storing python analysis code for citibike data. (Python, 3 stars)
  8. cl-gesture: Mouse gestures implemented in common lisp (Common Lisp, 3 stars)
  9. sqlalchemy_garden: Some examples of common sqlalchemy patterns (Python, 3 stars)
  10. dcf: Data Cleaning Framework, interactively build up pandas transforms (Python, 2 stars)
  11. dcf-server: Data Cleaning Framework server (Python, 2 stars)
  12. units_py: Working on type checking for measurement based calculation (Python, 1 star)
  13. impatient_test: a library for parallel django unit testing (Python, 1 star)
  14. emacs-pdb: emacs-pdb integration (1 star)
  15. eltesto: emacs testing library (Emacs Lisp, 1 star)
  16. react-chart-comparison: A demo app that compares popular charting libraries. (JavaScript, 1 star)
  17. paddy-emacs-config: my emacs configuration (Emacs Lisp, 1 star)
  18. clj-logo: a hacky implementation of logo like semantics in clojure, on top of clj-processing (Clojure, 1 star)
  19. sail-clojure: a sailing simulator written in clojure (Clojure, 1 star)
  20. Americas_cup_notebooks (Jupyter Notebook, 1 star)
  21. ts-ipywidget-starter: The simplest widget-ts repository that I could create that builds in typescript and is successfully pip installable (TypeScript, 1 star)
  22. buckaroo-data: Data for buckaroo examples (1 star)
  23. citibike_stats: citibike_stats (Python, 1 star)