Discover maxhumber/redframes Open Source project

About

redframes (rectangular data frames) is a general purpose data manipulation library that prioritizes syntax, simplicity, and speed (to a solution). Importantly, the library is fully interoperable with pandas, compatible with scikit-learn, and works great with matplotlib.

Install & Import

pip install redframes

import redframes as rf

Quickstart

Copy-and-paste this to get started:

import redframes as rf

df = rf.DataFrame({
    'bear': ['Brown bear', 'Polar bear', 'Asian black bear', 'American black bear', 'Sun bear', 'Sloth bear', 'Spectacled bear', 'Giant panda'],
    'genus': ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda'],
    'weight (male, lbs)': ['300-860', '880-1320', '220-440', '125-500', '60-150', '175-310', '220-340', '190-275'],
    'weight (female, lbs)': ['205-455', '330-550', '110-275', '90-300', '45-90', '120-210', '140-180', '155-220']
})

# | bear                | genus      | weight (male, lbs)   | weight (female, lbs)   |
# |:--------------------|:-----------|:---------------------|:-----------------------|
# | Brown bear          | Ursus      | 300-860              | 205-455                |
# | Polar bear          | Ursus      | 880-1320             | 330-550                |
# | Asian black bear    | Ursus      | 220-440              | 110-275                |
# | American black bear | Ursus      | 125-500              | 90-300                 |
# | Sun bear            | Helarctos  | 60-150               | 45-90                  |
# | Sloth bear          | Melursus   | 175-310              | 120-210                |
# | Spectacled bear     | Tremarctos | 220-340              | 140-180                |
# | Giant panda         | Ailuropoda | 190-275              | 155-220                |

(
    df
        .rename({"weight (male, lbs)": "male", "weight (female, lbs)": "female"})
        .gather(["male", "female"], into=("sex", "weight"))
        .split("weight", into=["min", "max"], sep="-")
        .gather(["min", "max"], into=("stat", "weight"))
        .mutate({"weight": lambda row: float(row["weight"])})
        .group(["genus", "sex"])
        .rollup({"weight": ("weight", rf.stat.mean)})
        .spread("sex", using="weight")
        .mutate({"dimorphism": lambda row: round(row["male"] / row["female"], 2)})
        .drop(["male", "female"])
        .sort("dimorphism", descending=True)
)

# | genus      |   dimorphism |
# |:-----------|-------------:|
# | Ursus      |         2.01 |
# | Tremarctos |         1.75 |
# | Helarctos  |         1.56 |
# | Melursus   |         1.47 |
# | Ailuropoda |         1.24 |

For comparison, here's the equivalent pandas:

import pandas as pd

# df = pd.DataFrame({...})

df = df.rename(columns={"weight (male, lbs)": "male", "weight (female, lbs)": "female"})
df = pd.melt(df, id_vars=['bear', 'genus'], value_vars=['male', 'female'], var_name='sex', value_name='weight')
df[["min", "max"]] = df["weight"].str.split("-", expand=True)
df = df.drop("weight", axis=1)
df = pd.melt(df, id_vars=['bear', 'genus', 'sex'], value_vars=['min', 'max'], var_name='stat', value_name='weight')
df['weight'] = df["weight"].astype('float')
df = df.groupby(["genus", "sex"])["weight"].mean()
df = df.reset_index()
df = pd.pivot_table(df, index=['genus'], columns=['sex'], values='weight')
df = df.reset_index()
df = df.rename_axis(None, axis=1)
df["dimorphism"] = round(df["male"] / df["female"], 2)
df = df.drop(["female", "male"], axis=1)
df = df.sort_values("dimorphism", ascending=False)
df = df.reset_index(drop=True)

# 🤮

IO

Save, load, and convert rf.DataFrame objects:

# save .csv
rf.save(df, "bears.csv")

# load .csv
df = rf.load("bears.csv")

# convert redframes → pandas
pandas_df = rf.unwrap(df)

# convert pandas → redframes
df = rf.wrap(pandas_df)

Verbs

Verbs are pure and "chain-able" methods that manipulate rf.DataFrame objects. Here is the complete list (see docstrings for examples and more details):

Verb	Description
`accumulate`^‡	Run a cumulative sum over a column
`append`	Append rows from another DataFrame
`combine`	Combine multiple columns into a single column (opposite of `split`)
`cross`	Cross join columns from another DataFrame
`dedupe`	Remove duplicate rows
`denix`	Remove rows with missing values
`drop`	Drop entire columns (opposite of `select`)
`fill`	Fill missing values "down", "up", or with a constant
`filter`	Keep rows matching specific conditions
`gather`^‡	Gather columns into rows (opposite of `spread`)
`group`	Prepare groups for compatible verbs^‡
`join`	Join columns from another DataFrame
`mutate`	Create a new, or overwrite an existing column
`pack`^‡	Collate and concatenate row values for a target column (opposite of `unpack`)
`rank`^‡	Rank order values in a column
`rename`	Rename column keys
`replace`	Replace matching values within columns
`rollup`^‡	Apply summary functions and/or statistics to target columns
`sample`	Randomly sample any number of rows
`select`	Select specific columns (opposite of `drop`)
`shuffle`	Shuffle the order of all rows
`sort`	Sort rows by specific columns
`split`	Split a single column into multiple columns (opposite of `combine`)
`spread`	Spread rows into columns (opposite of `gather`)
`take`^‡	Take any number of rows (from the top/bottom)
`unpack`	"Explode" concatenated row values into multiple rows (opposite of `pack`)

Properties

In addition to all of the verbs there are several properties attached to each DataFrame object:

df["genus"] 
# ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda']

df.columns 
# ['bear', 'genus', 'weight (male, lbs)', 'weight (female, lbs)']

df.dimensions
# {'rows': 8, 'columns': 4}

df.empty
# False

df.memory
# '2 KB'

df.types
# {'bear': object, 'genus': object, 'weight (male, lbs)': object, 'weight (female, lbs)': object}

matplotlib

rf.DataFrame objects integrate seamlessly with matplotlib:

import redframes as rf
import matplotlib.pyplot as plt

football = rf.DataFrame({
    'position': ['TE', 'K', 'RB', 'WR', 'QB'],
    'avp': [116.98, 131.15, 180, 222.22, 272.91]
})

df = (
    football
        .mutate({"color": lambda row: row["position"] in ["WR", "RB"]})
        .replace({"color": {False: "orange", True: "red"}})
)

plt.barh(df["position"], df["avp"], color=df["color"]);

scikit-learn

rf.DataFrame objects are fully compatible with sklearn functions, estimators, and transformers:

import redframes as rf
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = rf.DataFrame({
    "touchdowns": [15, 19, 5, 7, 9, 10, 12, 22, 16, 10],
    "age": [21, 22, 21, 24, 26, 28, 30, 35, 28, 21],
    "mvp": [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
})

target = "touchdowns"
y = df[target]
X = df.drop(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)
# 0.5083194901655527

print(X_train.take(1))
# rf.DataFrame({'age': [21], 'mvp': [0]})

X_new = rf.DataFrame({'age': [22], 'mvp': [1]})
model.predict(X_new)
# array([19.])

maxhumber/redframes

maxhumber

Reviews

Repository Details