  • Stars: 271
  • Rank: 151,717 (Top 3%)
  • Language: Python
  • License: MIT License
  • Created almost 4 years ago
  • Updated 8 months ago

Repository Details

A Grammar of Data Manipulation in python

datar



Documentation | Reference Maps | Notebook Examples | API

datar is a re-imagining of data manipulation APIs in Python, with support for multiple backends. The APIs are aligned with the tidyverse packages in R as closely as possible.

Installation

pip install -U datar

# install with a backend
pip install -U datar[pandas]

# Support for more backends is coming soon

Backends

  • datar-numpy
  • datar-pandas
  • datar-arrow
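
To check which backends are available in the current environment, a small standard-library script along these lines may help. It is only a sketch: the module names `datar_numpy`, `datar_pandas`, and `datar_arrow` are assumed from the distribution names, and `installed_backends` is a hypothetical helper, not part of datar.

```python
from importlib.util import find_spec

# ASSUMPTION: each backend distribution (e.g. datar-pandas) installs a module
# with the same name using underscores (e.g. datar_pandas).
BACKEND_MODULES = {
    "numpy": "datar_numpy",
    "pandas": "datar_pandas",
    "arrow": "datar_arrow",
}


def installed_backends():
    """Return the sorted names of backends whose module can be located."""
    return sorted(
        name
        for name, module in BACKEND_MODULES.items()
        if find_spec(module) is not None
    )


if __name__ == "__main__":
    print("datar installed:", find_spec("datar") is not None)
    print("backends found:", installed_backends() or "none")
```

If no backend is reported, installing one with `pip install -U datar[pandas]` (as in the Installation section) is the documented route.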

Example usage

# with pandas backend
from datar import f
from datar.dplyr import mutate, filter, if_else
from datar.tibble import tibble
# or
# from datar.all import f, mutate, filter, if_else, tibble

df = tibble(
    x=range(4),  # or c[:4]  (from datar.base import c)
    y=['zero', 'one', 'two', 'three']
)
df >> mutate(z=f.x)
"""# output:
        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       1
2       2      two       2
3       3    three       3
"""

df >> mutate(z=if_else(f.x>1, 1, 0))
"""# output:
        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       0
2       2      two       1
3       3    three       1
"""

df >> filter(f.x>1)
"""# output:
        x        y
  <int64> <object>
0       2      two
1       3    three
"""

df >> mutate(z=if_else(f.x>1, 1, 0)) >> filter(f.z==1)
"""# output:
        x        y       z
  <int64> <object> <int64>
0       2      two       1
1       3    three       1
"""
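
Expressions such as `f.x > 1` in the examples above are not evaluated where they are written; they are resolved against the data once the verb runs. The toy model below illustrates that general idea with the standard library and plain dicts of lists. It is not how datar or pipda is implemented, and `Expr`, `Symbol`, and `filter_rows` are hypothetical names made up for this sketch.

```python
import operator


class Expr:
    """A deferred computation over a dict of columns (name -> list)."""

    def __init__(self, fn):
        self._fn = fn

    def evaluate(self, data):
        return self._fn(data)

    def _binary(self, other, op):
        def fn(data):
            left = self.evaluate(data)
            if isinstance(other, Expr):
                right = other.evaluate(data)
            else:
                right = [other] * len(left)  # broadcast a scalar
            return [op(a, b) for a, b in zip(left, right)]

        return Expr(fn)

    def __gt__(self, other):
        return self._binary(other, operator.gt)

    def __eq__(self, other):
        return self._binary(other, operator.eq)


class Symbol:
    """Toy stand-in for `f`: attribute access yields a column reference."""

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        return Expr(lambda data: list(data[name]))


f = Symbol()


def mutate(data, **new_columns):
    """Return a copy of `data` with each deferred expression evaluated."""
    out = dict(data)
    for name, expr in new_columns.items():
        out[name] = expr.evaluate(out)
    return out


def filter_rows(data, condition):
    """Keep only the rows for which `condition` evaluates to True."""
    keep = condition.evaluate(data)
    return {
        name: [value for value, k in zip(column, keep) if k]
        for name, column in data.items()
    }


df = {"x": [0, 1, 2, 3], "y": ["zero", "one", "two", "three"]}
print(mutate(df, z=f.x))
print(filter_rows(df, f.x > 1))
```

Because `f.x > 1` only builds an `Expr`, the same expression can be reused against any data that has an `x` column.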
# works with plotnine
# example grabbed from https://github.com/has2k1/plydata
import numpy
from datar import f
from datar.base import sin, pi
from datar.tibble import tibble
from datar.dplyr import mutate, if_else
from plotnine import ggplot, aes, geom_line, theme_classic

df = tibble(x=numpy.linspace(0, 2 * pi, 500))
(
    df
    >> mutate(y=sin(f.x), sign=if_else(f.y >= 0, "positive", "negative"))
    >> ggplot(aes(x="x", y="y"))
    + theme_classic()
    + geom_line(aes(color="sign"), size=1.2)
)


# very easy to integrate with other libraries
# for example: klib
import klib
from pipda import register_verb
from datar import f
from datar.data import iris
from datar.dplyr import pull

dist_plot = register_verb(func=klib.dist_plot)
iris >> pull(f.Sepal_Length) >> dist_plot()
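
For readers curious how wrapping a plain function can make it usable on the right-hand side of `>>`, here is a minimal standard-library sketch built on Python's reflected operator method `__rrshift__`. It is a toy model under stated assumptions, not pipda's implementation: `toy_register_verb` and `PipedCall` are hypothetical names, and the real `register_verb` does considerably more.

```python
import functools
import statistics


class PipedCall:
    """A verb call that is still waiting for its data argument."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # Python tries this for `data >> PipedCall(...)` when the type of
        # `data` does not handle `>>` itself (true for lists and ints).
        return self.func(data, *self.args, **self.kwargs)


def toy_register_verb(func):
    """Wrap `func` so that calling it returns a PipedCall instead of a result."""

    @functools.wraps(func)
    def verb(*args, **kwargs):
        return PipedCall(func, args, kwargs)

    return verb


# Two toy verbs: the piped data always becomes the first argument.
mean = toy_register_verb(statistics.mean)
top = toy_register_verb(lambda values, n: sorted(values, reverse=True)[:n])

lengths = [51, 49, 47, 46, 50]
result = lengths >> top(3) >> mean()
print(result)
```

The chain reads left to right: `top(3)` receives the list, and `mean()` receives whatever `top` returned.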


Testimonials

@coforfe:

Thanks for your excellent package to port R (dplyr) flow of processing to Python. I have been using other alternatives, and yours is the one that offers the most extensive and equivalent to what is possible now with dplyr.

More Repositories

1. python-varname (Python, 309 stars): Dark magics about variable names in python
2. vscode-gptcommit (TypeScript, 192 stars): Automated git commit messages using GPT models via gptcommit for VS Code.
3. cnn-convoluter (JavaScript, 112 stars): An interactive player for CNN convolution
4. pipen (Python, 102 stars): A pipeline framework for python
5. vcfstats (Python, 60 stars): Powerful statistics for VCF files
6. liquidpy (Python, 56 stars): A port of liquid template engine for python
7. toml-bench (Python, 50 stars): Which toml package to use in python?
8. pipda (Python, 35 stars): A framework for data piping in python
9. python-import-system (Jupyter Notebook, 17 stars): Python import system diagram
10. pymedoo (Python, 15 stars): A lightweight database framework for python
11. plotnine-prism (Python, 14 stars): Prism themes for plotnine, inspired by ggprism.
12. immunopipe (Python, 13 stars): Integrative analysis for scTCR- and scRNA-seq data
13. Bioinformatics-cheatsheet (11 stars): A cheat sheet for Bioinformaticians.
14. biopipen (Python, 11 stars): A set of processes/pipelines for bioinformatics
15. pyparam (Python, 8 stars): Powerful parameter processing for python
16. diot (Python, 8 stars): Python dictionary with dot notation
17. enrichr (Python, 8 stars): A python wrapper for Enrichr APIs
18. xqute (Python, 8 stars): A job management system for python
19. simplug (Python, 7 stars): A simple plugin system for python with async hooks supported
20. python-simpleconf (Python, 6 stars): Simple configuration management with python
21. attr_property (Python, 5 stars): Property support for attrs
22. cmdQueue (Python, 5 stars): A job queue available on Windows and *nix
23. pdtypes (Python, 5 stars): Show data types for pandas data frames in terminal and notebooks
24. dotdict-bench (Python, 5 stars): Benchmarking for dot-accessible dict packages in python
25. datar-pandas (Python, 5 stars): The pandas backend for datar
26. plkit (Python, 5 stars): A wrapper of pytorch-lightning that makes you write even less code.
27. datar-numpy (Python, 5 stars): The numpy backend for datar.
28. completions (Python, 5 stars): Shell completions for your program made easy.
29. argx (Python, 4 stars): Supercharged argparse for Python
30. jquery.msgbox (CSS, 4 stars): A jquery popup plugin
31. neutrapy (Python, 4 stars): A CLI tool to build desktop applications with Neutralinojs and Python as backend
32. pardoc (Python, 4 stars): Yet another docstring parser for python
33. cmdy (Python, 4 stars): "Shell language" to run command in python
34. pipen-args (Python, 3 stars): Command line argument parser for pipen
35. regexr (Python, 3 stars): Regular expressions for humans
36. pipen-diagram (Python, 3 stars): Draw pipeline diagrams for pipen.
37. datar-arrow (Python, 3 stars)
38. pipen-report (HTML, 3 stars): Report generation system for pipen
39. pipen-verbose (Python, 3 stars): Add verbose information in logs for pipen.
40. pipen-dry (Python, 2 stars): Dry runner for pipen
41. datar-polars (Python, 2 stars)
42. pipen-filters (Python, 2 stars): Add a set of useful filters for pipen templates.
43. immunopipe-example (CSS, 2 stars): An example of immunopipe
44. pipen-cli-require (Python, 2 stars): Checking the requirements for processes of a pipeline
45. pipen-cli-init (Python, 2 stars): A pipen CLI plugin to create a pipen project (pipeline)
46. pipen-annotate (Python, 2 stars): Use docstring to annotate pipen processes
47. benchwork (Python, 2 stars): A framework for benchmarking in python
48. immunopipe-AdrienneML-2020 (HTML, 2 stars): Reanalysis of the scRNA-seq and scTCR-seq data from Luoma, Adrienne M., et al. 2020 using immunopipe.
49. pyppl_echo (Python, 1 star): Echo script output to PyPPL logs
50. pyppl_rich (Python, 1 star): Richer information in logs for PyPPL
51. pyppl_export (Python, 1 star)
52. pwwang (1 star)
53. wigtools (Python, 1 star): A set of tools for wiggle file
54. datar-blog (Python, 1 star): A blog about datar
55. pyppl_runcmd (Python, 1 star): Allowing to run local command before and after each process for PyPPL
56. pipen-log2file (Python, 1 star): Save running logs to file for pipen
57. pygff (Python, 1 star): A more general GFF/GTF parser.
58. remotedata (Python, 1 star): Accessing and caching remote data.
59. pyppl_strict (Python, 1 star)
60. pyppl_jobtime (Python, 1 star): Job running time statistics for PyPPL
61. pyppl_lock (Python, 1 star): Preventing running processes from running again for PyPPL
62. pyppl_runners (Python, 1 star): Some basic runners for PyPPL
63. pipen-lock (Python, 1 star): Process lock for pipen to prevent multiple runs at the same time
64. pyppl_flowchart (Python, 1 star): Generating flowchart for PyPPL
65. pyppl_notify (Python, 1 star): Email notification for PyPPL
66. mkdocs-material-2020 (1 star): Try reproducing issue-2020 in mkdocs-material
67. pipen-runinfo (Python, 1 star): Generate running information for jobs in pipen pipelines.
68. conda-jvarkit (Python, 1 star): Recipes for building jvarkit tools on conda
69. pyppl_annotate (Python, 1 star): Adding long description/annotations for PyPPL processes.
70. pipen-board (Svelte, 1 star): Visualizing configuration and running of pipen pipelines on the web
71. pxGPT (Python, 1 star): pxGPT: Your personal, powerful and private GPT
72. pipen-cli-run (Python, 1 star): A pipen cli plugin to run a process or a pipeline
73. pyppl_context (Python, 1 star): Upstream and downstream process reference for PyPPL
74. pyppl_require (Python, 1 star): Process requirement manager for PyPPL
75. ceQTL (Python, 1 star): The co-expression Quantitative Trait Loci pipeline
76. pyppl_report (JavaScript, 1 star): A report generating system for PyPPL
77. Tech-Bytes (1 star): Daily digest with curated news, tech posts, articles, and edge technologies from a wide range of reputable sources.
78. pipen-gcs (Python, 1 star)
79. gglogger (R, 1 star): gglogger is an R package that logs the calls used to create ggplot2 objects.
80. as_yt-dlp (JavaScript, 1 star): A chrome extension that makes streams as yt-dlp commands for downloading
81. bioprocs-testdata (1 star): Test data for bioprocs