• Stars
    star
    103
  • Rank 331,104 (Top 7 %)
  • Language
    Python
  • License
    MIT License
  • Created over 7 years ago
  • Updated almost 5 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.

Version Build status Code coverage Support Python versions

weightedcalcs

weightedcalcs is a pandas-based Python library for calculating weighted means, medians, standard deviations, and more.

Features

  • Plays well with pandas.
  • Support for weighted means, medians, quantiles, standard deviations, and distributions.
  • Support for grouped calculations, using DataFrameGroupBy objects.
  • Raises an error when your data contains null-values.
  • Full test coverage.

Installation

pip install weightedcalcs

Usage

Getting started

Every weighted calculation in weightedcalcs begins with an instance of the weightedcalcs.Calculator class. Calculator takes one argument: the name of your weighting variable. So if you're analyzing a survey where the weighting variable is called "resp_weight", you'd do this:

import weightedcalcs as wc
calc = wc.Calculator("resp_weight")

Types of calculations

Currently, weightedcalcs.Calculator supports the following calculations:

  • calc.mean(my_data, value_var): The weighted arithmetic average of value_var.
  • calc.quantile(my_data, value_var, q): The weighted quantile of value_var, where q is between 0 and 1.
  • calc.median(my_data, value_var): The weighted median of value_var, equivalent to .quantile(...) where q=0.5.
  • calc.std(my_data, value_var): The weighted standard deviation of value_var.
  • calc.distribution(my_data, value_var): The weighted proportions of value_var, interpreting value_var as categories.
  • calc.count(my_data): The weighted count of all observations, i.e., the total weight.
  • calc.sum(my_data, value_var): The weighted sum of value_var.

The obj parameter above should one of the following:

  • A pandas DataFrame object
  • A pandas DataFrame.groupby object
  • A plain Python dictionary where the keys are column names and the values are equal-length lists.

Basic example

Below is a basic example of using weightedcalcs to find what percentage of Wyoming residents are married, divorced, et cetera:

import pandas as pd
import weightedcalcs as wc

# Load the 2015 American Community Survey person-level responses for Wyoming
responses = pd.read_csv("examples/data/acs-2015-pums-wy-simple.csv")

# `PWGTP` is the weighting variable used in the ACS's person-level data
calc = wc.Calculator("PWGTP")

# Get the distribution of marriage-status responses
calc.distribution(responses, "marriage_status").round(3).sort_values(ascending=False)

# -- Output --
# marriage_status
# Married                                0.425
# Never married or under 15 years old    0.421
# Divorced                               0.097
# Widowed                                0.046
# Separated                              0.012
# Name: PWGTP, dtype: float64

More examples

See this notebook to see examples of other calculations, including grouped calculations.

Max Ghenis has created a version of the example notebook that can be run directly in your browser, via Google Colab.

Weightedcalcs in the wild

Other Python weighted-calculation libraries

More Repositories

1

pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Python
6,285
star
2

markovify

A simple, extensible Markov chain generator.
Python
3,292
star
3

waybackpack

Download the entire Wayback Machine archive for a given URL.
Python
2,841
star
4

nbpreview

Render Jupyter/IPython notebooks without running a notebook server.
CSS
289
star
5

notebookjs

Render Jupyter/IPython notebooks on the fly, in the browser. (Or on the command line, if you'd like.)
JavaScript
272
star
6

spectra

Easy color scales and color conversion for Python.
Python
257
star
7

envplus

Combine your Python virtualenvs.
Python
115
star
8

reporter

Literate data analysis with iPython notebooks and Jekyll.
Ruby
92
star
9

twick

Twitter, quick. Fetch and store tweets on short notice.
Python
80
star
10

intro-to-visidata

Source files for "An Introduction to VisiData"
HTML
70
star
11

visidata-plugins

A place for me to share VisiData plugins I've written.
Python
36
star
12

mplstyle

A simple API for setting matplotlib styles, as well as a repository of nice styles.
Python
32
star
13

visidata-cheat-sheet

A one-page cheat sheet for VisiData, available in multiple languages.
HTML
26
star
14

gekyll

A Jekyll plugin for using Git repositories as posts, giving you access to a post's commits, diffs, and more.
Ruby
25
star
15

nbexec

A dead-simple tool for executing Jupyter notebooks from the command line.
Python
20
star
16

Backbone.Table

Render any Backbone.js Collection as an HTML table.
JavaScript
20
star
17

buzzfeed-news-trending-strip

Dataset: BuzzFeed News “Trending” Strip, 2018–2023
Python
19
star
18

tab-bankrupter

A Chrome extension for declaring "tab bankruptcy" without losing all your links.
JavaScript
18
star
19

astronomer

Fetch information about the users who've starred a given GitHub repository.
Python
17
star
20

txtbirds

‾‾\/‾‾
JavaScript
14
star
21

tinyapi

Python wrapper around TinyLetter's publicly accessible — but undocumented — API.
Python
13
star
22

fbpagefeed

A library and command-line tool for fetching Facebook Pages' published posts.
Python
12
star
23

virtualenv-recipes

Recipes for useful Python virtualenvs.
Shell
12
star
24

data-tactics

Half-baked idea: Conceptual building blocks for data analysis.
11
star
25

tinystats

Command-line tool for fetching message, URL, and subscriber data for the TinyLetter newsletters you own.
Python
11
star
26

vinejs

Somewhere between a total joke and a useful library for fetching Vine.co videos.
JavaScript
11
star
27

nicar-2024-pdfplumber-workshop

Jupyter Notebook
11
star
28

mta-colors

CSS & JSON files to help developers use the official colors of New York's Metropolitan Transportation Authority.
CSS
10
star
29

compleat

Fetch autocomplete suggestions from Google Search.
Python
9
star
30

google-table-converter

A browser-based tool for converting Google Spreadsheets into responsive HTML <table>s.
HTML
9
star
31

lede-2023

Jupyter Notebook
8
star
32

gifparse

[Work in progress.] Parse the GIF 89a file format, down to the minor details. Pure Python, no dependencies.
Python
8
star
33

nicar-2015-schedule

NICAR 2015 conference schedule as CSV and JSON, plus the underlying Python scraper.
Python
8
star
34

WRIT1-CE9741

WRIT1-CE9741, Fall 2013, NYU School of Continuing and Professional Studies
Ruby
6
star
35

nicar-2023-pdfplumber-workshop

Jupyter Notebook
6
star
36

csvcat

Efficiently concatenate CSVs (or other tabular text files), stripping extra header lines.
Shell
6
star
37

nicar-2017-schedule

NICAR 2017 conference schedule as JSON and CSV, plus the underlying Python scraper.
Python
6
star
38

babynames

CSVs and parsers for the Social Security Administration's historical baby name data.
Python
5
star
39

minicard

A bare-bones CSS stylesheet for creating "card"-style elements.
CSS
4
star
40

macmailer

Command-line utility and Ruby library for creating/sending messages in OSX's Mail.app program.
Ruby
4
star
41

nicar-now

Your unofficial guide to what's happening next at NICAR 2020.
3
star
42

text-toggle

Let readers toggle between two versions of a text.
JavaScript
3
star
43

fidget

Fidget.js is a small, configurable JavaScript library that resizes blocks of text to fit their containers.
JavaScript
3
star
44

statusfiles

IDEA: A simple, structured, standardized, technology-agnostic way to represent the status of things.
3
star
45

nicar-2018-schedule

Your unofficial guide to what's happening next at NICAR 2018.
Python
3
star
46

glat-glong

Find the precise latitude and longitude of any point on Google Maps. A Chrome extension.
JavaScript
3
star
47

lede-2024

Jupyter Notebook
3
star
48

gmap-button

A JavaScript library for adding buttons to embedded Google Maps.
JavaScript
2
star
49

crochet

Hook into and/or monkeypatch any Ruby class- or instance-method. Provides 'before' and 'after' hooks, plus their destructive evil twins.
Ruby
2
star
50

jub

As in, "get the jub done." Or as in, "jQuery, Underscore, Backbone." It's a shell script that automatically grabs the latest versions of those libraries, so that you can get on with prototyping.
Shell
2
star
51

download-all-attachments-from-a-gmail-conversation

Two methods that *seem* to work...
1
star
52

fbiter

A simple library for iterating through paginated Facebook API endpoints.
Python
1
star
53

weddingroulette

The code behind http://weddingroulette.com/
Ruby
1
star
54

jekyll-auto-s3

Automatically sync your Jekyll project to S3 on every (re)build.
Ruby
1
star
55

griddle

Griddle.js is lightweight tool for creating and manipulating programmable, fluid, shift-able grids.
JavaScript
1
star
56

linstapaper

Article-list and site files for linstapaper.com
JavaScript
1
star
57

nbtemplate

Render iPython notebooks to other layouts, via templates. Library and command-line tool.
Python
1
star
58

nicar-2019-schedule

The NICAR 2019 conference schedule as JSON and CSV files, plus the underlying Python scraper.
Python
1
star
59

parabear

An experiment in stupid-simple HTML article text extraction.
JavaScript
1
star