  • Stars: 167
  • Rank: 226,635 (Top 5%)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created over 7 years ago
  • Updated about 4 years ago

Introduction to Statistical Modeling with Python (PyCon 2017)

Wednesday, May 17, 2017 1:20 p.m.–4:40 p.m.

Christopher Fonnesbeck
Vanderbilt University Medical Center

Description

This intermediate-level tutorial gives students hands-on experience applying practical Bayesian statistical modeling methods to real data. Unlike many introductory statistics courses, we will not apply "cookbook" methods that are easy to teach but often inapplicable; instead, we will learn foundational statistical methods that apply to a wide variety of problems: comparing two groups with continuous and binary outcomes, linear regression, generalized linear models, model selection, and other useful techniques. The tutorial starts with a short introduction to data manipulation and cleaning using pandas, then proceeds to simple concepts such as fitting data to statistical distributions and using optimization and simulation for data analysis. By using and modifying hand-coded implementations of these techniques, students will gain an understanding of how each method works. Students will come away knowing how to handle very practical statistical problems, such as dealing with missing data, checking a statistical model for appropriateness, and properly expressing the uncertainty in the quantities estimated by statistical methods.
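
As a rough illustration of the kind of hand-coded method described above, the following sketch compares two groups with a binary outcome by simulating from Beta posteriors. It is not taken from the tutorial notebooks: the counts are invented, the uniform Beta(1, 1) prior is an assumption, and the function names `posterior_draws` and `prob_b_greater` are hypothetical.

```python
import random


def posterior_draws(successes, trials, n_draws, rng, prior_a=1.0, prior_b=1.0):
    """Simulate draws from the posterior of a success probability.

    With a Beta(prior_a, prior_b) prior and a binomial likelihood, the
    posterior is Beta(prior_a + successes, prior_b + failures).
    """
    failures = trials - successes
    return [
        rng.betavariate(prior_a + successes, prior_b + failures)
        for _ in range(n_draws)
    ]


def prob_b_greater(draws_a, draws_b):
    """Fraction of paired draws in which group B's rate exceeds group A's."""
    wins = sum(b > a for a, b in zip(draws_a, draws_b))
    return wins / len(draws_a)


rng = random.Random(42)  # fixed seed so the simulation is repeatable

# Hypothetical counts, invented purely for illustration.
draws_a = posterior_draws(successes=30, trials=100, n_draws=10000, rng=rng)
draws_b = posterior_draws(successes=45, trials=100, n_draws=10000, rng=rng)

# Summarize the uncertainty in the difference with a 95% posterior interval.
diff = sorted(b - a for a, b in zip(draws_a, draws_b))
lower = diff[int(0.025 * len(diff))]
upper = diff[int(0.975 * len(diff))]

print("P(rate_B > rate_A) is approximately", round(prob_b_greater(draws_a, draws_b), 3))
print("95% interval for rate_B - rate_A:", (round(lower, 3), round(upper, 3)))
```

The result is reported as a probability and an interval rather than a single point estimate, which is one simple way of expressing the uncertainty in an estimated quantity.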

Outline

  1. Introduction (1:20 - 1:40)
  2. Data cleaning with pandas (1:40 - 2:10)
  3. Basic Bayesian inference (2:10 - 3:10)
  4. Fitting regression models (3:15 - 4:15)
  5. Dealing with missing data (4:15 - 4:40)

Software requirements

This tutorial is based on Python 3. I cannot guarantee that the materials will work with Python 2, so ensure that Python 3.5 or later is installed on your system.
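
One way to confirm the version requirement is a short check like the sketch below. The helper name `meets_minimum` is hypothetical and not part of the tutorial materials; the (3, 5) threshold comes from the requirement stated above.

```python
import sys

MINIMUM_VERSION = (3, 5)  # minimum version stated in the requirement above


def meets_minimum(version_info, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if meets_minimum(sys.version_info):
    print("Python {}.{} is new enough.".format(*sys.version_info[:2]))
else:
    print("Python {}.{} is too old; install 3.5 or later.".format(*sys.version_info[:2]))
```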

I recommend installing the Anaconda distribution of Python 3, as it allows for the easy automation of package installation and virtual environment creation (see instructions below).

Getting the tutorial materials

Clone this repository into a directory of your choice.

git clone https://github.com/fonnesbeck/intro_stat_modeling_2017.git

If you are not familiar with Git and GitHub, you can simply download the zip file of the repository at the top of the main repository page.

Then, move to the directory created by the clone/zip file:

cd intro_stat_modeling_2017

and install everything using conda:

conda config --add channels conda-forge
conda env create -f environment.yml

This will create an environment called stat_pycon that includes the packages required for the course.

If you are not using the Anaconda Python distribution, you will need to manually install the packages listed in environment.yml using pip, which you probably don't want to do.

So install Anaconda.

To use the environment, you may type:

source activate stat_pycon
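
Recent conda releases also accept `conda activate stat_pycon`. To check from Python that the environment is active before opening the notebooks, something like the sketch below may help. It assumes conda exports the active environment's name in the `CONDA_DEFAULT_ENV` variable; the helper names `active_conda_env` and `in_expected_env` are hypothetical.

```python
import os

EXPECTED_ENV = "stat_pycon"  # environment name created by environment.yml


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if unset.

    Assumption: conda sets CONDA_DEFAULT_ENV when an environment is activated.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


def in_expected_env(environ=None, expected=EXPECTED_ENV):
    """Return True if the active conda environment matches `expected`."""
    return active_conda_env(environ) == expected


if in_expected_env():
    print("The stat_pycon environment is active.")
else:
    print("Active environment:", active_conda_env(), "(expected stat_pycon)")
```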
