Bayesian Statistics using Julia and Turing

CC BY-SA 4.0

Bayesian for Everyone!

Welcome to the repository of tutorials on how to do Bayesian Statistics using Julia and Turing. Tutorials are available at storopoli.github.io/Bayesian-Julia.

Bayesian statistics is an approach to inferential statistics based on Bayes' theorem, where available knowledge about parameters in a statistical model is updated with the information in observed data. The background knowledge is expressed as a prior distribution and combined with observational data in the form of a likelihood function to determine the posterior distribution. The posterior can also be used for making predictions about future events.
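In symbols, writing θ for the parameters and y for the observed data (notation chosen here for illustration, not taken from the tutorials), the update described above can be stated as:

```latex
\[
  \underbrace{p(\theta \mid y)}_{\text{posterior}}
  = \frac{\overbrace{p(y \mid \theta)}^{\text{likelihood}}\;
          \overbrace{p(\theta)}^{\text{prior}}}{p(y)},
  \qquad
  p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta .
\]
% Prediction for a future observation \tilde{y}, assuming \tilde{y} is
% conditionally independent of y given \theta:
\[
  p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta .
\]
```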

Bayesian statistics departs from classical (frequentist) inferential statistics, which prohibits probability statements about parameters and is based on the idea of repeatedly sampling, asymptotically to infinity, from a theoretical population and finding the parameter values that maximize the likelihood function. Its most notorious tool is null-hypothesis significance testing (NHST) based on p-values. Bayesian statistics incorporates uncertainty (and prior knowledge) by allowing probability statements about parameters, and parameter inference follows directly from Bayes' theorem.
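As a minimal sketch of what a probability statement about a parameter can look like, the following grid approximation uses only base Julia. The coin-toss data (6 heads in 9 tosses), the flat prior, and all variable names are assumptions made for illustration; they are not taken from the tutorials.

```julia
# Grid approximation of the posterior for a coin's probability of heads, θ.
# Hypothetical data: 6 heads in 9 tosses.
heads, tosses = 6, 9

θ = range(0, 1; length=101)        # candidate parameter values
prior = fill(1.0, length(θ))       # flat prior over the grid (an assumption)

# Binomial likelihood of the data at every candidate θ
likelihood = @. binomial(tosses, heads) * θ^heads * (1 - θ)^(tosses - heads)

# Bayes' theorem: posterior ∝ likelihood × prior, normalized over the grid
unnormalized = likelihood .* prior
posterior = unnormalized ./ sum(unnormalized)

θ_map = θ[argmax(posterior)]                 # grid value with highest posterior mass
prob_above_half = sum(posterior[θ .> 0.5])   # approximates P(θ > 0.5 | data)

println("Most probable θ on the grid: ", θ_map)
println("P(θ > 0.5 | data) ≈ ", round(prob_above_half; digits=3))
```

The last quantity is a direct probability statement about the parameter, which is exactly the kind of statement the frequentist framework does not allow.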

Julia

Julia is a fast, dynamically typed language that is just-in-time (JIT) compiled to native code using LLVM. It "runs like C but reads like Python", meaning that it is blazing fast and easy to prototype, read, and write. It is multi-paradigm, combining features of imperative, functional, and object-oriented programming. I won't cover Julia basics or data manipulation in the tutorials; instead, please take a look at the following resources, which cover most of the introduction to Julia and how to work with tabular data in Julia:

  • Julia Documentation: Julia documentation is a very friendly and well-written resource that explains the basic design and functionality of the language.
  • Julia Data Science: open source and open access book on how to do Data Science using Julia.
  • Think Julia: introductory beginner-friendly book that explains the main concepts and functionality behind the Julia language.
  • Julia High Performance: book by Avik Sengupta, with a foreword by Alan Edelman (one of the creators of the Julia language); it covers how to make Julia even faster with some principles and tricks of the trade.
  • An Introduction to DataFrames: the package DataFrames.jl provides a set of tools for working with tabular data in Julia. Its design and functionality are similar to those of pandas (in Python) and data.frame, data.table and dplyr (in R), making it a great general-purpose data science tool, especially for those coming to Julia from R or Python. This is a collection of notebooks that introduces DataFrames.jl, made by one of its core contributors, Bogumił Kamiński.

Turing

Turing is an ecosystem of Julia packages for Bayesian Inference using probabilistic programming. Models specified using Turing are easy to read and write — models work the way you write them. Like everything in Julia, Turing is fast.
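To give a feel for how a Turing model reads, here is a minimal sketch of a coin-flip model. It assumes Turing.jl is installed in the active environment; the model name `coinflip`, the data vector, and the choice of a `Beta(1, 1)` prior are illustrative assumptions, not code taken from the tutorials.

```julia
using Turing       # assumes Turing.jl is installed in the active environment
using Statistics   # for mean
using Random

Random.seed!(123)  # arbitrary seed, only to make the run repeatable

# Hypothetical model: y holds 0/1 coin-toss outcomes, p is the probability of heads.
@model function coinflip(y)
    p ~ Beta(1, 1)               # flat prior on p (an assumption)
    for i in eachindex(y)
        y[i] ~ Bernoulli(p)      # likelihood of each observed toss
    end
end

data = [1, 1, 0, 1, 1, 0, 1, 0, 1]   # made-up data: 6 heads in 9 tosses

# Draw 1,000 posterior samples with the No-U-Turn Sampler (NUTS)
chain = sample(coinflip(data), NUTS(), 1_000; progress=false)

posterior_mean = mean(chain[:p])
println("Posterior mean of p ≈ ", round(posterior_mean; digits=2))
```

The model body mirrors the statistical notation: one line for the prior and one line for the likelihood. The tutorials listed below build up from this pattern to regression and multilevel models.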

Author

Jose Storopoli, PhD - Lattes CV - ORCID - https://storopoli.io

How to use the content?

The content is licensed under a very permissive Creative Commons license (CC BY-SA). You are most welcome to contribute with issues and pull requests. My hope is to get more people into Bayesian statistics. The content is aimed at social scientists and PhD candidates in the social sciences. I chose to provide an intuitive approach rather than focusing on rigorous mathematical formulations. I've made it the way I would have liked to be introduced to Bayesian statistics.

To configure a local environment:

  1. Download and install Julia
  2. Clone the repository from GitHub: git clone https://github.com/storopoli/Bayesian-Julia.git
  3. Access the directory: cd Bayesian-Julia
  4. Activate the environment by typing in the Julia REPL:
    using Pkg
    Pkg.activate(".")
    Pkg.instantiate()

Tutorials

  1. Why Julia?
  2. What is Bayesian Statistics?
  3. Common Probability Distributions
  4. How to use Turing
  5. Markov Chain Monte Carlo (MCMC)
  6. Bayesian Linear Regression
  7. Bayesian Logistic Regression
  8. Bayesian Ordinal Regression
  9. Bayesian Regression with Count Data
  10. Robust Bayesian Regression
  11. Multilevel Models (a.k.a. Hierarchical Models)
  12. Computational Tricks with Turing (Non-Centered Parametrization and QR Decomposition)
  13. Epidemiological Models using ODE Solvers in Turing

Datasets

  • kidiq (linear regression): data from a survey of adult American women and their children (a subsample from the National Longitudinal Survey of Youth). Source: Gelman and Hill (2007).
  • wells (logistic regression): a survey of 3200 residents in a small area of Bangladesh suffering from arsenic contamination of groundwater. Respondents with elevated arsenic levels in their wells had been encouraged to switch their water source to a safe public or private well in the nearby area, and the survey was conducted several years later to learn which of the affected residents had switched wells. Source: Gelman and Hill (2007).
  • esoph (ordinal regression): data from a case-control study of (o)esophageal cancer in Ille-et-Vilaine, France. Source: Breslow and Day (1980).
  • roaches (Poisson regression): data on the efficacy of a pest management system at reducing the number of roaches in urban apartments. Source: Gelman and Hill (2007).
  • duncan (robust regression): data on occupational prestige, containing many outliers. Source: Duncan (1961).
  • cheese (hierarchical models): data from cheese ratings. A group of 10 rural and 10 urban raters rated 4 different types of cheese (A, B, C and D) in two samples. Source: Boatwright, McCulloch and Rossi (1999).

What about other Turing tutorials?

This is not the only Turing tutorial available, but it aims to introduce Bayesian inference together with how to use Julia and Turing. Here is a (non-exhaustive) list of other Turing tutorials:

  1. Official Turing Tutorials: tutorials on how to implement common models in Turing
  2. Statistical Rethinking - Turing Models: Julia versions of the Bayesian models described in Statistical Rethinking Edition 1 (McElreath, 2016) and Edition 2 (McElreath, 2020)
  3. Håkan Kjellerstrand Turing Tutorials: a collection of Julia Turing models

I also have a free and open-source graduate course on Bayesian Statistics with Turing and Stan code. You can find it at storopoli/Bayesian-Statistics.

How to cite

To cite these tutorials, please use:

Storopoli (2021). Bayesian Statistics with Julia and Turing. https://storopoli.github.io/Bayesian-Julia.

Or in BibTeX format (LaTeX):

@misc{storopoli2021bayesianjulia,
  author = {Storopoli, Jose},
  title = {Bayesian Statistics with Julia and Turing},
  url = {https://storopoli.github.io/Bayesian-Julia},
  year = {2021}
}

References

The references are divided into books, papers, software, and datasets.

Books

  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis. Chapman and Hall/CRC.
  • McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press.
  • Gelman, A., Hill, J., & Vehtari, A. (2020). Regression and other stories. Cambridge University Press.
  • Brooks, S., Gelman, A., Jones, G., & Meng, X.-L. (2011). Handbook of Markov Chain Monte Carlo. CRC Press. https://books.google.com?id=qfRsAIKZ4rIC
    • Geyer, C. J. (2011). Introduction to Markov chain Monte Carlo. In S. Brooks, A. Gelman, G. L. Jones, & X.-L. Meng (Eds.), Handbook of Markov Chain Monte Carlo.

Academic Papers

  • van de Schoot, R., Depaoli, S., King, R., Kramer, B., Märtens, K., Tadesse, M. G., Vannucci, M., Gelman, A., Veen, D., Willemsen, J., & Yau, C. (2021). Bayesian statistics and modelling. Nature Reviews Methods Primers, 1(1), 1–26. https://doi.org/10.1038/s43586-020-00001-2
  • Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., & Gelman, A. (2019). Visualization in Bayesian workflow. Journal of the Royal Statistical Society: Series A (Statistics in Society), 182(2), 389–402. https://doi.org/10.1111/rssa.12378
  • Gelman, A., Vehtari, A., Simpson, D., Margossian, C. C., Carpenter, B., Yao, Y., Kennedy, L., Gabry, J., Bürkner, P.-C., & Modrák, M. (2020, November 3). Bayesian Workflow. http://arxiv.org/abs/2011.01808
  • Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., Bollen, K. A., Brembs, B., Brown, L., Camerer, C., Cesarini, D., Chambers, C. D., Clyde, M., Cook, T. D., De Boeck, P., Dienes, Z., Dreber, A., Easwaran, K., Efferson, C., … Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6–10. https://doi.org/10.1038/s41562-017-0189-z
  • McShane, B. B., Gal, D., Gelman, A., Robert, C., & Tackett, J. L. (2019). Abandon Statistical Significance. American Statistician, 73, 235–245. https://doi.org/10.1080/00031305.2018.1527253
  • Amrhein, V., Greenland, S., & McShane, B. (2019). Scientists rise up against statistical significance. Nature, 567(7748), 305–307. https://doi.org/10.1038/d41586-019-00857-9
  • van de Schoot, R., Kaplan, D., Denissen, J., Asendorpf, J. B., Neyer, F. J., & van Aken, M. A. G. (2014). A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research. Child Development, 85(3), 842–860. https://doi.org/10.1111/cdev.12169

Software

  • Bezanson, J., Edelman, A., Karpinski, S., & Shah, V. B. (2017). Julia: A fresh approach to numerical computing. SIAM Review, 59(1), 65–98.
  • Ge, H., Xu, K., & Ghahramani, Z. (2018). Turing: A Language for Flexible Probabilistic Inference. International Conference on Artificial Intelligence and Statistics, 1682–1690. http://proceedings.mlr.press/v84/ge18b.html
  • Tarek, M., Xu, K., Trapp, M., Ge, H., & Ghahramani, Z. (2020). DynamicPPL: Stan-like Speed for Dynamic Probabilistic Models. ArXiv:2002.02702 [Cs, Stat]. http://arxiv.org/abs/2002.02702
  • Xu, K., Ge, H., Tebbutt, W., Tarek, M., Trapp, M., & Ghahramani, Z. (2020). AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms. Symposium on Advances in Approximate Bayesian Inference, 1–10. http://proceedings.mlr.press/v118/xu20a.html
  • Revels, J., Lubin, M., & Papamarkou, T. (2016). Forward-Mode Automatic Differentiation in Julia. ArXiv:1607.07892 [Cs]. http://arxiv.org/abs/1607.07892

Datasets

  • Boatwright, P., McCulloch, R., & Rossi, P. (1999). Account-level modeling for trade promotion: An application of a constrained parameter hierarchical model. Journal of the American Statistical Association, 94(448), 1063–1073.
  • Breslow, N. E. & Day, N. E. (1980). Statistical Methods in Cancer Research. Volume 1: The Analysis of Case-Control Studies. IARC Lyon / Oxford University Press.
  • Duncan, O. D. (1961). A socioeconomic index for all occupations. Class: Critical Concepts, 1, 388–426.
  • Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

License

This content is licensed under Creative Commons Attribution-ShareAlike 4.0 International.
