  • Stars: 1,443
  • Rank 31,551 (Top 0.7 %)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created about 8 years ago
  • Updated over 6 years ago

Repository Details

Introduction to Statistics and Basics of Mathematics for Data Science - The Hacker's Way

HackerMath for Machine Learning

“Study hard what interests you the most in the most undisciplined, irreverent and original manner possible.” ― Richard Feynman

Math literacy, including proficiency in Linear Algebra and Statistics, is a must for anyone pursuing a career in data science. The goal of this workshop is to introduce some key concepts from these domains that get used repeatedly in data science applications. Our approach is what we call the “Hacker’s way”: instead of going back to formulae and proofs, we teach the concepts by writing code and applying them to practical problems. Concepts don’t stick if their usage is never taught.

The focus will be on depth rather than breadth. Three areas are chosen: Hypothesis Testing, Supervised Learning and Unsupervised Learning. Each will be covered in depth, with 50% of the time spent on the concepts and 50% spent coding them.

More details at http://amitkaps.com/hackermath

See it in action: Binder

Module #1: Hypothesis Testing

Math Concepts

  • Basic Metrics: Mean, Variance, Covariance, Correlation
  • Discrete Probability Distributions: Bernoulli, Binomial, Poisson
  • Probability Mass Function, Cumulative Distribution Function (discrete case)
  • Continuous Probability Distributions: Uniform, Normal, Beta, Gamma
  • Cumulative Distribution Function, Probability Density Function
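
As a flavor of the code-first approach, the basic metrics and a discrete distribution from the list above can be written in a few lines of plain Python. This is a minimal sketch and not the workshop's own notebook code; the function names and the sample numbers are assumptions chosen for illustration.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def variance(xs):
    # Variance is the covariance of a variable with itself.
    return covariance(xs, xs)

def correlation(xs, ys):
    # Pearson correlation: covariance scaled by both standard deviations.
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k): the running sum of the mass function."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Made-up illustrative data: ys is exactly twice xs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(mean(xs), variance(xs), correlation(xs, ys))
print(binomial_pmf(2, 4, 0.5), binomial_cdf(2, 4, 0.5))
```

In practice numpy and scipy.stats provide these quantities directly; writing them out by hand is only meant to show what the library calls compute.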

ML Applications

  • Direct Simulation
  • Shuffling
  • Bootstrapping
  • Application to A/B Testing
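
Shuffling and bootstrapping can both be illustrated on a toy A/B test using only the standard library. The sketch below is an assumption about how such a test might be coded, not the workshop's material: the conversion data is made up, and the function names, iteration counts and seeds are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_p_value(a, b, n_iter=2000, seed=0):
    """Shuffling: one-sided p-value for mean(b) > mean(a).

    Counts how often randomly relabelled groups show a difference in
    means at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = mean(pooled[len(a):]) - mean(pooled[:len(a)])
        if diff >= observed:
            hits += 1
    return hits / n_iter

def bootstrap_ci(a, b, n_iter=2000, alpha=0.05, seed=0):
    """Bootstrapping: percentile interval for mean(b) - mean(a)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_iter):
        # Resample each group with replacement, keeping its size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_b) - mean(resample_a))
    diffs.sort()
    lower = diffs[int(n_iter * alpha / 2)]
    upper = diffs[int(n_iter * (1 - alpha / 2)) - 1]
    return lower, upper

# Made-up data: 1 = converted, 0 = did not convert.
variant_a = [1] * 20 + [0] * 180   # 10% conversion
variant_b = [1] * 40 + [0] * 160   # 20% conversion

p_value = permutation_p_value(variant_a, variant_b)
lower, upper = bootstrap_ci(variant_a, variant_b)
print("p-value:", p_value)
print("95% interval for the lift:", lower, upper)
```

The permutation test answers "could this difference arise by chance if the variants were identical?", while the bootstrap interval gives a range for the size of the difference.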

Module #2: Supervised Learning

Math Concepts

  • Basics of Matrix Operation
  • Matrix Determinant, Inverse
  • Basics of Linear Algebra
  • Solve Ax = b for an n × n matrix A
  • Solve Ax = b for an n × (p+1) matrix A
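
The two "solve Ax = b" cases above can be sketched with numpy. The square case has an exact solution when A is invertible; the tall n × (p+1) case generally does not, so the sketch assumes the usual least-squares reading of it. The matrices and values are made up for illustration.

```python
import numpy as np

# Square case (n x n): an exact solution exists when A is invertible.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print("square solution:", x)

# Tall case (n x (p+1)): n observations, p features plus an intercept
# column of ones. There are more equations than unknowns, so we look
# for the least-squares solution instead of an exact one.
feature = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(feature), feature])  # shape (4, 2)
y = 1.0 + 2.0 * feature                                # made-up targets
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print("least-squares coefficients:", coef)
```

The least-squares coefficients here are the intercept and slope of a fitted line, which is how this math concept connects to linear regression.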

ML Applications

  • Linear Regression
  • L2 Regularization
  • Gradient Descent
  • Linear Classifier
  • Logistic Regression
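
To show how linear regression, L2 regularization and gradient descent fit together, here is a minimal sketch that minimizes the ridge loss ||Xw - y||² + λ||w||² two ways and checks that they agree. The data, learning rate and step count are assumptions, and for simplicity the intercept is penalized along with the slope, which real implementations usually avoid.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y directly."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def ridge_gradient_descent(X, y, lam, lr=0.01, n_steps=5000):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 by fixed-step gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        # Gradient of the loss with respect to w.
        grad = 2 * (X.T @ (X @ w - y) + lam * w)
        w = w - lr * grad
    return w

# Made-up data: an intercept column plus one feature.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])

w_exact = ridge_closed_form(X, y, lam=0.1)
w_gd = ridge_gradient_descent(X, y, lam=0.1)
print("closed form:     ", w_exact)
print("gradient descent:", w_gd)
```

The learning rate has to be small enough for the iteration to converge; 0.01 works for this tiny dataset but is not a general recommendation.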

Module #3: Unsupervised Learning

Math Concepts

  • Matrix Projections
  • Solve Ax = λx for an n × n matrix A
  • Eigenvectors & Eigenvalues
  • Distance in Vector Space
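
The concepts above map onto a handful of numpy calls. The following is a small sketch with made-up vectors and a made-up symmetric matrix; the helper names `project` and `euclidean_distance` are hypothetical.

```python
import numpy as np

# Eigen-decomposition of a symmetric matrix: A v = lambda v.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)  # eigh is for symmetric matrices
for lam, v in zip(eigenvalues, eigenvectors.T):
    # Each column of `eigenvectors` satisfies the defining equation.
    assert np.allclose(A @ v, lam * v)
print("eigenvalues:", eigenvalues)

def project(b, a):
    """Orthogonal projection of vector b onto the line spanned by a."""
    return (a @ b) / (a @ a) * a

def euclidean_distance(u, v):
    """Straight-line (L2) distance between two points."""
    return np.linalg.norm(u - v)

b = np.array([3.0, 4.0])
a = np.array([1.0, 0.0])
print("projection of b onto a:", project(b, a))
print("distance from origin to b:", euclidean_distance(np.zeros(2), b))
```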

ML Applications

  • Dimensionality Reduction
  • Principal Component Analysis
  • Cluster Analysis
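
Dimensionality reduction ties back to the eigenvector material: one common way to compute principal components is to take the eigenvectors of the data's covariance matrix. The sketch below assumes that approach and uses synthetic data generated for the example; it is not the workshop's notebook code, and the `pca` helper is hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components.

    Returns the projected data and the variance along each component.
    """
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]           # columns are the directions
    return X_centered @ components, eigvals[order]

# Synthetic data: points close to the line y = 2x, plus a little noise.
rng = np.random.default_rng(0)
t = rng.normal(size=50)
X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=50)])

scores, variances = pca(X, n_components=2)
print("variance along each component:", variances)

# Keeping only the first component reduces the data from 2-D to 1-D.
reduced, _ = pca(X, n_components=1)
print("reduced shape:", reduced.shape)
```

Because the points lie almost on a line, nearly all of the variance falls on the first component, so dropping the second loses very little information.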

Target Audience

  • Someone with a background in programming who wants to pick up the math needed for data science and get a flavor of different data science problems
  • Someone who is a beginner in data science or has been doing data analysis (at a minimum, using Excel) and wants to pick up skills to take the next step in their data science career

Pre-requisites

  • A basic understanding of linear algebra would help. We know you may have forgotten all about it since your school or college days, so here is an amazing video playlist by @3blue1brown to learn The Essence of Linear Algebra in a very visual way.
  • A touch of calculus knowledge would also make things easier. If you want to brush up on your basic calculus skills, @3blue1brown has another amazing video playlist to learn The Essence of Calculus in a very visual way.
  • Programming knowledge is mandatory. You should, at the bare minimum, be able to write conditional statements, use loops, be comfortable writing functions, understand code snippets and come up with programming logic. Since we will be using Python, brush up on your basics there. Specifically, we expect you to know the first three sections from this: http://anandology.com/python-practice-book/

Software Requirements

You will require the Python data stack for the workshop. Please install Anaconda for Python 3.5, which has everything we need. For the more curious attendees: we will be using Jupyter Notebook as our IDE, and we will be introducing numpy, scipy, seaborn, matplotlib, plotnine, statsmodels and scikit-learn.

The working repo for this workshop is at https://github.com/amitkaps/hackermath/


Authors:

Amit Kapoor

Bargava Subramanian
