  • Stars: 128
  • Rank 281,044 (Top 6 %)
  • Language: HTML
  • License: MIT License
  • Created over 9 years ago
  • Updated over 7 years ago

Repository Details

Analysing Weed Pricing across US - Data Analysis Workshop

Weed

This is a repository for a data analytics workshop in Python to be conducted in August. We will use the price of weed in the US as the dataset to showcase the approach.

The broad analytics steps are listed below. We will showcase some of them in this workshop.

  1. Introduction - “I think, therefore I am”
  • What is data analysis?
  • What type of questions can be answered?
  • Developing a hypothesis-driven approach.
  • Making the case.
  2. Acquire - “Data is the new oil”
  • Downloaded from an internal system
  • Obtained from a client or other third party
  • Extracted from a web-based API
  • Scraped from a website
  • Extracted from a PDF file
  • Gathered manually and recorded
  3. Refine - “Data is messy”
  • Missing e.g. check for missing or incomplete data
  • Quality e.g. check for duplicates, accuracy, unusual data
  • Parse e.g. extract year from date
  • Merge e.g. first name and surname into full name
  • Convert e.g. free text to coded value
  • Derive e.g. gender from title
  • Calculate e.g. percentages, proportions
  • Remove e.g. remove redundant data
  • Aggregate e.g. roll up by year, cluster by area
  • Filter e.g. exclude based on location
  • Sample e.g. extract a representative sample
  • Summary e.g. show summary stats like mean
  4. Explore - “I don't know, what I don't know”
  • Why do visual exploration?
  • Understand Data Structure & Types
  • Explore single variable graphs - (Quantitative, Categorical)
  • Explore dual variable graphs - (Q & Q, Q & C, C & C)
  • Explore multi variable graphs
  5. Model - “All models are wrong, some of them are useful”
  • The power and limits of models
  • Tradeoff between Prediction Accuracy and Model Interpretability
  • Assessing Model Accuracy
  • Regression models (Simple, Multiple)
  • Classification model
  6. Insight - “The goal is to turn data into insight”
  • Why do we need to communicate insight?
  • Types of communication - Exploration vs. Explanation
  • Explanation: Telling a story with data
  • Exploration: Building an interface for people to find stories
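
A few of the Refine steps in the outline (Missing, Quality, Parse, Convert, Aggregate) can be sketched in plain Python. This is a minimal illustration and not the workshop's own code. The rows, the column names (`state`, `date`, `price`) and the values are invented for the example. It is written for Python 3 with only the standard library, whereas the workshop targets Python 2.7 and lists Pandas, which can express the same steps more compactly.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: column names and values are invented for illustration.
RAW_CSV = """state,date,price
Alabama,2014-01-01,339.06
Alabama,2014-01-01,339.06
Alabama,2015-01-01,
Alaska,2014-01-01,288.75
Alaska,2015-01-01,303.02
"""


def refine(raw_text):
    """Drop incomplete and duplicate rows, parse the year, convert price to float."""
    seen = set()
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        # Missing: skip rows with an empty price.
        if not record["price"]:
            continue
        # Quality: skip exact duplicates of a row already kept.
        key = (record["state"], record["date"], record["price"])
        if key in seen:
            continue
        seen.add(key)
        rows.append({
            "state": record["state"],
            "year": int(record["date"][:4]),  # Parse: extract year from date
            "price": float(record["price"]),  # Convert: text to number
        })
    return rows


def mean_price_by_year(rows):
    """Aggregate: roll up to the mean price per year."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row["price"])
    return {year: sum(prices) / len(prices) for year, prices in by_year.items()}


rows = refine(RAW_CSV)
summary = mean_price_by_year(rows)
```

Here `rows` keeps three of the five input records (one duplicate and one row with a missing price are dropped), and `summary` maps each year to its mean price.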

Prerequisites

  • Basics of Python. Users should know how to write functions; read and parse text files (csv, txt, fwf); use conditional and looping constructs; use standard libraries like os and sys; and work with lists, list comprehensions and dictionaries
  • It is good to know the basics of the following:
    • Numpy
    • Scipy
    • Pandas
    • Matplotlib
    • Seaborn
    • bokeh
    • vincent
    • folium
    • sklearn
    • IPython and IPython notebook - everything here will be an IPython notebook
  • Software Requirements
    • Python 2.7
    • git - so that this repo can be cloned :)
    • virtualenv
    • Libraries from requirements.txt

Optional

Users may install Anaconda if they prefer. If using Anaconda or Enthought, please ensure that all libraries listed in requirements.txt are installed.
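
One way to check whether the libraries are available in the active environment is sketched below. It is an illustration under assumptions: the helper `missing_modules` is hypothetical, the import names are taken from the prerequisites list rather than from requirements.txt (which remains the authoritative list and may pin other packages), and it uses `importlib.util.find_spec`, so it needs Python 3 rather than the Python 2.7 the workshop targets.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names assumed from the prerequisites list; requirements.txt is the
# authoritative source and may list different or additional packages.
WORKSHOP_MODULES = ["numpy", "scipy", "pandas", "matplotlib", "seaborn",
                    "bokeh", "vincent", "folium", "sklearn", "IPython"]

missing = missing_modules(WORKSHOP_MODULES)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All listed modules were found.")
```

The check only confirms that each module can be located; it does not verify the versions pinned in requirements.txt.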

Setup Guide

Clone the repository

$ git clone https://github.com/amitkaps/weed.git

Create a virtual environment & activate

$ cd weed
$ virtualenv env
$ source env/bin/activate

Install requirements from requirements file

$ pip install -r requirements.txt

Note: Make sure you have the libraries for png & freetype. Ubuntu users can install them with:

apt-get install libfreetype6-dev
apt-get install libpng-dev

Mac users

brew install freetype
brew install libpng

Creative Commons License
Introduction to Data Analysis using Python by Amit Kapoor, Bargava and Nischal is licensed under a Creative Commons Attribution 4.0 International License.

More Repositories

  1. hackermath: Introduction to Statistics and Basics of Mathematics for Data Science - The Hacker's Way (Jupyter Notebook, 1,443 stars)
  2. visdown: Visualisation Markdown (JavaScript, 659 stars)
  3. recommendation: Recommendation System using ML and DL (Jupyter Notebook, 450 stars)
  4. full-stack-data-science: Full Stack Data Science in Python (Jupyter Notebook, 257 stars)
  5. deep-learning: Deep Learning Bootcamp (Jupyter Notebook, 62 stars)
  6. applied-machine-learning: Applied Machine Learning @ http://amitkaps.com/ml (Jupyter Notebook, 37 stars)
  7. art-data-science: The Art of Data Science (HTML, 34 stars)
  8. text-mining: Text Mining in Python (Jupyter Notebook, 23 stars)
  9. machine-learning: Workshop on Machine Learning in Python (HTML, 19 stars)
  10. multidim: Visualising Multi Dimensional Data (Jupyter Notebook, 18 stars)
  11. datascience: Build and Deploy Machine Learning Models on the Cloud (Jupyter Notebook, 17 stars)
  12. modelvis-talks: Model Visualisation. (Jupyter Notebook, 16 stars)
  13. pandas-workshop: Introduction to data analysis using Pandas (Jupyter Notebook, 13 stars)
  14. ensemble: Ensemble Approach for Machine Learning (Jupyter Notebook, 8 stars)
  15. recoflow: Recommender System for Humans (Python, 7 stars)
  16. learn-d3: Learning d3.js for data visualisation (HTML, 5 stars)
  17. djembeviz: Visualising Djembe to Learn Music. (JavaScript, 5 stars)
  18. DataSciencePython: Introduction to Data Science in Python (Jupyter Notebook, 3 stars)
  19. dsVis: Data Visualisation for Data Science (Jupyter Notebook, 3 stars)
  20. proposals: Proposal submissions for Talks and Tutorials at Conferences (3 stars)
  21. data-vis-workshop: Data Visualisation Workshop (HTML, 2 stars)
  22. modelvis: Model Visualisation (Python, 2 stars)
  23. trees: Tree-based Model [Random Forest and Gradient Boosting] (Jupyter Notebook, 2 stars)
  24. beats1: Visualising Radio Plays by Beats1 (JavaScript, 2 stars)
  25. mlops: Machine Learning Operations (1 star)
  26. visual-analytics: Visual Analytics and Data Visualisation (1 star)
  27. artistry: Generative Visualisation (JavaScript, 1 star)
  28. svm: Support Vector Machines (Jupyter Notebook, 1 star)
  29. interactive: Interactive Data Visualisation (JavaScript, 1 star)
  30. data-vis-python: Data Visualisation in Python (Jupyter Notebook, 1 star)
  31. onion: Visualising Onion Price in India (HTML, 1 star)
  32. deep-learning-rorodata: Get started with deep learning workshop @ rorodata (1 star)
  33. cars: Visualising Cars in India (1 star)
  34. onions-dataset: Onions Price Dataset in India (HTML, 1 star)
  35. workshop-av-2018: Analytics Vidhya 2018 - Applied Machine Learning (Jupyter Notebook, 1 star)