• Stars
    star
    516
  • Rank 82,705 (Top 2 %)
  • Language
    Jupyter Notebook
  • License
    Creative Commons ...
  • Created about 8 years ago
  • Updated about 5 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Scikit-learn tutorial at SciPy2016

SciPy 2016 Scikit-learn Tutorial

Based on the SciPy 2015 tutorial by Kyle Kastner and Andreas Mueller.

Instructors


The video recording of the tutorial is now available via YouTube:


This repository will contain the teaching material and other info associated with our scikit-learn tutorial at SciPy 2016 held July 11-17 in Austin, Texas.

Parts 1 to 5 make up the morning session, while parts 6 to 9 will be presented in the afternoon.

Schedule:

The 2-part tutorial will be held on Tuesday, July 12, 2016.

  • Parts 1 to 5: 8:00 AM - 12:00 PM (Room 105)
  • Parts 6 to 9: 1:30 PM - 5:30 PM (Room 105)

(You can find the full SciPy 2016 tutorial schedule here.)

Obtaining the Tutorial Material

If you have a GitHub account, it is probably most convenient if you fork the GitHub repository. If you donโ€™t have an GitHub account, you can download the repository as a .zip file by heading over to the GitHub repository (https://github.com/amueller/scipy-2016-sklearn) in your browser and click the green โ€œDownloadโ€ button in the upper right.

Please note that we may add and improve the material until shortly before the tutorial session, and we recommend you to update your copy of the materials one day before the tutorials. If you have an GitHub account and forked/cloned the repository via GitHub, you can sync your existing fork with via the following commands:

git remote add upstream https://github.com/amueller/scipy-2016-sklearn.git
git fetch upstream
git checkout master merge upstream/master

If you donโ€™t have a GitHub account, you may have to re-download the .zip archive from GitHub.

Installation Notes

This tutorial will require recent installations of

The last one is important, you should be able to type:

jupyter notebook

in your terminal window and see the notebook panel load in your web browser. Try opening and running a notebook from the material to see check that it works.

For users who do not yet have these packages installed, a relatively painless way to install all the requirements is to use a Python distribution such as Anaconda CE, which includes the most relevant Python packages for science, math, engineering, and data analysis; Anaconda can be downloaded and installed for free including commercial use and redistribution. The code examples in this tutorial should be compatible to Python 2.7, Python 3.4, and Python 3.5.

After obtaining the material, we strongly recommend you to open and execute the Jupyter Notebook jupter notebook check_env.ipynb that is located at the top level of this repository. Inside the repository, you can open the notebook by executing

jupyter notebook check_env.ipynb

inside this repository. Inside the Notebook, you can run the code cell by clicking on the "Run Cells" button as illustrated in the figure below:

Finally, if your environment satisfies the requirements for the tutorials, the executed code cell will produce an output message as shown below:

Although not required, we also recommend you to update the required Python packages to their latest versions to ensure best compatibility with the teaching material. Please upgrade already installed packages by executing

  • pip install [package-name] --upgrade
  • or conda update [package-name]

Data Downloads

The data for this tutorial is not included in the repository. We will be using several data sets during the tutorial: most are built-in to scikit-learn, which includes code that automatically downloads and caches these data.

Because the wireless network at conferences can often be spotty, it would be a good idea to download these data sets before arriving at the conference. Please run python fetch_data.py to download all necessary data beforehand.

The download size of the data files are approx. 280 MB, and after fetch_data.py extracted the data on your disk, the ./notebook/dataset folder will take 480 MB of your local solid state or hard drive.

Outline

Morning Session

  • 01 Introduction to machine learning with sample applications, Supervised and Unsupervised learning [[view](notebooks/01\ Introduction\ to\ Machine\ Learning.ipynb)]
  • 02 Scientific Computing Tools for Python: NumPy, SciPy, and matplotlib [[view](notebooks/02\ Scientific\ Computing\ Tools\ in\ Python.ipynb)]
  • 03 Data formats, preparation, and representation [[view](notebooks/03\ Data\ Representation\ for\ Machine\ Learning.ipynb)]
  • 04 Supervised learning: Training and test data [[view](notebooks/04\ Training\ and\ Testing\ Data.ipynb)]
  • 05 Supervised learning: Estimators for classification [[view](notebooks/05\ Supervised\ Learning\ -\ Classification.ipynb)]
  • 06 Supervised learning: Estimators for regression analysis [[view](notebooks/06\ Supervised\ Learning\ -\ Regression.ipynb)]
  • 07 Unsupervised learning: Unsupervised Transformers [[view](notebooks/07\ Unsupervised\ Learning\ -\ Transformations\ and\ Dimensionality\ Reduction.ipynb)]
  • 08 Unsupervised learning: Clustering [[view](notebooks/08\ Unsupervised\ Learning\ -\ Clustering.ipynb)]
  • 09 The scikit-learn estimator interface [[view](notebooks/09\ Review\ of\ Scikit-learn\ API.ipynb)]
  • 10 Preparing a real-world dataset (titanic) [[view](notebooks/10\ Case\ Study\ -\ Titanic\ Survival.ipynb)]
  • 11 Working with text data via the bag-of-words model [[view](notebooks/11\ Text\ Feature\ Extraction.ipynb)]
  • 12 Application: IMDb Movie Review Sentiment Analysis [[view](notebooks/12\ Case\ Study\ -\ SMS\ Spam\ Detection.ipynb)]

Afternoon Session

  • 13 Cross-Validation [[view](notebooks/13\ Cross\ Validation.ipynb)]
  • 14 Model complexity and grid search for adjusting hyperparameters [[view](notebooks/14\ Model\ Complexity\ and\ GridSearchCV.ipynb)]
  • 15 Scikit-learn Pipelines [[view](notebooks/15\ Pipelining\ Estimators.ipynb)]
  • 16 Supervised learning: Performance metrics for classification [[view](notebooks/16\ Performance\ metrics\ and\ Model\ Evaluation.ipynb)]
  • 17 Supervised learning: Linear Models [[view](notebooks/17\ In\ Depth\ -\ Linear\ Models.ipynb)]
  • 18 Supervised learning: Support Vector Machines [[view](notebooks/18\ In\ Depth\ -\ Support\ Vector\ Machines.ipynb)]
  • 19 Supervised learning: Decision trees and random forests, and ensemble methods [[view](notebooks/19\ In\ Depth\ -\ Trees\ and\ Forests.ipynb)]
  • 20 Supervised learning: feature selection [[view](notebooks/20\ Feature\ Selection.ipynb)]
  • 21 Unsupervised learning: Hierarchical and density-based clustering algorithms [[view](notebooks/21\ Unsupervised\ learning\ -\ Hierarchical\ and\ density-based\ clustering\ algorithms.ipynb)]
  • 22 Unsupervised learning: Non-linear dimensionality reduction [[view](notebooks/22\ Unsupervised\ learning\ -\ Non-linear\ dimensionality\ reduction.ipynb)]
  • 23 Supervised learning: Out-of-core learning [[view](notebooks/23\ Out-of-core\ Learning\ Large\ Scale\ Text\ Classification.ipynb)]

More Repositories

1

word_cloud

A little word cloud generator in Python
Python
9,936
star
2

introduction_to_ml_with_python

Notebooks and code for the book "Introduction to Machine Learning with Python"
Jupyter Notebook
7,146
star
3

scipy_2015_sklearn_tutorial

Scikit-Learn tutorial material for Scipy 2015
Python
578
star
4

COMS4995-s19

COMS W4995 Applied Machine Learning - Spring 19
Jupyter Notebook
304
star
5

ml-workshop-1-of-4

Introduction to Machine learning with Python, 4h interactive workshop
HTML
296
star
6

scipy-2017-sklearn

Scipy 2017 scikit-learn tutorial by Alex Gramfort and Andreas Mueller
Jupyter Notebook
282
star
7

scipy-2018-sklearn

Scipy 2018 scikit-learn tutorial by Guillaume Lemaitre and Andreas Mueller
Jupyter Notebook
247
star
8

COMS4995-s20

COMS W4995 Applied Machine Learning - Spring 20
Jupyter Notebook
242
star
9

mglearn

mglearn helper package for "Introduction to Machine Learning with Python"
Python
227
star
10

ml-training-intro

Materials for the "Introduction to Machine Learning" class
HTML
225
star
11

ml-training-advanced

Materials for the "Advanced Scikit-learn" class in the afternoon
Jupyter Notebook
161
star
12

COMS4995-s18

COMS W4995 Applied Machine Learning - Spring 18
Jupyter Notebook
158
star
13

ml-workshop-4-of-4

Advanced Machine Learning with Scikit-learn part II
HTML
157
star
14

kaggle_insults

Kaggle Submission for "Detecting Insults in Social Commentary"
Python
153
star
15

ml-workshop-3-of-4

Advanced Machine Learning with Scikit-learn part I
HTML
133
star
16

gco_python

Python wrappers for GCO alpha-expansion and alpha-beta-swaps
Python
131
star
17

ml-workshop-2-of-4

Intermediate Machine Learning with Scikit-learn, 4h interactive workshop
HTML
120
star
18

futurepast

Deprecation tools for Python
Python
119
star
19

advanced_training

Advanced Scikit-learn training session
Jupyter Notebook
119
star
20

applied_ml_spring_2017

Website and material for the FIXME course on Practical Machine Learning
Jupyter Notebook
89
star
21

talks_odt

Slides and materials for most of my talks by year
Jupyter Notebook
89
star
22

odscon-2015

Slides and material for open data science
80
star
23

odscon-sf-2015

Material for ODSCON San Francisco 2015
Jupyter Notebook
79
star
24

quick-ml-intro

One hour interactive training for ML with scikit-learn
Jupyter Notebook
74
star
25

aml

Applied Machine Learning with Python
Jupyter Notebook
73
star
26

pydata-nyc-advanced-sklearn

Notebooks (and slides) for my PyData NYC 2014 tutorial on the more advanced features of scikit-learn.
69
star
27

sklearn_tutorial

Slides for quick intro to machine learning with sklearn
CSS
65
star
28

sklearn-one-day

One day workshop for machine learning with scikit-learn
HTML
62
star
29

segmentation

Superpixel based semantic segmentation
Python
53
star
30

pydata-strata-2015

Slides and notebooks for PyData Strata San Jose
51
star
31

scikit-learn-interactive-tutorial

IPython notebooks and data an interactive scikit-learn tutorial.
51
star
32

patsylearn

Patsy Adaptors for Scikit-learn
Python
49
star
33

advanced_git_nyu_2016

Advanced git and github course material
HTML
39
star
34

textonboost

Texton boost implementation in C++ by Philipp Kraehenbuehl
C++
32
star
35

pydata-amsterdam-2016

Machine Learning with Scikit-Learn (material for pydata Amsterdam 2016)
Jupyter Notebook
31
star
36

ml_meetup_nyc_2016

Material for Machine Learning Meetup "Machine Learning with Scikit-learn"
Jupyter Notebook
29
star
37

odsc_east_2016

Jupyter Notebook
26
star
38

speed_reading

Speed reading app with running focus
CSS
25
star
39

slic-python

SLIC wrapper for Python - legacy, rather use scikit-image now!
C++
23
star
40

ml-workshop-short

Two hour interactive machine learning workshop
HTML
22
star
41

mlss_2015

Material for open source machine learning practical
Python
21
star
42

jupytercon2017

Material for Data analysis and machine learning in Jupyter
Jupyter Notebook
21
star
43

structured-prediction-workshop

Introduction to structured prediction with Python and pystruct
TeX
18
star
44

information-theoretic-mst

Information Theoretic Clustering using Minimum Spanning Trees
Python
18
star
45

advanced-sklearn-boston-nlp-2016

Material and slides for Boston NLP meetup May 23rd 2016
Jupyter Notebook
17
star
46

nyu_ml_lectures

Materials for NYU Machine Learning Guest Lectures
Python
17
star
47

amueller.github.io

Less
17
star
48

ImageNet-parsing-Python

Python class to explore the ImageNet database
Python
16
star
49

water_hackweek_2020_machine_learning

Water Hackweek Machine Learning workshop
Jupyter Notebook
15
star
50

strata-nyc-2016

Materials fort Strata NYC 2016 scikit-learn tutorial
Jupyter Notebook
15
star
51

damascene-python-and-matlab-bindings

Python and matlab bindings for the Damascene CUDA implementation of gPB
C++
13
star
52

git_workshop

Material for git workshop
HTML
11
star
53

strata_singapore_2015

Materials for Strata Singapore "Machine learning In Python with scikit-learn" tutorial.
Jupyter Notebook
9
star
54

sklearn_workshop

Jupyter notebooks for interactive scikit-learn workshop
Python
8
star
55

datasets

Datasets of some standard computer vision / deep learning benchmarks
Python
7
star
56

cv

Curriculum Vitae
TeX
7
star
57

GPU-Quickshift-Python-Bindings

Python bindings for Brian Fultersons really quick shift
C++
7
star
58

structured_prediction_talk

Slides for explaining structured prediction and PyStruct
TeX
6
star
59

oss-directions-webinar-2019

Open Source Directions webinar materials
Jupyter Notebook
6
star
60

intro_to_ml_cuny_2015

Introduction to machine learning for CUNY
5
star
61

columbia-website

My official columbia page
CSS
5
star
62

phd-thesis-segmentation

unearthing my thesis - this is a backup
TeX
4
star
63

figures

Some figures and drawings for talks
3
star
64

daimrf

Python interface for inference with LibDAI
Python
2
star
65

notebooks

Random notebooks
2
star
66

vim-config

Vim Script
2
star
67

nsf-biosketch

stand-alone nsf biosketch
TeX
2
star
68

oss_workshop

Demo repository for oss workshop
1
star
69

dask-learn

Python
1
star
70

gah

Code I don't want to keep reimplementing all the time
1
star
71

CZI-sklearn

TeX
1
star
72

dotfiles

Another try to manage my dotfiles
Shell
1
star
73

icra_2014_crf_nyu

ICRA 2014 paper on crfs for semantic segmenation on the nyu dataset
TeX
1
star