  • Stars: 246
  • Rank: 164,726 (Top 4 %)
  • Language: Jupyter Notebook
  • License: Creative Commons ...
  • Created over 6 years ago
  • Updated about 6 years ago

Repository Details

Scipy 2018 scikit-learn tutorial by Guillaume Lemaitre and Andreas Mueller

SciPy 2018 Scikit-learn Tutorial

Instructors

  • Guillaume Lemaitre
  • Andreas Mueller
This repository will contain the teaching material and other info associated with our scikit-learn tutorial at SciPy 2018 held July 9-15 in Austin, Texas.

Parts 1 to 12 make up the morning session, while parts 13 to 23 will be presented in the afternoon (approximately).

Schedule:

The 2-part tutorial will be held on Tuesday, July 10, 2018.

Obtaining the Tutorial Material

If you have a GitHub account, it is probably most convenient if you clone or fork the GitHub repository. You can clone the repository by running:

git clone https://github.com/amueller/scipy-2018-sklearn.git

If you are not familiar with git or don’t have a GitHub account, you can download the repository as a .zip file by heading over to the GitHub repository (https://github.com/amueller/scipy-2018-sklearn) in your browser and clicking the green “Download” button in the upper right.

Please note that we may add to and improve the material until shortly before the tutorial session, and we recommend that you update your copy of the materials one day before the tutorial. If you have a GitHub account and cloned the repository via GitHub, you can sync your existing local repository with:

git pull origin master

If you don’t have a GitHub account, you may have to re-download the .zip archive from GitHub.

Installation Notes

This tutorial will require recent installations of

The last one is important and you should be able to type:

jupyter notebook

in your terminal window and see the notebook panel load in your web browser. Try opening and running a notebook from the material to check that it works. Alternatively, you can use JupyterLab.

For users who do not yet have the required packages installed, a relatively painless way to install all the requirements is to use a Python distribution such as Anaconda, which includes the most relevant Python packages for science, math, engineering, and data analysis. Anaconda can be downloaded and installed free of charge, including for commercial use and redistribution. The code examples in this tutorial should be compatible with Python 2.7 and Python 3.4-3.6.

After obtaining the material, we strongly recommend that you open and execute the Jupyter Notebook check_env.ipynb, which is located at the top level of this repository. You can open the notebook by executing

jupyter notebook check_env.ipynb

inside this repository. Inside the Notebook, you can run the code cell by clicking on the "Run Cells" button as illustrated in the figure below:

Finally, if your environment satisfies the requirements for the tutorials, the executed code cell will produce an output message as shown below:

Although not required, we also recommend that you update scikit-learn to the latest release version to ensure the best compatibility with the teaching material. Please upgrade already installed packages by executing

  • pip install --no-deps --upgrade [package-name]
  • or conda update [package-name]

depending on how you installed scikit-learn.
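If you would like a quick look at your environment from a plain Python prompt, the following is a minimal sketch of a version check. It is not the content of check_env.ipynb. The helper name `check_packages` and the package list are assumptions based on the tools named in this README, and the sketch assumes Python 3.8 or newer (for `importlib.metadata`), which is newer than the versions the tutorial targeted.

```python
from importlib import metadata


def check_packages(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list, based on the tools named in this README.
    wanted = ["numpy", "scipy", "matplotlib", "scikit-learn", "notebook"]
    for name, version in check_packages(wanted).items():
        print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can then be installed with pip or conda, whichever you used for the rest of your environment.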

Data Downloads

The data for this tutorial is not included in the repository. We will be using several data sets during the tutorial: most are built-in to scikit-learn, which includes code that automatically downloads and caches these data.

Because the wireless network at conferences can often be spotty, it would be a good idea to download these data sets before arriving at the conference. Please run

python fetch_data.py

to download all necessary data beforehand.

The download size of the data files is approx. 280 MB, and after fetch_data.py has extracted the data, the ./notebook/dataset folder will take up 480 MB of your local hard drive.
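One way to confirm that the download finished is to compare the size of the data folder on disk with the figure quoted above. The sketch below is only an illustration: the helper name `folder_size_mb` is hypothetical, the path is the one given in this README, and sizes are reported in megabytes of 10**6 bytes, which may differ slightly from what your file manager shows.

```python
import os


def folder_size_mb(path):
    """Return the total size of all regular files under `path`, in MB (10**6 bytes)."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for filename in files:
            file_path = os.path.join(root, filename)
            # Skip symbolic links so linked files are not counted.
            if not os.path.islink(file_path):
                total_bytes += os.path.getsize(file_path)
    return total_bytes / 1e6


if __name__ == "__main__":
    data_dir = "./notebook/dataset"  # path as given in this README
    if os.path.isdir(data_dir):
        print(f"{data_dir}: {folder_size_mb(data_dir):.0f} MB")
    else:
        print(f"{data_dir} not found; run fetch_data.py first")
```

A result far below the expected size suggests that the download was interrupted and fetch_data.py should be run again.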

Outline

Morning Session

  • 01 Introduction to machine learning with sample applications, Supervised and Unsupervised learning
  • 02 Scientific Computing Tools for Python: NumPy, SciPy, and matplotlib
  • 03 Data formats, preparation, and representation
  • 04 Supervised learning: Training and test data
  • 05 Supervised learning: Estimators for classification
  • 06 Supervised learning: Estimators for regression analysis
  • 07 Unsupervised learning: Unsupervised Transformers
  • 08 Unsupervised learning: Clustering
  • 09 The scikit-learn estimator interface
  • 10 Preparing a real-world dataset (Titanic)
  • 11 Working with text data via the bag-of-words model
  • 12 Application: IMDb Movie Review Sentiment Analysis

Afternoon Session

  • 13 Cross-Validation
  • 14 Model complexity and grid search for adjusting hyperparameters
  • 15 Scikit-learn Pipelines
  • 16 Supervised learning: Performance metrics for classification
  • 17 Supervised learning: Linear Models
  • 18 Supervised learning: Decision trees, random forests, and ensemble methods
  • 19 Supervised learning: Feature selection
  • 20 Unsupervised learning: Hierarchical and density-based clustering algorithms
  • 21 Unsupervised learning: Non-linear dimensionality reduction
  • 22 Unsupervised learning: Anomaly Detection
  • 23 Supervised learning: Out-of-core learning

More Repositories

1

word_cloud

A little word cloud generator in Python
Python
10,101
star
2

introduction_to_ml_with_python

Notebooks and code for the book "Introduction to Machine Learning with Python"
Jupyter Notebook
7,348
star
3

scipy_2015_sklearn_tutorial

Scikit-Learn tutorial material for Scipy 2015
Python
578
star
4

scipy-2016-sklearn

Scikit-learn tutorial at SciPy2016
Jupyter Notebook
515
star
5

ml-workshop-1-of-4

Introduction to Machine learning with Python, 4h interactive workshop
HTML
303
star
6

COMS4995-s19

COMS W4995 Applied Machine Learning - Spring 19
Jupyter Notebook
303
star
7

scipy-2017-sklearn

Scipy 2017 scikit-learn tutorial by Alex Gramfort and Andreas Mueller
Jupyter Notebook
283
star
8

COMS4995-s20

COMS W4995 Applied Machine Learning - Spring 20
Jupyter Notebook
245
star
9

mglearn

mglearn helper package for "Introduction to Machine Learning with Python"
Python
229
star
10

ml-training-intro

Materials for the "Introduction to Machine Learning" class
HTML
227
star
11

ml-training-advanced

Materials for the "Advanced Scikit-learn" class in the afternoon
Jupyter Notebook
163
star
12

ml-workshop-4-of-4

Advanced Machine Learning with Scikit-learn part II
HTML
162
star
13

COMS4995-s18

COMS W4995 Applied Machine Learning - Spring 18
Jupyter Notebook
158
star
14

kaggle_insults

Kaggle Submission for "Detecting Insults in Social Commentary"
Python
153
star
15

ml-workshop-3-of-4

Advanced Machine Learning with Scikit-learn part I
HTML
139
star
16

gco_python

Python wrappers for GCO alpha-expansion and alpha-beta-swaps
Python
131
star
17

ml-workshop-2-of-4

Intermediate Machine Learning with Scikit-learn, 4h interactive workshop
HTML
125
star
18

advanced_training

Advanced Scikit-learn training session
Jupyter Notebook
120
star
19

futurepast

Deprecation tools for Python
Python
118
star
20

talks_odt

Slides and materials for most of my talks by year
Jupyter Notebook
89
star
21

applied_ml_spring_2017

Website and material for the FIXME course on Practical Machine Learning
Jupyter Notebook
88
star
22

odscon-2015

Slides and material for open data science
80
star
23

odscon-sf-2015

Material for ODSCON San Francisco 2015
Jupyter Notebook
79
star
24

aml

Applied Machine Learning with Python
Jupyter Notebook
76
star
25

quick-ml-intro

One hour interactive training for ML with scikit-learn
Jupyter Notebook
74
star
26

pydata-nyc-advanced-sklearn

Notebooks (and slides) for my PyData NYC 2014 tutorial on the more advanced features of scikit-learn.
69
star
27

sklearn_tutorial

Slides for quick intro to machine learning with sklearn
CSS
65
star
28

sklearn-one-day

One day workshop for machine learning with scikit-learn
HTML
63
star
29

segmentation

Superpixel based semantic segmentation
Python
53
star
30

scikit-learn-interactive-tutorial

IPython notebooks and data for an interactive scikit-learn tutorial.
51
star
31

pydata-strata-2015

Slides and notebooks for PyData Strata San Jose
51
star
32

patsylearn

Patsy Adaptors for Scikit-learn
Python
49
star
33

advanced_git_nyu_2016

Advanced git and github course material
HTML
39
star
34

textonboost

Texton boost implementation in C++ by Philipp Kraehenbuehl
C++
32
star
35

pydata-amsterdam-2016

Machine Learning with Scikit-Learn (material for pydata Amsterdam 2016)
Jupyter Notebook
30
star
36

ml_meetup_nyc_2016

Material for Machine Learning Meetup "Machine Learning with Scikit-learn"
Jupyter Notebook
29
star
37

odsc_east_2016

Jupyter Notebook
26
star
38

speed_reading

Speed reading app with running focus
CSS
25
star
39

slic-python

SLIC wrapper for Python - legacy, rather use scikit-image now!
C++
23
star
40

ml-workshop-short

Two hour interactive machine learning workshop
HTML
22
star
41

mlss_2015

Material for open source machine learning practical
Python
21
star
42

jupytercon2017

Material for Data analysis and machine learning in Jupyter
Jupyter Notebook
21
star
43

structured-prediction-workshop

Introduction to structured prediction with Python and pystruct
TeX
18
star
44

information-theoretic-mst

Information Theoretic Clustering using Minimum Spanning Trees
Python
18
star
45

advanced-sklearn-boston-nlp-2016

Material and slides for Boston NLP meetup May 23rd 2016
Jupyter Notebook
17
star
46

nyu_ml_lectures

Materials for NYU Machine Learning Guest Lectures
Python
17
star
47

amueller.github.io

Less
17
star
48

ImageNet-parsing-Python

Python class to explore the ImageNet database
Python
16
star
49

water_hackweek_2020_machine_learning

Water Hackweek Machine Learning workshop
Jupyter Notebook
15
star
50

strata-nyc-2016

Materials for Strata NYC 2016 scikit-learn tutorial
Jupyter Notebook
15
star
51

damascene-python-and-matlab-bindings

Python and matlab bindings for the Damascene CUDA implementation of gPB
C++
13
star
52

git_workshop

Material for git workshop
HTML
11
star
53

strata_singapore_2015

Materials for Strata Singapore "Machine learning In Python with scikit-learn" tutorial.
Jupyter Notebook
9
star
54

sklearn_workshop

Jupyter notebooks for interactive scikit-learn workshop
Python
8
star
55

cv

Curriculum Vitae
TeX
7
star
56

datasets

Datasets of some standard computer vision / deep learning benchmarks
Python
7
star
57

GPU-Quickshift-Python-Bindings

Python bindings for Brian Fultersons really quick shift
C++
7
star
58

structured_prediction_talk

Slides for explaining structured prediction and PyStruct
TeX
6
star
59

oss-directions-webinar-2019

Open Source Directions webinar materials
Jupyter Notebook
6
star
60

intro_to_ml_cuny_2015

Introduction to machine learning for CUNY
5
star
61

columbia-website

My official columbia page
CSS
5
star
62

phd-thesis-segmentation

unearthing my thesis - this is a backup
TeX
4
star
63

figures

Some figures and drawings for talks
3
star
64

daimrf

Python interface for inference with LibDAI
Python
2
star
65

notebooks

Random notebooks
2
star
66

vim-config

Vim Script
2
star
67

nsf-biosketch

stand-alone nsf biosketch
TeX
2
star
68

oss_workshop

Demo repository for oss workshop
1
star
69

dask-learn

Python
1
star
70

CZI-sklearn

TeX
1
star
71

gah

Code I don't want to keep reimplementing all the time
1
star
72

dotfiles

Another try to manage my dotfiles
Shell
1
star
73

icra_2014_crf_nyu

ICRA 2014 paper on CRFs for semantic segmentation on the NYU dataset
TeX
1
star