  • Stars: 4,656
  • Rank 8,737 (Top 0.2 %)
  • Language: HTML
  • License: MIT License
  • Created almost 7 years ago
  • Updated over 1 year ago

Repository Details

An in-depth machine learning tutorial introducing readers to a whole machine learning pipeline from scratch.

An end to end tutorial of a machine learning pipeline

This tutorial tries to do what most machine learning tutorials available online do not. It is not a 30-minute tutorial that teaches you how to "Train your own neural network" or "Learn deep learning in under 30 minutes". It is the full pipeline you would need to work through if you actually worked with machine learning, introducing you to all the parts and all the implementation decisions and details that need to be made. The dataset is not one of the standard sets like MNIST or CIFAR; you will make your very own dataset. Then you will go through a couple of conventional machine learning algorithms before finally getting to deep learning!

In the fall of 2016, I was a Teaching Fellow (Harvard's version of a TA) for the graduate class "Advanced Topics in Data Science (CS209/109)" at Harvard University. I was in charge of designing the class project given to the students, and this tutorial is built on top of the project I designed for the class.

UPDATE 24th October 2018

The tutorial has now been re-written in PyTorch thanks to Anshul Basia (https://github.com/AnshulBasia)

You can access the HTML here: https://spandan-madan.github.io/DeepLearningProject/PyTorch_version/Deep_Learning_Project-Pytorch.html and the IPython Notebook with the PyTorch code here: https://github.com/Spandan-Madan/DeepLearningProject/blob/master/PyTorch_version/Deep_Learning_Project-Pytorch.ipynb

Citing if you use the work here

If you would like to use this work, please cite the work using the doi - DOI

Reading/Viewing the Tutorial

To view the project as an HTML file, visit - https://spandan-madan.github.io/DeepLearningProject/

The Code

If you would like to access the code, please go through the IPython notebook Deep_Learning_Project.ipynb

SETUP

Python

  • We will be using Python 2.7. The primary reason is that TensorFlow is not compatible with Python > 3.5, and some other libraries are not compatible with Python 3.

To make setup easy, we are going to use conda.

  • Please install conda 3 from https://www.continuum.io/downloads
  • The repository has a conda config file which will make setting up super easy. It's the file deeplearningproject_environment.yml
  • Then create a new conda environment using the command: conda env create -f deeplearningproject_environment.yml
  • Now, you can activate the environment with: source activate deeplearningproject
  • Finally, start the notebook server with: jupyter notebook

If all the installations go through, you are good to go! If not, here is a list of packages that need to be installed: requests imDbPy wget tmdbsimple seaborn sklearn Pillow keras tensorflow h5py gensim nltk stop_words

Please install imdbpy using 'pip install imdbpy==6.6' since earlier versions are broken
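
To confirm that the environment has everything in the package list above, a small check like the following can be run inside the activated environment. It is only a sketch: the import names are assumptions, since some differ from the package names (IMDbPY is assumed to import as imdb and Pillow as PIL), and the helper missing_modules is a hypothetical name, not part of the repository. It is written to run on both Python 2.7 and Python 3.

```python
import importlib

# Import names are assumptions: some differ from the pip/conda package names
# (IMDbPY is assumed to import as "imdb", Pillow as "PIL").
REQUIRED_MODULES = [
    "requests", "imdb", "wget", "tmdbsimple", "seaborn", "sklearn",
    "PIL", "keras", "tensorflow", "h5py", "gensim", "nltk", "stop_words",
]


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling missing_modules(REQUIRED_MODULES) returns an empty list when every module can be imported; any names it does return are the ones to install by hand.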

Setting up conda environment in jupyter notebook

To be able to run the environment you just created in a Jupyter notebook, first check that you have the Python package ipykernel installed. If you don't, simply install it using

pip install ipykernel

Now, add this to your jupyter notebook using the command:

python -m ipykernel install --user --name deeplearningproject --display-name "deeplearningproject"

Needless to say, remove all single quotes before running commands.

Go to the project directory, start Jupyter with "jupyter notebook", and open the respective notebook in your browser. To install TMDB: pip install tmdbsimple. Then use "import tmdbsimple as tmdb".
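
Once the notebook is open, it can be worth checking that the kernel really is the deeplearningproject environment and not the system Python. The cell below is a minimal sketch of one way to do that. It assumes the conda environment lives in a directory named after the environment (for example .../envs/deeplearningproject), which is the usual layout but not guaranteed, and running_in_env is a hypothetical helper name.

```python
import os
import sys


def running_in_env(env_name, prefix=None):
    """Return True if the interpreter prefix ends in a directory named env_name."""
    if prefix is None:
        prefix = sys.prefix
    # Assumption: conda stores each environment in a directory named after it.
    return os.path.basename(os.path.normpath(prefix)) == env_name


print("Interpreter: " + sys.executable)
print("In deeplearningproject env: " + str(running_in_env("deeplearningproject")))
```

If the second line prints False, the notebook is probably using a different kernel; pick "deeplearningproject" from the kernel menu and run the cell again.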

Setting up a docker container with docker-compose

Prerequisites

Run docker-compose

To work in an isolated environment and be able to run it on many systems without trouble, you can run this docker-compose command:

docker-compose up

It will build the deeplearningproject image according to the Dockerfile and then run the Docker container via docker-compose. See the Docker and docker-compose docs for more information.

Then access notebooks through your web browser at http://localhost:8888
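
If the page does not load, a quick way to tell whether anything is listening on that port is a plain TCP connection test. The sketch below assumes the container publishes the notebook server on localhost port 8888, as the URL above suggests; port_is_open is a hypothetical helper, and a successful connection only shows that something is listening, not that it is Jupyter.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except socket.error:  # covers refused connections, timeouts and lookup failures
        return False
    conn.close()
    return True


print("Notebook server reachable: " + str(port_is_open("localhost", 8888)))
```

A result of False right after docker-compose up may just mean the container is still starting, so it can help to retry after a few seconds.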

You should notice that the notebooks have been copied from the root to the notebooks folder so they can be mounted into the container via a bind volume. Any changes you make will be saved on the host (notebooks dir).

Add packages

You can add conda or pip packages to the image (and thus the container) by updating the deeplearningproject_environment.yml file and then running

docker-compose build

It will build a new deeplearningproject image with the new conda/pip packages installed. Stop your running container (CTRL-C) and then run docker-compose up to start a fresh container.

Known common bugs

I will keep updating this as issues pop up on this repository.

  • One known bug arises because Keras 2.0 is not compatible with some Keras 1.2 functionality. You may run into errors when importing VGG16. If so, just update Keras using the following command:
sudo pip install git+git://github.com/fchollet/keras.git --upgrade

  • OSError: Too many open files. Refer to https://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files or shut down the notebook and execute the following in the same terminal:
ulimit -Sn 10000

Then restart the Jupyter notebook.
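
The open-file limit can also be inspected from Python using the standard library resource module, which is available on Unix-like systems only. This is a sketch of an alternative to the ulimit command, not something the tutorial itself does: raise_open_file_limit and target_soft_limit are hypothetical helper names, and raising the limit this way affects only the current Python process (for example one notebook kernel), not the terminal session.

```python
import resource


def target_soft_limit(desired, hard):
    """Clamp the desired soft limit so it never exceeds the hard limit."""
    if hard == resource.RLIM_INFINITY:
        return desired
    return min(desired, hard)


def raise_open_file_limit(desired=10000):
    """Raise this process's soft open-file limit toward desired; return the new soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target_soft_limit(desired, hard)
    # Only ever raise the limit; leave it alone if it is already high enough.
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("Open-file limits (soft, hard): " + str((soft, hard)))
```

Printing the limits first shows whether the soft limit is actually the low value behind the error before anything is changed.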

Hope this repo helps introduce you to a full machine learning pipeline! If you spot an error, please create an issue to help out others using this resource!

To prevent problems with installation and setting up, this repository comes with a conda environment profile. The only thing you will need is to install the newest version of conda, and use this profile to create a new environment and it will come set up with all the libraries you will need for the tutorial.
