Machine Learning Bookcamp

The code from the Machine Learning Bookcamp book

Machine Learning Zoomcamp

Machine Learning Zoomcamp is a course based on the book

  • It's online and free
  • You can join at any moment
  • More information in the course repository

Reading Plan

Chapters

Chapter 1: Introduction to Machine Learning

  • Understanding machine learning and the problems it can solve
  • CRISP-DM: Organizing a successful machine learning project
  • Training and selecting machine learning models
  • Performing model validation

No code

Chapter 2: Machine Learning for Regression

  • Creating a car-price prediction project with a linear regression model
  • Doing an initial exploratory data analysis with Jupyter notebooks
  • Setting up a validation framework
  • Implementing the linear regression model from scratch
  • Performing simple feature engineering for the model
  • Keeping the model under control with regularization
  • Using the model to predict car prices

Code: chapter-02-car-price/02-carprice.ipynb
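To make the regularization idea from this chapter concrete, here is a minimal sketch of a regularized linear regression for a single feature, written in plain Python. It is not the notebook's implementation, which works with a full feature matrix; the function name `fit_ridge_1d`, the parameter `r`, and the toy data are assumptions made for illustration. The intercept is left unpenalized in this sketch.

```python
def fit_ridge_1d(xs, ys, r=0.0):
    """Fit y = bias + w * x, penalizing w with strength r (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Adding r to the denominator shrinks the weight toward zero.
    w = sxy / (sxx + r)
    bias = mean_y - w * mean_x
    return w, bias


# Toy data generated from y = 2x + 1 (made up for this example).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

plain = fit_ridge_1d(xs, ys, r=0.0)        # recovers the slope and intercept
regularized = fit_ridge_1d(xs, ys, r=5.0)  # smaller slope, larger intercept
```

With `r=0.0` the fit reproduces the line that generated the data; increasing `r` trades some fit quality for a smaller weight.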

Chapter 3: Machine Learning for Classification

  • Predicting customers who will churn with logistic regression
  • Doing exploratory data analysis for identifying important features
  • Encoding categorical variables to use them in machine learning models
  • Using logistic regression for classification

Code: chapter-03-churn-prediction/03-churn.ipynb
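As a rough illustration of the encoding step listed above, here is a minimal sketch of one-hot encoding in plain Python. The helper `one_hot_encode` and the field names `contract` and `gender` are hypothetical and are not taken from the notebook, which may rely on a library encoder instead.

```python
def one_hot_encode(records):
    """Turn a list of dicts with categorical values into 0/1 feature rows."""
    # Collect every (field, value) pair seen in the data; sorting keeps
    # the column order deterministic.
    columns = sorted({(field, value) for row in records for field, value in row.items()})
    names = [f"{field}={value}" for field, value in columns]
    matrix = [
        [1 if row.get(field) == value else 0 for field, value in columns]
        for row in records
    ]
    return names, matrix


# Made-up customer records for the example.
customers = [
    {"contract": "monthly", "gender": "female"},
    {"contract": "yearly", "gender": "male"},
    {"contract": "monthly", "gender": "male"},
]
names, matrix = one_hot_encode(customers)
```

Each categorical value becomes its own column, so a model that expects numbers can consume the rows of `matrix` directly.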

Chapter 4: Evaluation Metrics for Classification

  • Accuracy as a way of evaluating binary classification models and its limitations
  • Determining where our model makes mistakes using a confusion table
  • Deriving other metrics like precision and recall from the confusion table
  • Using ROC and AUC to further understand the performance of a binary classification model
  • Cross-validating a model to make sure it behaves optimally
  • Tuning the parameters of a model to achieve the best predictive performance

Code: chapter-03-churn-prediction/04-metrics.ipynb
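The relationship between the confusion table and the metrics derived from it can be sketched in a few lines of plain Python. The function names and the label lists below are assumptions made for illustration, not code from the notebook.

```python
def confusion_table(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_table(y_true, y_pred)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Made-up labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

table = confusion_table(y_true, y_pred)
scores = metrics(y_true, y_pred)
```

Precision asks how many predicted positives were correct, while recall asks how many actual positives were found; both come from the same four counts.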

Chapter 5: Deploying Machine Learning Models

  • Saving models with Pickle
  • Serving models with Flask
  • Managing dependencies with Pipenv
  • Making the service self-contained with Docker
  • Deploying it to the cloud using AWS Elastic Beanstalk

Code: chapter-05-deployment
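The first step in this chapter, saving a model with Pickle, can be sketched with the standard library alone. The dictionary below is a hypothetical stand-in for a trained model, and the file name `model.bin` is an assumption; the chapter's own code saves a real trained model.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: plain weights and a bias term.
model = {"weights": [0.5, -1.25], "bias": 2.0}


def predict(model, features):
    """Apply the linear model to one feature vector."""
    return model["bias"] + sum(w * x for w, x in zip(model["weights"], features))


# Save the model to a file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "model.bin")
with open(path, "wb") as f_out:
    pickle.dump(model, f_out)

# Later, possibly in another process (for example a web service), load it back.
with open(path, "rb") as f_in:
    restored = pickle.load(f_in)

prediction = predict(restored, [2.0, 4.0])
```

Unpickling can execute arbitrary code, so only load files that come from a source you trust.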

Chapter 6: Decision Trees and Ensemble Learning

  • Predicting the risk of default with tree-based models
  • Decision trees and the decision tree learning algorithm
  • Random forest: putting multiple trees together into one model
  • Gradient boosting as an alternative way of combining decision trees

Code: chapter-06-trees/06-trees.ipynb

Chapter 7: Neural Networks and Deep Learning

  • Convolutional neural networks for image classification
  • TensorFlow and Keras: frameworks for building neural networks
  • Using pre-trained neural networks
  • Internals of a convolutional neural network
  • Training a model with transfer learning
  • Data augmentations: the process of generating more training data

Code: chapter-07-neural-nets/07-neural-nets-train.ipynb

Chapter 8: Serverless Deep Learning

  • Serving models with TensorFlow-Lite: a lightweight environment for applying TensorFlow models
  • Deploying deep learning models with AWS Lambda
  • Exposing the Lambda function as a web service via API Gateway

Code: chapter-08-serverless

Chapter 9: Kubernetes and Kubeflow

Kubernetes:

  • Understanding different methods of deploying and serving models in the cloud
  • Serving Keras and TensorFlow models with TensorFlow-Serving
  • Deploying TensorFlow-Serving to Kubernetes

Code: chapter-09-kubernetes

Kubeflow:

  • Using Kubeflow and KFServing for simplifying the deployment process

Code: chapter-09-kubeflow

Appendices

Appendix A: Setting up the Environment

  • Installing Anaconda, a Python distribution that includes most of the scientific libraries we need
  • Running a Jupyter Notebook service from a remote machine
  • Installing and configuring the Kaggle command line interface tool for accessing datasets from Kaggle
  • Creating an EC2 machine on AWS using the web interface and the command-line interface

Code: no code

Appendix B: Introduction to Python

  • Basic Python syntax: variables and control-flow structures
  • Collections: lists, tuples, sets, and dictionaries
  • List comprehensions: a concise way of operating on collections
  • Reusability: functions, classes and importing code
  • Package management: using pip for installing libraries
  • Running Python scripts

Code: appendix-b-python.ipynb
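For readers new to the syntax, the collections and comprehensions listed above look roughly like this. The variable names and values are made up for the example.

```python
numbers = [1, 2, 3, 4, 5, 6]

# List comprehension: transform every element.
squares = [n * n for n in numbers]

# List comprehension with a filter: keep only even numbers, then square them.
even_squares = [n * n for n in numbers if n % 2 == 0]

# The same idea works for dictionaries and sets.
lengths = {word: len(word) for word in ["list", "tuple", "set"]}
remainders = {n % 3 for n in numbers}
```

Each comprehension replaces a short `for` loop that would otherwise append to a new collection.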

Appendix C: Introduction to NumPy and Linear Algebra

  • One-dimensional and two-dimensional NumPy arrays
  • Generating NumPy arrays randomly
  • Operations with NumPy arrays: element-wise operations, summarizing operations, sorting and filtering
  • Multiplication in linear algebra: vector-vector, matrix-vector and matrix-matrix multiplications
  • Finding the inverse of a matrix and solving the normal equation

Code: appendix-c-numpy.ipynb

Appendix D: Introduction to Pandas

  • The main data structures in Pandas: DataFrame and Series
  • Accessing rows and columns of a DataFrame
  • Element-wise and summarizing operations
  • Working with missing values
  • Sorting and grouping

Code: appendix-d-pandas.ipynb

Appendix E: AWS SageMaker

  • Increasing the GPU quota limits
  • Renting a Jupyter notebook with GPU in AWS SageMaker
