  • Stars: 4,057
  • Rank: 10,580 (Top 0.3 %)
  • Language: Jupyter Notebook
  • License: GNU General Publi...
  • Created over 10 years ago
  • Updated 10 months ago


Repository Details

A collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks



**Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining.**



Sections



[Download a PDF version] of this flowchart.






Introduction to Machine Learning and Pattern Classification


  • Predictive modeling, supervised machine learning, and pattern classification - the big picture [Markdown]

  • Entry Point: Data - Using Python's sci-packages to prepare data for Machine Learning tasks and other data analyses [IPython nb]

  • An Introduction to simple linear supervised classification using scikit-learn [IPython nb]






Pre-processing


  • Feature Extraction

    • Tips and Tricks for Encoding Categorical Features in Classification Tasks [IPython nb]
  • Scaling and Normalization

    • About Feature Scaling: Standardization and Min-Max-Scaling (Normalization) [IPython nb]
  • Feature Selection

    • Sequential Feature Selection Algorithms [IPython nb]
  • Dimensionality Reduction

    • Principal Component Analysis (PCA) [IPython nb]
    • The effect of scaling and mean centering of variables prior to a PCA [PDF] [HTML]
    • PCA based on the covariance vs. correlation matrix [IPython nb]
    • Linear Discriminant Analysis (LDA) [IPython nb]
    • Kernel tricks and nonlinear dimensionality reduction via PCA [IPython nb]
  • Representing Text

    • Tf-idf Walkthrough for scikit-learn [IPython nb]
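
The two schemes named under "Scaling and Normalization" above can be sketched in a few lines. This is a minimal, standard-library-only illustration and is not taken from the linked notebook; the function names `standardize` and `min_max_scale` and the sample values are assumptions made for this example.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std. dev."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("standard deviation is zero; cannot standardize")
    return [(v - mu) / sigma for v in values]


def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Min-max scaling: map the observed range linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot min-max scale")
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]


# Hypothetical feature column used only for illustration.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
z_scores = standardize(feature)   # zero mean, unit variance
scaled = min_max_scale(feature)   # values in [0, 1]
```

Standardization leaves the feature with zero mean and unit variance, while min-max scaling pins the smallest and largest observed values to the chosen bounds, which makes it more sensitive to outliers.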



Model Evaluation


  • An Overview of General Performance Metrics of Binary Classifier Systems [PDF]
  • Cross-validation
    • Streamline your cross-validation workflow - scikit-learn's Pipeline in action [IPython nb]
  • Model evaluation, model selection, and algorithm selection in machine learning - Part I [Markdown]
  • Model evaluation, model selection, and algorithm selection in machine learning - Part II [Markdown]



Parameter Estimation


  • Parametric Techniques

    • Introduction to the Maximum Likelihood Estimate (MLE) [IPython nb]
    • How to calculate Maximum Likelihood Estimates (MLE) for different distributions [IPython nb]
  • Non-Parametric Techniques

    • Kernel density estimation via the Parzen-window technique [IPython nb]
    • The K-Nearest Neighbor (KNN) technique
  • Regression Analysis

    • Linear Regression

    • Non-Linear Regression
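
For the maximum likelihood entries listed under "Parametric Techniques" above, the univariate Gaussian case has a closed-form solution: the sample mean and the biased (divide-by-n) sample variance. The sketch below assumes that model; it is not the code from the linked notebooks, and the helper names and sample data are hypothetical.

```python
import math


def gaussian_mle(samples):
    """Return the MLE (mean, variance) of a univariate Gaussian."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    mu_hat = sum(samples) / n
    # The MLE of the variance divides by n, not n - 1, so it is biased.
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, var_hat


def gaussian_log_likelihood(samples, mu, var):
    """Log-likelihood of i.i.d. samples under N(mu, var)."""
    n = len(samples)
    squared_error = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2 * math.pi * var) - squared_error / (2 * var)


# Hypothetical sample used only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, var_hat = gaussian_mle(data)
best = gaussian_log_likelihood(data, mu_hat, var_hat)
```

Evaluating `gaussian_log_likelihood` at any other mean or variance gives a value no larger than `best`, which is a quick sanity check on the estimates.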




Machine Learning Algorithms


Bayes Classification

  • Naive Bayes and Text Classification I - Introduction and Theory [PDF]

Logistic Regression

  • Out-of-core Learning and Model Persistence using scikit-learn [IPython nb]

Neural Networks

  • Artificial Neurons and Single-Layer Neural Networks - How Machine Learning Algorithms Work Part 1 [IPython nb]

  • Activation Function Cheatsheet [IPython nb]

Ensemble Methods

  • Implementing a Weighted Majority Rule Ensemble Classifier in scikit-learn [IPython nb]

Decision Trees

  • Cheatsheet for Decision Tree Classification [IPython nb]



Clustering


  • Prototype-based clustering
  • Hierarchical clustering
    • Complete-Linkage Clustering and Heatmaps in Python [IPython nb]
  • Density-based clustering
  • Graph-based clustering
  • Probabilistic-based clustering



Collecting Data


  • Collecting Fantasy Soccer Data with Python and Beautiful Soup [IPython nb]

  • Download Your Twitter Timeline and Turn into a Word Cloud Using Python [IPython nb]

  • Reading MNIST into NumPy arrays [IPython nb]




Data Visualization


  • Exploratory Analysis of the Star Wars API [IPython nb]

  • Matplotlib examples - Exploratory data analysis of the Iris dataset [IPython nb]

  • Artificial Intelligence publications per country [IPython nb] [PDF]




Statistical Pattern Classification Examples


  • Supervised Learning

    • Parametric Techniques

      • Univariate Normal Density

        • Ex1: 2-classes, equal variances, equal priors [IPython nb]
        • Ex2: 2-classes, different variances, equal priors [IPython nb]
        • Ex3: 2-classes, equal variances, different priors [IPython nb]
        • Ex4: 2-classes, different variances, different priors, loss function [IPython nb]
        • Ex5: 2-classes, different variances, equal priors, loss function, Cauchy distr. [IPython nb]
      • Multivariate Normal Density

        • Ex5: 2-classes, different variances, equal priors, loss function [IPython nb]
        • Ex7: 2-classes, equal variances, equal priors [IPython nb]
    • Non-Parametric Techniques
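
The univariate examples above vary three ingredients of a Bayes classifier: class variances, class priors, and the loss function. As a rough sketch of the simplest setting (two classes, normal densities, zero-one loss), the rule is to pick the class with the largest prior times likelihood. The means, standard deviations, and priors below are hypothetical values chosen for illustration, not those used in the linked notebooks.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x (sigma is the standard deviation)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def classify(x, params, priors):
    """Return the index of the class maximizing prior * likelihood."""
    scores = [
        prior * normal_pdf(x, mu, sigma)
        for (mu, sigma), prior in zip(params, priors)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical class-conditional densities: (mean, standard deviation).
params = [(4.0, 2.0), (10.0, 2.0)]

# Equal variances and equal priors: the boundary is the midpoint, x = 7.
equal_priors = [0.5, 0.5]
left = classify(6.9, params, equal_priors)
right = classify(7.1, params, equal_priors)

# Unequal priors shift the boundary toward the less probable class.
shifted = classify(7.5, params, [0.8, 0.2])
```

With equal variances the decision boundary is a single point; with different variances the log-ratio of the densities becomes quadratic in x, so there can be two boundary points.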




Books


Python Machine Learning




Talks


An Introduction to Supervised Machine Learning and Pattern Classification: The Big Picture

[View on SlideShare]

[Download PDF]



MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song Lyrics

[View on SlideShare]

[Download PDF]




Applications


MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song Lyrics

This project is about building a music recommendation system for users who want to listen to happy songs. Such a system could do more than brighten one's mood on a rainy weekend: in hospitals, other medical clinics, or public locations such as restaurants, the MusicMood classifier could be used to spread a positive mood among people.

[musicmood GitHub Repository]


mlxtend - A library of extension and helper modules for Python's data analysis and machine learning libraries.

[mlxtend GitHub Repository]




Resources


  • Copy-and-paste ready LaTeX equations [Markdown]

  • Open-source datasets [Markdown]

  • Free Machine Learning eBooks [Markdown]

  • Terms in data science defined in less than 50 words [Markdown]

  • Useful libraries for data science in Python [Markdown]

  • General Tips and Advice [Markdown]

  • A matrix cheatsheet for Python, R, Julia, and MATLAB [HTML]
