Machine Learning with Imbalanced Data - Code Repository
Published November 2020
Actively maintained.
Table of Contents
- Metrics
  - Limitations of Accuracy
  - Precision, Recall, F-Measure
  - Confusion Matrix
  - False Positive Rate and False Negative Rate
  - Geometric Mean
  - Dominance
  - Index of Imbalanced Accuracy
  - ROC-AUC
  - Precision-Recall Curves
  - Probability Distribution and Calibration
  - Which Metric to Optimise
- Undersampling Methods
  - Random Undersampling
  - Condensed Nearest Neighbour
  - Tomek Links
  - One-Sided Selection
  - Edited Nearest Neighbours
  - Repeated Edited Nearest Neighbours
  - All KNN
  - Neighbourhood Cleaning Rule
  - NearMiss
  - Instance Hardness Threshold
- Oversampling Methods
  - Random Oversampling
  - ADASYN
  - SMOTE
  - BorderlineSMOTE
  - KMeansSMOTE
  - SMOTENC
  - SVMSMOTE
- Over- and Undersampling Methods
  - SMOTEENN
  - SMOTETomek
- Ensemble Methods
  - Coming Soon
- Cost Sensitive Learning
  - Types of Cost
  - Obtaining the Cost
  - Misclassification Cost
  - Bayes Risk
  - MetaCost
- Probability Calibration
  - Probability Calibration Curves
  - Brier Score
  - Effect of Under- and Oversampling on Probability Calibration
  - Cost Sensitive Learning and Probability Calibration
  - Calibrating a Classifier