  • Stars: 709
  • Rank: 63,849 (Top 2%)
  • License: MIT License
  • Created over 2 years ago
  • Updated 6 months ago



Failed Machine Learning (FML)

High-profile real-world examples of failed machine learning projects


“Success is not final, failure is not fatal. It is the courage to continue that counts.” - Winston Churchill


If you are looking for examples of how ML can fail despite all its incredible potential, you have come to the right place. Beyond the wonderful success stories of applied machine learning, here is a list of failed projects which we can learn a lot from.

Contributions Welcome!


Contents

  1. Classical Machine Learning
  2. Computer Vision
  3. Forecasting
  4. Natural Language Processing
  5. Recommendation Systems

Classical Machine Learning

| Title | Description |
| --- | --- |
| Amazon AI Recruitment System | AI-powered automated recruitment system canceled after evidence of discrimination against female candidates |
| Genderify - Gender identification tool | AI-powered tool designed to identify gender based on fields like name and email address was shut down due to built-in biases and inaccuracies |
| Leakage and the Reproducibility Crisis in ML-based Science | A team at Princeton University found 20 reviews across 17 scientific fields that discovered significant errors (e.g., data leakage, no train-test split) in 329 ML-based science papers |
| COVID-19 Diagnosis and Triage Models | Hundreds of predictive models were developed to diagnose or triage COVID-19 patients faster, but ultimately none of them were fit for clinical use, and some were potentially harmful |
| COMPAS Recidivism Algorithm | An analysis of the recidivism risk system used in Florida found evidence of racial bias |
| Pennsylvania Child Welfare Screening Tool | The predictive algorithm (which helps identify which families are to be investigated by social workers for child abuse and neglect) flagged a disproportionate number of Black children for 'mandatory' neglect investigations |
| Oregon Child Welfare Screening Tool | Similar to the predictive tool in Pennsylvania, the AI algorithm for child welfare in Oregon was stopped a month after the Pennsylvania report |
| U.S. Healthcare System Health Risk Prediction | A widely used algorithm to predict healthcare needs exhibited racial bias: at a given risk score, Black patients were considerably sicker than white patients |
| Apple Card Credit Card | Apple’s new credit card (created in partnership with Goldman Sachs) came under investigation by financial regulators after customers complained that the card’s lending algorithms discriminated against women; in one case, the credit line offered to a male customer was 20 times higher than that offered to his spouse |

Computer Vision

| Title | Description |
| --- | --- |
| Inverness Automated Football Camera System | AI camera football-tracking technology for live streaming repeatedly mistook a linesman’s bald head for the ball |
| Amazon Rekognition for US Congressmen | Amazon's facial recognition technology (Rekognition) falsely matched 28 congresspeople with mugshots of criminals, while also revealing racial bias in the algorithm |
| Amazon Rekognition for law enforcement | Amazon's facial recognition technology (Rekognition) misidentified women as men, particularly those with darker skin |
| Zhejiang traffic facial recognition system | Traffic camera system (designed to capture traffic offenses) mistook a face on the side of a bus for a jaywalker |
| Kneron tricking facial recognition terminals | The team at Kneron used high-quality 3-D masks to deceive AliPay and WeChat payment systems to make purchases |
| Twitter smart cropping tool | Twitter's auto-crop tool for photo previews displayed evident signs of racial bias |
| Depixelator tool | Algorithm (based on StyleGAN) designed to generate depixelated faces showed signs of racial bias, with image output skewed towards the white demographic |
| Google Photos tagging | The automatic photo tagging capability in Google Photos mistakenly labeled Black people as gorillas |
| GenderShades evaluation of gender classification products | GenderShades' research revealed that Microsoft and IBM’s face-analysis services for identifying the gender of people in photos frequently erred when analyzing images of women with dark skin |
| New Jersey Police Facial Recognition | A false facial recognition match by New Jersey police landed an innocent Black man (Nijeer Parks) in jail even though he was 30 miles away from the crime |
| Tesla's dilemma between a horse cart and a truck | Tesla's visualization system mistook a horse carriage for a truck with a man walking behind it |
| Google's AI for Diabetic Retinopathy Detection | The retina scanning tool fared much worse in real-life settings than in controlled experiments, with issues such as rejected scans (from poor scan image quality) and delays from intermittent internet connectivity when uploading images to the cloud for processing |

Forecasting

| Title | Description |
| --- | --- |
| Google Flu Trends | Flu prevalence prediction model based on Google searches produced inaccurate overestimates |
| Zillow iBuying algorithms | Significant losses in Zillow's home-flipping business due to inaccurate (overestimated) prices from property valuation models |
| Tyndaris Robot Hedge Fund | AI-powered automated trading system controlled by a supercomputer named K1 resulted in big investment losses, culminating in a lawsuit |
| Sentient Investment AI Hedge Fund | The once high-flying AI-powered fund at Sentient Investment Management failed to make money and was promptly liquidated in less than 2 years |

Natural Language Processing

| Title | Description |
| --- | --- |
| Microsoft Tay Chatbot | Chatbot that posted inflammatory and offensive tweets through its Twitter account |
| Nabla Chatbot | Experimental chatbot (for medical advice) using a cloud-hosted instance of GPT-3 advised a mock patient to commit suicide |
| Facebook Negotiation Chatbots | The AI system was shut down after the chatbots stopped using English in their negotiations and started using a language that they created by themselves |
| OpenAI GPT-3 Chatbot Samantha | A GPT-3 chatbot fine-tuned by indie game developer Jason Rohrer to emulate his dead fiancée was shut down by OpenAI after Jason refused their request to insert an automated monitoring tool amidst concerns of the chatbot being racist or overtly sexual |
| Amazon Alexa plays porn | Amazon's voice-activated digital assistant unleashed a torrent of raunchy language after a toddler asked it to play a children’s song |
| Galactica - Meta's Large Language Model | A problem with Galactica was that it could not distinguish truth from falsehood, a basic requirement for a language model designed to generate scientific text. It was found to make up fake papers (sometimes attributing them to real authors), and generated articles about the history of bears in space as readily as ones about protein complexes |
| Energy Firm in Voice Mimicry Fraud | Cybercriminals used AI-based software to impersonate the voice of a CEO and demand a fraudulent money transfer in a voice-spoofing attack |
| MOH chatbot dispenses safe sex advice when asked Covid-19 questions | The 'Ask Jamie' chatbot by the Singapore Ministry of Health (MOH) was temporarily disabled after it provided misaligned replies about safe sex when asked about managing positive COVID-19 results |
| Google's BARD Chatbot Demo | In its first public demo advertisement, BARD made a factual error regarding which satellite first took pictures of a planet outside the Earth's solar system |
| ChatGPT Categories of Failures | An analysis of the ten categories of failures seen in ChatGPT so far, including reasoning, factual errors, math, coding, and bias |
| TikTokers roasting McDonald's hilarious drive-thru AI order fails | Some samples where a deployed production voice assistant fails to get orders right, leading to brand and reputation damage for McDonald's |
| Bing Chatbot's Unhinged Emotional Behavior | In certain conversations, Bing's chatbot was found to reply with argumentative and emotional responses |
| Bing's AI quotes COVID disinformation sourced from ChatGPT | Bing's response to a query on COVID-19 anti-vaccine advocacy was inaccurate and based on false information from unreliable sources |
| AI-generated 'Seinfeld' suspended on Twitch for transphobic jokes | A mistake with the AI’s content filter resulted in the character 'Larry' delivering a transphobic standup routine |
| ChatGPT cites bogus legal cases | A lawyer used OpenAI's popular chatbot ChatGPT to "supplement" his own findings but was provided with completely fabricated previous cases that do not exist |

Recommendation Systems

| Title | Description |
| --- | --- |
| IBM's Watson Health | IBM’s Watson allegedly provided numerous unsafe and incorrect recommendations for treating cancer patients |
| Netflix - $1 Million Challenge | The recommender system that won the $1 Million challenge improved the proposed baseline by 8.43%. However, this performance gain did not seem to justify the engineering effort needed to bring it into a production environment |
