  • Stars: 24,324
  • Rank 905 (Top 0.02 %)
  • License: MIT License
  • Created over 4 years ago
  • Updated over 1 year ago

Repository Details

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

applied-ml

Curated papers, articles, and blogs on data science & machine learning in production. ⚙️

Figuring out how to implement your ML project? Learn how other organizations did it:

  • How the problem is framed 🔎 (e.g., personalization as recsys vs. search vs. sequences)
  • What machine learning techniques worked ✅ (and sometimes, what didn't ❌)
  • Why it works, the science behind it with research, literature, and references 📂
  • What real-world results were achieved (so you can better assess ROI ⏰💰📈)

P.S., Want a summary of ML advancements? 👉 ml-surveys

P.P.S., Looking for guides and interviews on applying ML? 👉 applyingML

Table of Contents

  1. Data Quality
  2. Data Engineering
  3. Data Discovery
  4. Feature Stores
  5. Classification
  6. Regression
  7. Forecasting
  8. Recommendation
  9. Search & Ranking
  10. Embeddings
  11. Natural Language Processing
  12. Sequence Modelling
  13. Computer Vision
  14. Reinforcement Learning
  15. Anomaly Detection
  16. Graph
  17. Optimization
  18. Information Extraction
  19. Weak Supervision
  20. Generation
  21. Audio
  22. Privacy-Preserving Machine Learning
  23. Validation and A/B Testing
  24. Model Management
  25. Efficiency
  26. Ethics
  27. Infra
  28. MLOps Platforms
  29. Practices
  30. Team Structure
  31. Fails

Data Quality

  1. Reliable and Scalable Data Ingestion at Airbnb Airbnb 2016
  2. Monitoring Data Quality at Scale with Statistical Modeling Uber 2017
  3. Data Management Challenges in Production Machine Learning (Paper) Google 2017
  4. Automating Large-Scale Data Quality Verification (Paper) Amazon 2018
  5. Meet Hodor - Gojek’s Upstream Data Quality Tool Gojek 2019
  6. Data Validation for Machine Learning (Paper) Google 2019
  7. An Approach to Data Quality for Netflix Personalization Systems Netflix 2020
  8. Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper) Facebook 2020

Data Engineering

  1. Zipline: Airbnb’s Machine Learning Data Management Platform Airbnb 2018
  2. Sputnik: Airbnb’s Apache Spark Framework for Data Engineering Airbnb 2020
  3. Unbundling Data Science Workflows with Metaflow and AWS Step Functions Netflix 2020
  4. How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand DoorDash 2020
  5. Revolutionizing Money Movements at Scale with Strong Data Consistency Uber 2020
  6. Zipline - A Declarative Feature Engineering Framework Airbnb 2020
  7. Automating Data Protection at Scale, Part 1 (Part 2) Airbnb 2021
  8. Real-time Data Infrastructure at Uber Uber 2021
  9. Introducing Fabricator: A Declarative Feature Engineering Framework DoorDash 2022
  10. Functions & DAGs: introducing Hamilton, a microframework for dataframe generation Stitch Fix 2021
  11. Optimizing Pinterest’s Data Ingestion Stack: Findings and Learnings Pinterest 2022
  12. Lessons Learned From Running Apache Airflow at Scale Shopify 2022
  13. Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training Meta 2022
  14. Data Mesh - A Data Movement and Processing Platform @ Netflix Netflix 2022
  15. Building Scalable Real Time Event Processing with Kafka and Flink DoorDash 2022

Data Discovery

  1. Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code) Apache
  2. Collect, Aggregate, and Visualize a Data Ecosystem's Metadata (Code) WeWork
  3. Discovery and Consumption of Analytics Data at Twitter Twitter 2016
  4. Democratizing Data at Airbnb Airbnb 2017
  5. Databook: Turning Big Data into Knowledge with Metadata at Uber Uber 2018
  6. Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code) Netflix 2018
  7. Amundsen - Lyft’s Data Discovery & Metadata Engine Lyft 2019
  8. Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code) Lyft 2019
  9. DataHub: A Generalized Metadata Search & Discovery Tool (Code) LinkedIn 2019
  10. Amundsen: One Year Later Lyft 2020
  11. Using Amundsen to Support User Privacy via Metadata Collection at Square Square 2020
  12. Turning Metadata Into Insights with Databook Uber 2020
  13. DataHub: Popular Metadata Architectures Explained LinkedIn 2020
  14. How We Improved Data Discovery for Data Scientists at Spotify Spotify 2020
  15. How We’re Solving Data Discovery Challenges at Shopify Shopify 2020
  16. Nemo: Data discovery at Facebook Facebook 2020
  17. Exploring Data @ Netflix (Code) Netflix 2021

Feature Stores

  1. Distributed Time Travel for Feature Generation Netflix 2016
  2. Building the Activity Graph, Part 2 (Feature Storage Section) LinkedIn 2017
  3. Fact Store at Scale for Netflix Recommendations Netflix 2018
  4. Zipline: Airbnb’s Machine Learning Data Management Platform Airbnb 2018
  5. Feature Store: The missing data layer for Machine Learning pipelines? Hopsworks 2018
  6. Introducing Feast: An Open Source Feature Store for Machine Learning (Code) Gojek 2019
  7. Michelangelo Palette: A Feature Engineering Platform at Uber Uber 2019
  8. The Architecture That Powers Twitter's Feature Store Twitter 2019
  9. Accelerating Machine Learning with the Feature Store Service Condé Nast 2019
  10. Feast: Bridging ML Models and Data Gojek 2020
  11. Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression DoorDash 2020
  12. Rapid Experimentation Through Standardization: Typed AI features for LinkedIn’s Feed LinkedIn 2020
  13. Building a Feature Store Monzo Bank 2020
  14. Butterfree: A Spark-based Framework for Feature Store Building (Code) QuintoAndar 2020
  15. Building Riviera: A Declarative Real-Time Feature Engineering Framework DoorDash 2021
  16. Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory Uber 2021
  17. ML Feature Serving Infrastructure at Lyft Lyft 2021
  18. Near real-time features for near real-time personalization LinkedIn 2022
  19. Building the Model Behind DoorDash’s Expansive Merchant Selection DoorDash 2022
  20. Open sourcing Feathr – LinkedIn’s feature store for productive machine learning LinkedIn 2022
  21. Evolution of ML Fact Store Netflix 2022
  22. Developing scalable feature engineering DAGs Metaflow + Hamilton via Outerbounds 2022
  23. Feature Store Design at Constructor Constructor.io 2023

Classification

  1. Prediction of Advertiser Churn for Google AdWords (Paper) Google 2010
  2. High-Precision Phrase-Based Document Classification on a Modern Scale (Paper) LinkedIn 2011
  3. Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper) Walmart 2014
  4. Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper) NAVER 2016
  5. Learning to Diagnose with LSTM Recurrent Neural Networks (Paper) Google 2017
  6. Discovering and Classifying In-app Message Intent at Airbnb Airbnb 2019
  7. Teaching Machines to Triage Firefox Bugs Mozilla 2019
  8. Categorizing Products at Scale Shopify 2020
  9. How We Built the Good First Issues Feature GitHub 2020
  10. Testing Firefox More Efficiently with Machine Learning Mozilla 2020
  11. Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper) Microsoft 2020
  12. Scalable Data Classification for Security and Privacy (Paper) Facebook 2020
  13. Uncovering Online Delivery Menu Best Practices with Machine Learning DoorDash 2020
  14. Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging DoorDash 2020
  15. Deep Learning: Product Categorization and Shelving Walmart 2021
  16. Large-scale Item Categorization for e-Commerce (Paper) DianPing, eBay 2012
  17. Semantic Label Representation with an Application on Multimodal Product Categorization Walmart 2022
  18. Building Airbnb Categories with ML and Human-in-the-Loop Airbnb 2022

Regression

  1. Using Machine Learning to Predict Value of Homes On Airbnb Airbnb 2017
  2. Using Machine Learning to Predict the Value of Ad Requests Twitter 2020
  3. Open-Sourcing Riskquant, a Library for Quantifying Risk (Code) Netflix 2020
  4. Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment DoorDash 2020

Forecasting

  1. Engineering Extreme Event Forecasting at Uber with RNN Uber 2017
  2. Forecasting at Uber: An Introduction Uber 2018
  3. Transforming Financial Forecasting with Data Science and Machine Learning at Uber Uber 2018
  4. Under the Hood of Gojek’s Automated Forecasting Tool Gojek 2019
  5. BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper, Video) Google 2020
  6. Retraining Machine Learning Models in the Wake of COVID-19 DoorDash 2020
  7. Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper, Code) Atlassian 2020
  8. Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting (Paper, Video, Code) Uber 2021
  9. Managing Supply and Demand Balance Through Machine Learning DoorDash 2021
  10. Greykite: A flexible, intuitive, and fast forecasting library LinkedIn 2021
  11. The history of Amazon’s forecasting algorithm Amazon 2021
  12. DeepETA: How Uber Predicts Arrival Times Using Deep Learning Uber 2022
  13. Forecasting Grubhub Order Volume At Scale Grubhub 2022
  14. Causal Forecasting at Lyft (Part 1) Lyft 2022

Recommendation

  1. Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper) Amazon 2003
  2. Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2) Netflix 2012
  3. How Music Recommendation Works - And Doesn’t Work Spotify 2012
  4. Learning to Rank Recommendations with the k-Order Statistic Loss (Paper) Google 2013
  5. Recommending Music on Spotify with Deep Learning Spotify 2014
  6. Learning a Personalized Homepage Netflix 2015
  7. The Netflix Recommender System: Algorithms, Business Value, and Innovation (Paper) Netflix 2015
  8. Session-based Recommendations with Recurrent Neural Networks (Paper) Telefonica 2016
  9. Deep Neural Networks for YouTube Recommendations YouTube 2016
  10. E-commerce in Your Inbox: Product Recommendations at Scale (Paper) Yahoo 2016
  11. To Be Continued: Helping you find shows to continue watching on Netflix Netflix 2016
  12. Personalized Recommendations in LinkedIn Learning LinkedIn 2016
  13. Personalized Channel Recommendations in Slack Slack 2016
  14. Recommending Complementary Products in E-Commerce Push Notifications (Paper) Alibaba 2017
  15. Artwork Personalization at Netflix Netflix 2017
  16. A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper) Twitter 2017
  17. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper) Pinterest 2017
  18. Powering Search & Recommendations at DoorDash DoorDash 2017
  19. How 20th Century Fox uses ML to predict a movie audience (Paper) 20th Century Fox 2018
  20. Calibrated Recommendations (Paper) Netflix 2018
  21. Food Discovery with Uber Eats: Recommending for the Marketplace Uber 2018
  22. Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper) Spotify 2018
  23. Talent Search and Recommendation Systems at LinkedIn: Practical Challenges and Lessons Learned (Paper) LinkedIn 2018
  24. Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper) Alibaba 2019
  25. SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper) Alibaba 2019
  26. Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper) Alibaba 2019
  27. Personalized Recommendations for Experiences Using Deep Learning TripAdvisor 2019
  28. Powered by AI: Instagram’s Explore recommender system Facebook 2019
  29. Marginal Posterior Sampling for Slate Bandits (Paper) Netflix 2019
  30. Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations Uber 2019
  31. Music recommendation at Spotify Spotify 2019
  32. Using Machine Learning to Predict what File you Need Next (Part 1) Dropbox 2019
  33. Using Machine Learning to Predict what File you Need Next (Part 2) Dropbox 2019
  34. Learning to be Relevant: Evolution of a Course Recommendation System (PAPER NEEDED) LinkedIn 2019
  35. Temporal-Contextual Recommendation in Real-Time (Paper) Amazon 2020
  36. P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper) Amazon 2020
  37. Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper) Alibaba 2020
  38. TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper) Alibaba 2020
  39. PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper) Alibaba 2020
  40. Controllable Multi-Interest Framework for Recommendation (Paper) Alibaba 2020
  41. MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper) Alibaba 2020
  42. ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper) Alibaba 2020
  43. For Your Ears Only: Personalizing Spotify Home with Machine Learning Spotify 2020
  44. Reach for the Top: How Spotify Built Shortcuts in Just Six Months Spotify 2020
  45. Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper) Spotify 2020
  46. The Evolution of Kit: Automating Marketing Using Machine Learning Shopify 2020
  47. A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1) LinkedIn 2020
  48. A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2) LinkedIn 2020
  49. Building a Heterogeneous Social Network Recommendation System LinkedIn 2020
  50. How TikTok recommends videos #ForYou ByteDance 2020
  51. Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper) Google 2020
  52. Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper) Google 2020
  53. Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper) Google 2020
  54. Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper) Tencent 2020
  55. A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper) Home Depot 2020
  56. Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper) Ikea 2020
  57. How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads Pinterest 2020
  58. Multi-task Learning for Related Products Recommendations at Pinterest Pinterest 2020
  59. Improving the Quality of Recommended Pins with Lightweight Ranking Pinterest 2020
  60. Multi-task Learning and Calibration for Utility-based Home Feed Ranking Pinterest 2020
  61. Personalized Cuisine Filter Based on Customer Preference and Local Popularity DoorDash 2020
  62. How We Built a Matchmaking Algorithm to Cross-Sell Products Gojek 2020
  63. Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper) Twitter 2021
  64. Self-supervised Learning for Large-scale Item Recommendations (Paper) Google 2021
  65. Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper) ByteDance 2021
  66. Using AI to Help Health Experts Address the COVID-19 Pandemic Facebook 2021
  67. Advertiser Recommendation Systems at Pinterest Pinterest 2021
  68. On YouTube's Recommendation System YouTube 2021
  69. "Are you sure?": Preliminary Insights from Scaling Product Comparisons to Multiple Shops Coveo 2021
  70. Mozrt, a Deep Learning Recommendation System Empowering Walmart Store Associates Walmart 2021
  71. Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (Paper) Meta 2021
  72. The Amazon Music conversational recommender is hitting the right notes Amazon 2022
  73. Personalized complementary product recommendation (Paper) Amazon 2022
  74. Building a Deep Learning Based Retrieval System for Personalized Recommendations eBay 2022
  75. How We Built: An Early-Stage Machine Learning Model for Recommendations Peloton 2022
  76. Lessons Learned from Building out Context-Aware Recommender Systems Peloton 2022
  77. Beyond Matrix Factorization: Using hybrid features for user-business recommendations Yelp 2022
  78. Improving job matching with machine-learned activity features LinkedIn 2022
  79. Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training Meta 2022
  80. Blueprints for recommender system architectures: 10th anniversary edition Xavier Amatriain 2022
  81. How Pinterest Leverages Realtime User Actions in Recommendation to Boost Homefeed Engagement Volume Pinterest 2022
  82. RecSysOps: Best Practices for Operating a Large-Scale Recommender System Netflix 2022
  83. Recommend API: Unified end-to-end machine learning infrastructure to generate recommendations Slack 2022
  84. Evolving DoorDash’s Substitution Recommendations Algorithm DoorDash 2022
  85. Homepage Recommendation with Exploitation and Exploration DoorDash 2022
  86. GPU-accelerated ML Inference at Pinterest Pinterest 2022
  87. Addressing Confounding Feature Issue for Causal Recommendation (Paper) Tencent 2022

Search & Ranking

  1. Amazon Search: The Joy of Ranking Products (Paper, Video, Code) Amazon 2016
  2. How Lazada Ranks Products to Improve Customer Experience and Conversion Lazada 2016
  3. Ranking Relevance in Yahoo Search (Paper) Yahoo 2016
  4. Learning to Rank Personalized Search Results in Professional Networks (Paper) LinkedIn 2016
  5. Using Deep Learning at Scale in Twitter’s Timelines Twitter 2017
  6. An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper) Etsy 2017
  7. Powering Search & Recommendations at DoorDash DoorDash 2017
  8. Applying Deep Learning To Airbnb Search (Paper) Airbnb 2018
  9. In-session Personalization for Talent Search (Paper) LinkedIn 2018
  10. Talent Search and Recommendation Systems at LinkedIn (Paper) LinkedIn 2018
  11. Food Discovery with Uber Eats: Building a Query Understanding Engine Uber 2018
  12. Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper) Alibaba 2018
  13. Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) Alibaba 2018
  14. Semantic Product Search (Paper) Amazon 2019
  15. Machine Learning-Powered Search Ranking of Airbnb Experiences Airbnb 2019
  16. Entity Personalized Talent Search Models with Tree Interaction Features (Paper) LinkedIn 2019
  17. The AI Behind LinkedIn Recruiter Search and recommendation systems LinkedIn 2019
  18. Learning Hiring Preferences: The AI Behind LinkedIn Jobs LinkedIn 2019
  19. The Secret Sauce Behind Search Personalisation Gojek 2019
  20. Neural Code Search: ML-based Code Search Using Natural Language Queries Facebook 2019
  21. Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper) Alibaba 2019
  22. Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search Alibaba 2019
  23. Understanding Searches Better Than Ever Before (Paper) Google 2019
  24. How We Used Semantic Search to Make Our Search 10x Smarter Tokopedia 2019
  25. Query2vec: Search query expansion with query embeddings GrubHub 2019
  26. MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search Baidu 2019
  27. Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper) Amazon 2020
  28. Managing Diversity in Airbnb Search (Paper) Airbnb 2020
  29. Improving Deep Learning for Airbnb Search (Paper) Airbnb 2020
  30. Quality Matches Via Personalized AI for Hirer and Seeker Preferences LinkedIn 2020
  31. Understanding Dwell Time to Improve LinkedIn Feed Ranking LinkedIn 2020
  32. Ads Allocation in Feed via Constrained Optimization (Paper, Video) LinkedIn 2020
  33. Understanding Dwell Time to Improve LinkedIn Feed Ranking LinkedIn 2020
  34. AI at Scale in Bing Microsoft 2020
  35. Query Understanding Engine in Traveloka Universal Search Traveloka 2020
  36. Bayesian Product Ranking at Wayfair Wayfair 2020
  37. COLD: Towards the Next Generation of Pre-Ranking System (Paper) Alibaba 2020
  38. Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) Pinterest 2020
  39. Driving Shopping Upsells from Pinterest Search Pinterest 2020
  40. GDMix: A Deep Ranking Personalization Framework (Code) LinkedIn 2020
  41. Bringing Personalized Search to Etsy Etsy 2020
  42. Building a Better Search Engine for Semantic Scholar Allen Institute for AI 2020
  43. Query Understanding for Natural Language Enterprise Search (Paper) Salesforce 2020
  44. Things Not Strings: Understanding Search Intent with Better Recall DoorDash 2020
  45. Query Understanding for Surfacing Under-served Music Content (Paper) Spotify 2020
  46. Embedding-based Retrieval in Facebook Search (Paper) Facebook 2020
  47. Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper) JD 2020
  48. QUEEN: Neural query rewriting in e-commerce (Paper) Amazon 2021
  49. Using Learning-to-rank to Precisely Locate Where to Deliver Packages (Paper) Amazon 2021
  50. Seasonal relevance in e-commerce search (Paper) Amazon 2021
  51. Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) Alibaba 2021
  52. How We Built A Context-Specific Bidding System for Etsy Ads Etsy 2021
  53. Pre-trained Language Model based Ranking in Baidu Search (Paper) Baidu 2021
  54. Stitching together spaces for query-based recommendations Stitch Fix 2021
  55. Deep Natural Language Processing for LinkedIn Search Systems (Paper) LinkedIn 2021
  56. Siamese BERT-based Model for Web Search Relevance Ranking (Paper, Code) Seznam 2021
  57. SearchSage: Learning Search Query Representations at Pinterest Pinterest 2021
  58. Query2Prod2Vec: Grounded Word Embeddings for eCommerce Coveo 2021
  59. 3 Changes to Expand DoorDash’s Product Search Beyond Delivery DoorDash 2022
  60. Learning To Rank Diversely Airbnb 2022
  61. How to Optimise Rankings with Cascade Bandits Expedia 2022
  62. A Guide to Google Search Ranking Systems Google 2022
  63. Deep Learning for Search Ranking at Etsy Etsy 2022
  64. Search at Calm Calm 2022

Embeddings

  1. Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper) Sears 2017
  2. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper) Alibaba 2018
  3. Embeddings@Twitter Twitter 2018
  4. Listing Embeddings in Search Ranking (Paper) Airbnb 2018
  5. Understanding Latent Style Stitch Fix 2018
  6. Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper) LinkedIn 2018
  7. Personalized Store Feed with Vector Embeddings DoorDash 2018
  8. Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper) Moshbit 2019
  9. Machine Learning for a Better Developer Experience Netflix 2020
  10. Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code) Google 2020
  11. BERT Goes Shopping: Comparing Distributional Models for Product Representations Coveo 2021
  12. The Embeddings That Came in From the Cold: Improving Vectors for New and Rare Products with Content-Based Inference Coveo 2022
  13. Embedding-based Retrieval at Scribd Scribd 2021
  14. Multi-objective Hyper-parameter Optimization of Behavioral Song Embeddings (Paper) Apple 2022
  15. Embeddings at Spotify's Scale - How Hard Could It Be? Spotify 2023

Natural Language Processing

  1. Abusive Language Detection in Online User Content (Paper) Yahoo 2016
  2. Smart Reply: Automated Response Suggestion for Email (Paper) Google 2016
  3. Building Smart Replies for Member Messages LinkedIn 2017
  4. How Natural Language Processing Helps LinkedIn Members Get Support Easily LinkedIn 2019
  5. Gmail Smart Compose: Real-Time Assisted Writing (Paper) Google 2019
  6. Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper) Amazon 2019
  7. Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want Stitch Fix 2019
  8. DeText: A deep NLP Framework for Intelligent Text Understanding (Code) LinkedIn 2020
  9. SmartReply for YouTube Creators Google 2020
  10. Using Neural Networks to Find Answers in Tables (Paper) Google 2020
  11. A Scalable Approach to Reducing Gender Bias in Google Translate Google 2020
  12. Assistive AI Makes Replying Easier Microsoft 2020
  13. AI Advances to Better Detect Hate Speech Facebook 2020
  14. A State-of-the-Art Open Source Chatbot (Paper) Facebook 2020
  15. A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs Facebook 2020
  16. Deep Learning to Translate Between Programming Languages (Paper, Code) Facebook 2020
  17. Deploying Lifelong Open-Domain Dialogue Learning (Paper) Facebook 2020
  18. Introducing Dynabench: Rethinking the way we benchmark AI Facebook 2020
  19. How Gojek Uses NLP to Name Pickup Locations at Scale Gojek 2020
  20. The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper) Baidu 2020
  21. PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code) Google 2020
  22. Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo) Salesforce 2020
  23. GeDi: A Powerful New Method for Controlling Language Models (Paper, Code) Salesforce 2020
  24. Applying Topic Modeling to Improve Call Center Operations RICOH 2020
  25. WIDeText: A Multimodal Deep Learning Framework Airbnb 2020
  26. Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code) Facebook 2021
  27. How we reduced our text similarity runtime by 99.96% Microsoft 2021
  28. Textless NLP: Generating expressive speech from raw audio (Part 1) (Part 2) (Part 3) (Code and Pretrained Models) Facebook 2021
  29. Grammar Correction as You Type, on Pixel 6 Google 2021
  30. Auto-generated Summaries in Google Docs Google 2022
  31. ML-Enhanced Code Completion Improves Developer Productivity Google 2022
  32. Words All the Way Down - Conversational Sentiment Analysis PayPal 2022

Sequence Modelling

  1. Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper) Sutter Health 2015
  2. Deep Learning for Understanding Consumer Histories (Paper) Zalando 2016
  3. Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper) Sutter Health 2016
  4. Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper) Telefonica 2017
  5. Deep Learning for Electronic Health Records (Paper) Google 2018
  6. Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper) Alibaba 2019
  7. Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper) Alibaba 2020
  8. How Duolingo uses AI in every part of its app Duolingo 2020
  9. Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper, Video) Facebook 2020
  10. Using deep learning to detect abusive sequences of member activity (Video) LinkedIn 2021

Computer Vision

  1. Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning Dropbox 2017
  2. Categorizing Listing Photos at Airbnb Airbnb 2018
  3. Amenity Detection and Beyond - New Frontiers of Computer Vision at Airbnb Airbnb 2019
  4. How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors Deepomatic
  5. Making machines recognize and transcribe conversations in meetings using audio and video Microsoft 2019
  6. Powered by AI: Advancing product understanding and building new shopping experiences Facebook 2020
  7. A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper) Google 2020
  8. Machine Learning-based Damage Assessment for Disaster Relief (Paper) Google 2020
  9. RepNet: Counting Repetitions in Videos (Paper) Google 2020
  10. Converting Text to Images for Product Discovery (Paper) Amazon 2020
  11. How Disney Uses PyTorch for Animated Character Recognition Disney 2020
  12. Image Captioning as an Assistive Technology (Video) IBM 2020
  13. AI for AG: Production machine learning for agriculture Blue River 2020
  14. AI for Full-Self Driving at Tesla Tesla 2020
  15. On-device Supermarket Product Recognition Google 2020
  16. Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper) Google 2020
  17. Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) Pinterest 2020
  18. Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper) Google 2020
  19. Vision-based Price Suggestion for Online Second-hand Items (Paper) Alibaba 2020
  20. New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper, Model) Facebook 2021
  21. An Efficient Training Approach for Very Large Scale Face Recognition (Paper) Alibaba 2021
  22. Identifying Document Types at Scribd Scribd 2021
  23. Semi-Supervised Visual Representation Learning for Fashion Compatibility (Paper) Walmart 2021
  24. Recognizing People in Photos Through Private On-Device Machine Learning Apple 2021
  25. DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection Google 2022
  26. Contrastive language and vision learning of general fashion concepts (Paper) Coveo 2022
  27. Leveraging Computer Vision for Search Ranking BazaarVoice 2023

Reinforcement Learning

  1. Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper) Alibaba 2018
  2. Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper) Alibaba 2018
  3. Reinforcement Learning for On-Demand Logistics DoorDash 2018
  4. Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) Alibaba 2018
  5. Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper) Alibaba 2019
  6. Productionizing Deep Reinforcement Learning with Spark and MLflow Zynga 2020
  7. Deep Reinforcement Learning in Production (Part 1) (Part 2) Zynga 2020
  8. Building AI Trading Systems Denny Britz 2020
  9. Shifting Consumption towards Diverse content via Reinforcement Learning (Paper) Spotify 2022
  10. Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms Meta 2022
  11. How to Optimise Rankings with Cascade Bandits Expedia 2022
  12. Selecting the Best Image for Each Merchant Using Exploration and Machine Learning DoorDash 2023

Anomaly Detection

  1. Detecting Performance Anomalies in External Firmware Deployments Netflix 2019
  2. Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code) LinkedIn 2019
  3. Deep Anomaly Detection with Spark and Tensorflow (Hopsworks Video) Swedbank, Hopsworks 2019
  4. Preventing Abuse Using Unsupervised Learning LinkedIn 2020
  5. The Technology Behind Fighting Harassment on LinkedIn LinkedIn 2020
  6. Uncovering Insurance Fraud Conspiracy with Network Learning (Paper) Ant Financial 2020
  7. How Does Spam Protection Work on Stack Exchange? Stack Exchange 2020
  8. Auto Content Moderation in C2C e-Commerce Mercari 2020
  9. Blocking Slack Invite Spam With Machine Learning Slack 2020
  10. Cloudflare Bot Management: Machine Learning and More Cloudflare 2020
  11. Anomalies in Oil Temperature Variations in a Tunnel Boring Machine SENER 2020
  12. Using Anomaly Detection to Monitor Low-Risk Bank Customers Rabobank 2020
  13. Fighting fraud with Triplet Loss OLX Group 2020
  14. Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative) Facebook 2020
  15. How AI is getting better at detecting hate speech Part 1, Part 2, Part 3, Part 4 Facebook 2020
  16. Using deep learning to detect abusive sequences of member activity (Video) LinkedIn 2021
  17. Project RADAR: Intelligent Early Fraud Detection System with Humans in the Loop Uber 2022
  18. Graph for Fraud Detection Grab 2022
  19. Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms Meta 2022
  20. Evolving our machine learning to stop mobile bots Cloudflare 2022
  21. Improving the accuracy of our machine learning WAF using data augmentation and sampling Cloudflare 2022
  22. Machine Learning for Fraud Detection in Streaming Services Netflix 2022
  23. Pricing at Lyft Lyft 2022

Graph

  1. Building The LinkedIn Knowledge Graph LinkedIn 2016
  2. Scaling Knowledge Access and Retrieval at Airbnb Airbnb 2018
  3. Graph Convolutional Neural Networks for Web-Scale Recommender Systems (Paper) Pinterest 2018
  4. Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations Uber 2019
  5. AliGraph: A Comprehensive Graph Neural Network Platform (Paper) Alibaba 2019
  6. Contextualizing Airbnb by Building Knowledge Graph Airbnb 2019
  7. Retail Graph - Walmart’s Product Knowledge Graph Walmart 2020
  8. Traffic Prediction with Advanced Graph Neural Networks DeepMind 2020
  9. SimClusters: Community-Based Representations for Recommendations (Paper, Video) Twitter 2020
  10. Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (Paper) Alibaba 2021
  11. Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) Alibaba 2021
  12. JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (Paper) JPMorgan Chase 2021
  13. How AWS uses graph neural networks to meet customer needs Amazon 2022
  14. Graph for Fraud Detection Grab 2022

Optimization

  1. Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3) Lyft 2016
  2. The Data and Science behind GrabShare Carpooling (Part 1) (PAPER NEEDED) Grab 2017
  3. How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats Uber 2018
  4. Next-Generation Optimization for Dasher Dispatch at DoorDash DoorDash 2020
  5. Optimization of Passengers Waiting Time in Elevators Using Machine Learning Thyssen Krupp AG 2020
  6. Think Out of The Package: Recommending Package Types for E-commerce Shipments (Paper) Amazon 2020
  7. Optimizing DoorDash’s Marketing Spend with Machine Learning DoorDash 2020
  8. Using learning-to-rank to precisely locate where to deliver packages (Paper) Amazon 2021

Information Extraction

  1. Unsupervised Extraction of Attributes and Their Values from Product Description (Paper) Rakuten 2013
  2. Using Machine Learning to Index Text from Billions of Images Dropbox 2018
  3. Extracting Structured Data from Templatic Documents (Paper) Google 2020
  4. AutoKnow: self-driving knowledge collection for products of thousands of types (Paper, Video) Amazon 2020
  5. One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper) Alibaba 2020
  6. Information Extraction from Receipts with Graph Convolutional Networks Nanonets 2021

Weak Supervision

  1. Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper) Google 2019
  2. Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper) Intel 2019
  3. Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper) Apple 2019
  4. Bootstrapping Conversational Agents with Weak Supervision (Paper) IBM 2019

Generation

  1. Better Language Models and Their Implications (Paper) OpenAI 2019
  2. Image GPT (Paper, Code) OpenAI 2019
  3. Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post) OpenAI 2020
  4. Deep Learned Super Resolution for Feature Film Production (Paper) Pixar 2020
  5. Unit Test Case Generation with Transformers Microsoft 2021

Audio

  1. Improving On-Device Speech Recognition with VoiceFilter-Lite (Paper) Google 2020
  2. The Machine Learning Behind Hum to Search Google 2020

Privacy-Preserving Machine Learning

  1. Federated Learning: Collaborative Machine Learning without Centralized Training Data (Paper) Google 2017
  2. Federated Learning with Formal Differential Privacy Guarantees (Paper) Google 2022
  3. MPC-based machine learning: Achieving end-to-end privacy-preserving machine learning (Paper) Facebook 2022

Validation and A/B Testing

  1. Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Paper) Google 2010
  2. The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper) Google 2015
  3. Twitter Experimentation: Technical Overview Twitter 2015
  4. It’s All A/Bout Testing: The Netflix Experimentation Platform Netflix 2016
  5. Building Pinterest’s A/B Testing Platform Pinterest 2016
  6. Experimenting to Solve Cramming Twitter 2017
  7. Building an Intelligent Experimentation Platform with Uber Engineering Uber 2017
  8. Scaling Airbnb’s Experimentation Platform Airbnb 2017
  9. Meet Wasabi, an Open Source A/B Testing Platform (Code) Intuit 2017
  10. Analyzing Experiment Outcomes: Beyond Average Treatment Effects Uber 2018
  11. Under the Hood of Uber’s Experimentation Platform Uber 2018
  12. Constrained Bayesian Optimization with Noisy Experiments (Paper) Facebook 2018
  13. Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab Grab 2018
  14. Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code) Better 2019
  15. Detecting Interference: An A/B Test of A/B Tests LinkedIn 2019
  16. Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper) Uber 2020
  17. Enabling 10x More Experiments with Traveloka Experiment Platform Traveloka 2020
  18. Large Scale Experimentation at Stitch Fix (Paper) Stitch Fix 2020
  19. Multi-Armed Bandits and the Stitch Fix Experimentation Platform Stitch Fix 2020
  20. Experimentation with Resource Constraints Stitch Fix 2020
  21. Computational Causal Inference at Netflix (Paper) Netflix 2020
  22. Key Challenges with Quasi Experiments at Netflix Netflix 2020
  23. Making the LinkedIn experimentation engine 20x faster LinkedIn 2020
  24. Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn LinkedIn 2020
  25. How to Use Quasi-experiments and Counterfactuals to Build Great Products Shopify 2020
  26. Improving Experimental Power through Control Using Predictions as Covariate DoorDash 2020
  27. Supporting Rapid Product Iteration with an Experimentation Analysis Platform DoorDash 2020
  28. Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity DoorDash 2020
  29. Leveraging Causal Modeling to Get More Value from Flat Experiment Results DoorDash 2020
  30. Iterating Real-time Assignment Algorithms Through Experimentation DoorDash 2020
  31. Spotify’s New Experimentation Platform (Part 1) (Part 2) Spotify 2020
  32. Interpreting A/B Test Results: False Positives and Statistical Significance Netflix 2021
  33. Interpreting A/B Test Results: False Negatives and Power Netflix 2021
  34. Running Experiments with Google Adwords for Campaign Optimization DoorDash 2021
  35. The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000% DoorDash 2021
  36. Experimentation Platform at Zalando: Part 1 - Evolution Zalando 2021
  37. Designing Experimentation Guardrails Airbnb 2021
  38. How Airbnb Measures Future Value to Standardize Tradeoffs Airbnb 2021
  39. Network Experimentation at Scale (Paper) Facebook 2021
  40. Universal Holdout Groups at Disney Streaming Disney 2021
  41. Experimentation is a major focus of Data Science across Netflix Netflix 2022
  42. Search Journey Towards Better Experimentation Practices Spotify 2022
  43. Artificial Counterfactual Estimation: Machine Learning-Based Causal Inference at Airbnb Airbnb 2022
  44. Beyond A/B Test: Speeding up Airbnb Search Ranking Experimentation through Interleaving Airbnb 2022
  45. Challenges in Experimentation Lyft 2022
  46. Overtracking and Trigger Analysis: Reducing sample sizes while INCREASING sensitivity Booking 2022
  47. Meet Dash-AB - The Statistics Engine of Experimentation at DoorDash DoorDash 2022
  48. Comparing quantiles at scale in online A/B-testing Spotify 2022
  49. Accelerating our A/B experiments with machine learning Dropbox 2023
  50. Supercharging A/B Testing at Uber Uber

Model Management

  1. Operationalizing Machine Learning - Managing Provenance from Raw Data to Predictions Comcast 2018
  2. Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper) Apple 2019
  3. Runway - Model Lifecycle Management at Netflix Netflix 2020
  4. Managing ML Models @ Scale - Intuit’s ML Platform Intuit 2020
  5. ML Model Monitoring - 9 Tips From the Trenches Nubank 2021
  6. Dealing with Train-serve Skew in Real-time ML Models: A Short Guide Nubank 2023

Efficiency

  1. GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper) Facebook 2020
  2. How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs Roblox 2020
  3. Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (Paper) Uber 2021
  4. GPU-accelerated ML Inference at Pinterest Pinterest 2022

Ethics

  1. Building Inclusive Products Through A/B Testing (Paper) LinkedIn 2020
  2. LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper) LinkedIn 2020
  3. Introducing Twitter’s first algorithmic bias bounty challenge Twitter 2021
  4. Examining algorithmic amplification of political content on Twitter Twitter 2021
  5. A closer look at how LinkedIn integrates fairness into its AI products LinkedIn 2022

Infra

  1. Reengineering Facebook AI’s Deep Learning Platforms for Interoperability Facebook 2020
  2. Elastic Distributed Training with XGBoost on Ray Uber 2021

MLOps Platforms

  1. Meet Michelangelo: Uber’s Machine Learning Platform Uber 2017
  2. Operationalizing Machine Learning - Managing Provenance from Raw Data to Predictions Comcast 2018
  3. Big Data Machine Learning Platform at Pinterest Pinterest 2019
  4. Core Modeling at Instagram Instagram 2019
  5. Open-Sourcing Metaflow - a Human-Centric Framework for Data Science Netflix 2019
  6. Managing ML Models @ Scale - Intuit’s ML Platform Intuit 2020
  7. Real-time Machine Learning Inference Platform at Zomato Zomato 2020
  8. Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform Lyft 2020
  9. Building Flexible Ensemble ML Models with a Computational Graph DoorDash 2021
  10. LyftLearn: ML Model Training Infrastructure built on Kubernetes Lyft 2021
  11. "You Don't Need a Bigger Boat": A Full Data Pipeline Built with Open-Source Tools (Paper) Coveo 2021
  12. MLOps at GreenSteam: Shipping Machine Learning GreenSteam 2021
  13. Evolving Reddit’s ML Model Deployment and Serving Architecture Reddit 2021
  14. Redesigning Etsy’s Machine Learning Platform Etsy 2021
  15. Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (Paper) Meta 2021
  16. Building a Platform for Serving Recommendations at Etsy Etsy 2022
  17. Intelligent Automation Platform: Empowering Conversational AI and Beyond at Airbnb Airbnb 2022
  18. DARWIN: Data Science and Artificial Intelligence Workbench at LinkedIn LinkedIn 2022
  19. The Magic of Merlin: Shopify's New Machine Learning Platform Shopify 2022
  20. Zalando's Machine Learning Platform Zalando 2022
  21. Inside Meta's AI optimization platform for engineers across the company (Paper) Meta 2022
  22. Monzo’s machine learning stack Monzo 2022
  23. Evolution of ML Fact Store Netflix 2022
  24. Using MLOps to Build a Real-time End-to-End Machine Learning Pipeline Binance 2022
  25. Serving Machine Learning Models Efficiently at Scale at Zillow Zillow 2022
  26. Didact AI: The anatomy of an ML-powered stock picking engine Didact AI 2022
  27. Deployment for Free - A Machine Learning Platform for Stitch Fix's Data Scientists Stitch Fix 2022
  28. Machine Learning Operations (MLOps): Overview, Definition, and Architecture (Paper) IBM 2022

Practices

  1. Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper) Yoshua Bengio 2012
  2. Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper) Google 2014
  3. Rules of Machine Learning: Best Practices for ML Engineering Google 2018
  4. On Challenges in Machine Learning Model Management Amazon 2018
  5. Machine Learning in Production: The Booking.com Approach Booking 2019
  6. 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper) Booking 2019
  7. Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank Rabobank 2019
  8. Challenges in Deploying Machine Learning: a Survey of Case Studies (Paper) Cambridge 2020
  9. Reengineering Facebook AI’s Deep Learning Platforms for Interoperability Facebook 2020
  10. The problem with AI developer tools for enterprises Databricks 2020
  11. Continuous Integration and Deployment for Machine Learning Online Serving and Models Uber 2021
  12. Tuning Model Performance Uber 2021
  13. Maintaining Machine Learning Model Accuracy Through Monitoring DoorDash 2021
  14. Building Scalable and Performant Marketing ML Systems at Wayfair Wayfair 2021
  15. Our approach to building transparent and explainable AI systems LinkedIn 2021
  16. 5 Steps for Building Machine Learning Models for Business Shopify 2021
  17. Data Is An Art, Not Just A Science - And Storytelling Is The Key Shopify 2022
  18. Best Practices for Real-time Machine Learning: Alerting Nubank 2022
  19. Automatic Retraining for Machine Learning Models: Tips and Lessons Learned Nubank 2022
  20. RecSysOps: Best Practices for Operating a Large-Scale Recommender System Netflix 2022
  21. ML Education at Uber: Frameworks Inspired by Engineering Principles Uber 2022

Team Structure

  1. What is the most effective way to structure a data science team? Udemy 2017
  2. Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department Stitch Fix 2016
  3. Building The Analytics Team At Wish Wish 2018
  4. Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist Stitch Fix 2019
  5. Cultivating Algorithms: How We Grow Data Science at Stitch Fix Stitch Fix
  6. Analytics at Netflix: Who We Are and What We Do Netflix 2020
  7. Building a Data Team at a Mid-stage Startup: A Short Story Erikbern 2021
  8. A Behind-the-Scenes Look at How Postman’s Data Team Works Postman 2021
  9. Data Scientist x Machine Learning Engineer Roles: How are they different? How are they alike? Nubank 2022

Fails

  1. When It Comes to Gorillas, Google Photos Remains Blind Google 2018
  2. 160k+ High School Students Will Graduate Only If a Model Allows Them to International Baccalaureate 2020
  3. An Algorithm That ‘Predicts’ Criminality Based on a Face Sparks a Furor Harrisburg University 2020
  4. It's Hard to Generate Neural Text From GPT-3 About Muslims OpenAI 2020
  5. A British AI Tool to Predict Violent Crime Is Too Flawed to Use United Kingdom 2020
  6. More in awful-ai
  7. AI Incident Database Partnership on AI 2022

P.S., Want a summary of ML advancements? Get up to speed with survey papers 👉 ml-surveys
