Data Science
Collection of useful data science topics along with articles and videos.
How to Download the Code in This Repository to Your Local Machine
To download the code in this repo, run git clone:
git clone https://github.com/khuyentran1401/Data-science
Contents
- MLOps
- Data Management Tools
- Testing
- Productive Tools
- Python Helper Tools
- Tools for Deployment
- Speed-up Tools
- Math Tools
- Machine Learning
- Natural Language Processing
- Computer Vision
- Time Series
- Feature Engineering
- Visualization
- Mathematical Programming
- Scraping
- Python
- Terminal
- Linear Algebra
- Data Structure
- Statistics
- Web Applications
- Share Insights
- Cool Tools
- Learning Tips
- Productive Tips
- VSCode
- Book Review
- Data Science Portfolio
MLOps
| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| Stop Hard Coding in a Data Science Project – Use Configuration Files Instead | 🔗 | 🔗 | 🔗 |
| Poetry: A Better Way to Manage Python Dependencies | 🔗 |  | 🔗 |
| Git for Data Scientists: Learn Git through Practical Examples | 🔗 |  | 🔗 |
| Introduction to Weights & Biases: Track and Visualize your Machine Learning Experiments in 3 Lines of Code | 🔗 | 🔗 |  |
| Kedro — A Python Framework for Reproducible Data Science Project | 🔗 | 🔗 |  |
| Orchestrate a Data Science Project in Python With Prefect | 🔗 | 🔗 |  |
| Orchestrate Your Data Science Project with Prefect 2.0 | 🔗 | 🔗 | 🔗 |
| DagsHub: a GitHub Supplement for Data Scientists and ML Engineers | 🔗 | 🔗 |  |
| 4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python | 🔗 | 🔗 | 🔗 |
| BentoML: Create an ML Powered Prediction Service in Minutes | 🔗 | 🔗 | 🔗 |
| How to Structure a Data Science Project for Maintainability (with DVC) | 🔗 | 🔗 | 🔗 |
| How to Structure an ML Project for Reproducibility and Maintainability (with Prefect) | 🔗 | 🔗 |  |
| GitHub Actions in MLOps: Automatically Check and Deploy Your ML Model | 🔗 | 🔗 |  |
| Create Robust Data Pipelines with Prefect, Docker, and GitHub | 🔗 | 🔗 |  |
| Create a Maintainable Data Pipeline with Prefect and DVC | 🔗 | 🔗 |  |
| Build a Full-Stack ML Application With Pydantic And Prefect | 🔗 | 🔗 | 🔗 |
| Streamline Code Updates with DVC and GitHub Actions | 🔗 | 🔗 | 🔗 |
| Create Observable and Reproducible Notebooks with Hex | 🔗 | 🔗 | 🔗 |
| Build Reliable Machine Learning Pipelines with Continuous Integration | 🔗 | 🔗 | 🔗 |
| Automate Machine Learning Deployment with GitHub Actions | 🔗 | 🔗 | 🔗 |
Data Management Tools
| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| Introduction to DVC: Data Version Control Tool for Machine Learning Projects | 🔗 | 🔗 | 🔗 |
| Great Expectations: Always Know What to Expect From Your Data | 🔗 | 🔗 |  |
| Validate Your pandas DataFrame with Pandera | 🔗 | 🔗 | 🔗 |
| Introduction to Schema: A Python Library to Validate your Data | 🔗 | 🔗 |  |
| How to Create Fake Data with Faker | 🔗 | 🔗 |  |
| Hypothesis and Pandera: Generate Synthetic Pandas DataFrame for Testing | 🔗 | 🔗 | 🔗 |
| What is dbt (data build tool) and When should you use it? | 🔗 | 🔗 | 🔗 |
| Streamline dbt Model Development with Notebook-Style Workspace | 🔗 | 🔗 | 🔗 |
Testing
| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| Pytest for Data Scientists | 🔗 | 🔗 | 🔗 |
| 4 Lesser-Known Yet Awesome Tips for Pytest | 🔗 | 🔗 |  |
| DeepDiff — Recursively Find and Ignore Trivial Differences Using Python | 🔗 | 🔗 |  |
| Checklist — Behavioral Testing of NLP Models | 🔗 | 🔗 |  |
| Detect Defects in a Data Pipeline Early with Validation and Notifications | 🔗 | 🔗 | 🔗 |
| Write Readable Tests for Your Machine Learning Models with Behave | 🔗 | 🔗 | 🔗 |
Productive Tools
| Title | Article | Repository |
| --- | --- | --- |
| 3 Tools to Track and Visualize the Execution of your Python Code | 🔗 | 🔗 |
| 2 Tools to Automatically Reload when Python Files Change | 🔗 | 🔗 |
| 3 Ways to Get Notified with Python | 🔗 | 🔗 |
| How to Create Reusable Command-Line | 🔗 |  |
| How to Strip Outputs and Execute Interactive Code in a Python Script | 🔗 | 🔗 |
| Sending Slack Notifications in Python with Prefect | 🔗 | 🔗 |
Python Helper Tools
| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| Pydash: A Kitchen Sink of Missing Python Utilities | 🔗 | 🔗 |  |
| Write Clean Python Code Using Pipes | 🔗 | 🔗 | 🔗 |
| Introducing FugueSQL — SQL for Pandas, Spark, and Dask DataFrames | 🔗 | 🔗 |  |
| Fugue and DuckDB: Fast SQL Code in Python | 🔗 | 🔗 |  |
| Simplify Data Science Workflows on BigQuery with Fugue and Python | 🔗 | 🔗 |  |
Tools for Deployment
| Title | Article | Repository |
| --- | --- | --- |
| How to Effortlessly Publish your Python Package to PyPI Using Poetry | 🔗 | 🔗 |
| Typer: Build Powerful CLIs in One Line of Code using Python | 🔗 | 🔗 |
Speed-up Tools
| Title | Article | Repository |
| --- | --- | --- |
| Cython-A Speed-Up Tool for your Python Function | 🔗 | 🔗 |
| Train your Machine Learning Model 150x Faster with cuML | 🔗 | 🔗 |
Math Tools
| Title | Article | Repository |
| --- | --- | --- |
| SymPy: Symbolic Computation in Python | 🔗 | 🔗 |
Machine Learning
| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| How to Monitor And Log your Machine Learning Experiment Remotely with HyperDash | 🔗 | 🔗 |  |
| How to Efficiently Fine-Tune your Machine Learning Models | 🔗 | 🔗 |  |
| How to Learn Non-linear Dataset with Support Vector Machines | 🔗 | 🔗 |  |
| Introduction to IBM Federated Learning: A Collaborative Approach to Train ML Models on Private Data | 🔗 | 🔗 |  |
| 3 Steps to Improve your Efficiency when Hypertuning ML Models | 🔗 |  |  |
| human-learn: Create a Human Learning Model by Drawing | 🔗 | 🔗 |  |
| Patsy: Build Powerful Features with Arbitrary Python Code | 🔗 | 🔗 |  |
| SHAP: Explain Any Machine Learning Model in Python | 🔗 | 🔗 |  |
| Predict Movie Ratings with User-Based Collaborative Filtering | 🔗 | 🔗 |  |
| River: Online Machine Learning in Python | 🔗 | 🔗 | 🔗 |
| Human-Learn: Rule-Based Learning as an Alternative to Machine Learning | 🔗 | 🔗 | 🔗 |
Natural Language Processing
| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| Sentiment Analysis of LinkedIn Messages | 🔗 | 🔗 |  |
| Find Common Words in Article with Python Module Newspaper and NLTK | 🔗 | 🔗 |  |
| How to Tokenize Tweets with Python | 🔗 | 🔗 |  |
| How to Solve Analogies with Word2Vec | 🔗 | 🔗 |  |
| What is PyTorch | 🔗 | 🔗 |  |
| Convolutional Neural Network in Natural Language Processing | 🔗 | 🔗 |  |
| Supercharge your Python String with TextBlob | 🔗 | 🔗 | 🔗 |
| pyLDAvis: Topic Modelling Exploration Tool That Every NLP Data Scientist Should Know | 🔗 | 🔗 |  |
| Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge | 🔗 | 🔗 |  |
| Build a Robust Conversational Assistant with Rasa | 🔗 | 🔗 |  |
| I Analyzed 2k Data Scientist and Data Engineer Jobs and This is What I Found | 🔗 | 🔗 |  |
| Checklist — Behavioral Testing of NLP Models | 🔗 | 🔗 |  |
| PRegEx: Write Human-Readable Regular Expressions in Python | 🔗 | 🔗 |  |
| Texthero: Text Preprocessing, Representation, and Visualization for a pandas DataFrame | 🔗 | 🔗 |  |
Computer Vision
| Title | Article | Repository |
| --- | --- | --- |
| How to Create an App to Classify Dogs Using fastai and Streamlit | 🔗 | 🔗 |
Time Series
| Title | Article | Repository |
| --- | --- | --- |
| Kats: a Generalizable Framework to Analyze Time Series Data in Python | 🔗 | 🔗 |
| How to Detect Seasonality, Outliers, and Changepoints in Your Time Series | 🔗 | 🔗 |
| 4 Tools to Automatically Extract Data from Datetime in Python | 🔗 | 🔗 |
Feature Engineering
| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| 3 Ways to Extract Features from Dates with Python | 🔗 | 🔗 |  |
| Similarity Encoding for Dirty Categories Using dirty_cat | 🔗 | 🔗 |  |
| Snorkel — A Human-In-The-Loop Platform to Build Training Data | 🔗 | 🔗 | 🔗 |
Visualization
| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| How to Embed Interactive Charts on your Articles and Personal Website | 🔗 | 🔗 |  |
| What I Learned from Scraping 15k Data Science Articles on Medium | 🔗 | 🔗 |  |
| How to Create Interactive Plots with Altair | 🔗 | 🔗 |  |
| How to Create a Drop-Down Menu and a Slide Bar for your Favorite Visualization Tool | 🔗 | 🔗 |  |
| I Scraped more than 1k Top Machine Learning Github Profiles and this is what I Found | 🔗 | 🔗 |  |
| Top 6 Python Libraries for Visualization: Which one to Use? | 🔗 | 🔗 |  |
| Introduction to Yellowbrick: A Python Library to Visualize the Prediction of your Machine Learning Model | 🔗 | 🔗 |  |
| Visualize Gender-Specific Tweets with Scattertext | 🔗 | 🔗 |  |
| Visualize Your Team’s Projects Using Python Gantt Chart | 🔗 | 🔗 |  |
| How to Create Bindings and Conditions Between Multiple Plots Using Altair | 🔗 | 🔗 |  |
| How to Sketch your Data Science Ideas With Excalidraw | 🔗 |  |  |
| Pyvis: Visualize Interactive Network Graphs in Python | 🔗 | 🔗 | 🔗 |
| Build and Analyze Knowledge Graphs with Diffbot | 🔗 |  |  |
| Observe The Friend Paradox in Facebook Data Using Python | 🔗 | 🔗 |  |
| What skills and backgrounds do data scientists have in common? | 🔗 | 🔗 |  |
| Visualize Similarities Between Companies With Graph Database | 🔗 | 🔗 |  |
| Visualize GitHub Social Network with PyGraphistry | 🔗 | 🔗 |  |
| Find the Top Bootcamps for Data Professionals From Over 5k Profiles | 🔗 | 🔗 |  |
| floWeaver — Turn Flow Data Into a Sankey Diagram In Python | 🔗 | 🔗 |  |
| atoti — Build a BI Platform in Python | 🔗 | 🔗 |  |
| Analyze and Visualize URLs with Network Graph | 🔗 | 🔗 |  |
| statannotations: Add Statistical Significance Annotations on Seaborn Plots | 🔗 | 🔗 | 🔗 |
Mathematical Programming
| Title | Article | Repository |
| --- | --- | --- |
| How to choose stocks to invest in with Python | 🔗 | 🔗 |
| Maximize your Productivity with Python | 🔗 | 🔗 |
| How to Find a Good Match with Python | 🔗 | 🔗 |
| How to Solve a Staff Scheduling Problem with Python | 🔗 | 🔗 |
| How to Find Best Locations for your Restaurants with Python | 🔗 | 🔗 |
| How to Schedule Flights in Python | 🔗 | 🔗 |
| How to Solve a Production Planning and Inventory Problem in Python | 🔗 | 🔗 |
Scraping
| Title | Article | Repository |
| --- | --- | --- |
| Web Scrape Movie Database with Beautiful Soup | 🔗 | 🔗 |
| top-github-scraper: Scrape Top Github Users and Repositories Based On a Keyword in One Line of Code | 🔗 | 🔗 |
Python
| Title | Article | Repository | Video |
| --- | --- | --- | --- |
| Numpy Tricks for your Data Science Projects | 🔗 | 🔗 |  |
| Timing for Efficient Python Code | 🔗 | 🔗 |  |
| How to Use Lambda for Efficient Python Code | 🔗 | 🔗 |  |
| Python Tricks for Keeping Track of Your Data | 🔗 | 🔗 |  |
| Boost Your Efficiency With Specialized Dictionary Implementations in Python | 🔗 | 🔗 |  |
| Dictionary as an Alternative to If-Else | 🔗 | 🔗 |  |
| How to Use Zip to Manipulate a List of Tuples | 🔗 | 🔗 |  |
| Get the Most out of Your Array With These Four Numpy Methods | 🔗 | 🔗 |  |
| 3 Python Tricks to Read, Create, and Run Multiple Files Automatically | 🔗 | 🔗 |  |
| How to Exclude the Outliers in Pandas DataFrame | 🔗 | 🔗 |  |
| Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable | 🔗 | 🔗 | 🔗 |
| 3 Techniques to Effortlessly Import and Execute Python Modules | 🔗 | 🔗 |  |
| Simplify Your Functions with Functools’ Partial and Singledispatch | 🔗 | 🔗 |  |
Terminal
| Title | Article | Repository |
| --- | --- | --- |
| How to Create and View Interactive Cheatsheets on the Command-line | 🔗 |  |
| Understand CSV Files from your Terminal with XSV | 🔗 |  |
| Prettify your Terminal Text With Termcolor and Pyfiglet | 🔗 | 🔗 |
| Stop Using Print to Debug in Python. Use Icecream Instead | 🔗 |  |
| Rich: Generate Rich and Beautiful Text in the Terminal with Python | 🔗 | 🔗 |
| Create a Beautiful Dashboard in your Terminal with Wtfutil | 🔗 | 🔗 |
| 3 Tools to Monitor and Optimize your Linux System | 🔗 |  |
| Ptpython: A Better Python REPL | 🔗 | 🔗 |
| fd: a Simple but Powerful Tool to Find and Execute Files on the Command Line | 🔗 |  |
| Speed Up your Command-Line Navigation with These 3 Tools | 🔗 |  |
| Python and Data Science Snippets on the Command Line | 🔗 | 🔗 |
Statistics
| Title | Article | Repository |
| --- | --- | --- |
| Can Datasets of a Dinosaur and a Circle have Identical Statistics? | 🔗 | 🔗 |
| Introduction to One-Way ANOVA: A Test to Compare the Means between More than Two Groups | 🔗 | 🔗 |
| Bayes’ Theorem, Clearly Explained with Visualization | 🔗 | 🔗 |
| Detect Change Points with Bayesian Inference and PyMC3 | 🔗 | 🔗 |
| Bayesian Linear Regression with Bambi | 🔗 | 🔗 |
| Earn More Salary as a Coder — Higher Degree or More Years of Experience? | 🔗 | 🔗 |
Linear Algebra
| Title | Article | Repository |
| --- | --- | --- |
| How to Build a Matrix Module from Scratch | 🔗 | 🔗 |
| Linear Algebra for Machine Learning: Solve a System of Linear Equations | 🔗 | 🔗 |
Data Structure
| Title | Article | Repository |
| --- | --- | --- |
| Convex Hull: An Innovative Approach to Gift-Wrap your Data | 🔗 | 🔗 |
| How to Visualize Social Network With Graph Theory | 🔗 | 🔗 |
| How to Search Data with KDTree | 🔗 | 🔗 |
| How to Find the Nearest Hospital with a Voronoi Diagram | 🔗 | 🔗 |
Web Applications
| Title | Article | Repository |
| --- | --- | --- |
| How to Create an Interactive Startup Growth Calculator with Python | 🔗 | 🔗 |
| Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge | 🔗 | 🔗 |
| PyWebIO: Write Interactive Web App in Script Way Using Python | 🔗 | 🔗 |
| PyWebIO 1.3.0: Add Tabs, Pin Input, and Update an Input Based on Another Input | 🔗 | 🔗 |
| Create an App to Deal with Boredom Using PyWebIO | 🔗 | 🔗 |
| Build a Robust Workflow to Visualize Trending GitHub Repositories in Python | 🔗 | 🔗 |
Share Insights
| Title | Article | Repository |
| --- | --- | --- |
| Introduction to Datapane: A Python Library to Build Interactive Reports | 🔗 |  |
| Datapane’s New Features: Create a Beautiful Dashboard in Python in a Few Lines of Code | 🔗 | 🔗 |
| Introduction to Datasette: Explore and Publish Your Data in One Line of Code | 🔗 |  |
| How to Share your Python Objects Across Different Environments in One Line of Code | 🔗 | 🔗 |
| How to Share your Jupyter Notebook in 3 Lines of Code with Ngrok | 🔗 |  |
| Introduction to Deepnote: Real-time Collaboration on Jupyter Notebook | 🔗 |  |
Cool Tools
| Title | Article | Repository |
| --- | --- | --- |
| Simulate Real-life Events in Python Using SimPy | 🔗 | 🔗 |
| How to Create Mathematical Animations like 3Blue1Brown Using Python | 🔗 | 🔗 |
Learning Tips
| Title | Article | Repository |
| --- | --- | --- |
| How to Learn Data Science when Life does not Give You a Break | 🔗 |  |
| How to Accelerate your Data Science Career by Putting yourself in the Right Environment | 🔗 |  |
| To become a Better Data Scientist, you need to Think like a Programmer | 🔗 |  |
| How not to be Overwhelmed with Data Science | 🔗 |  |
Productive Tips
| Title | Article | Repository |
| --- | --- | --- |
| How to Organize your Data Science Articles with Github | 🔗 | 🔗 |
| 5 Reasons why you should Switch from Jupyter Notebook to Scripts | 🔗 |  |
| 7 Reasons Why you Should Start Documenting your Code | 🔗 |  |
VSCode
| Title | Article | Repository |
| --- | --- | --- |
| How to Leverage Visual Studio Code for your Data Science Projects | 🔗 |  |
| Top 4 Code Viewers for Data Scientist in VSCode | 🔗 |  |
| Incorporate the Best Practices for Python with These Top 4 VSCode Extensions | 🔗 |  |
| Boost Your Efficiency with Customized Code Snippets on VSCode | 🔗 |  |
| Top 9 Keyboard Shortcuts in VSCode for Data Scientists | 🔗 |  |
Book Review
| Title | Article | Repository |
| --- | --- | --- |
| Python Machine Learning: A Comprehensive Handbook for Machine Learning | 🔗 |  |
Data Science Portfolio
| Title | Article | Repository |
| --- | --- | --- |
| How to Create an Elegant Website for your Data Science Portfolio in 10 minutes | 🔗 |  |
| Build an Impressive Github Profile in 3 Steps | 🔗 |  |
Supporters
Special thanks to these supporters for supporting this project!