  • Stars
    3,988
  • Rank 10,631 (Top 0.3 %)
  • Language
    Jupyter Notebook
  • Created almost 4 years ago
  • Updated about 1 month ago

Repository Details

Collection of useful data science topics along with articles, videos, and code

Data Science

Collection of useful data science topics along with articles and videos.

How to Download the Code in This Repository to Your Local Machine

To download the code in this repo, clone it with git:

git clone https://github.com/khuyentran1401/Data-science

Contents

  1. MLOps
  2. Data Management Tools
  3. Testing
  4. Productive Tools
  5. Python Helper Tools
  6. Tools for Deployment
  7. Speed-up Tools
  8. Math Tools
  9. Machine Learning
  10. Natural Language Processing
  11. Computer Vision
  12. Time Series
  13. Feature Engineering
  14. Visualization
  15. Mathematical Programming
  16. Scraping
  17. Python
  18. Terminal
  19. Linear Algebra
  20. Data Structure
  21. Statistics
  22. Web Applications
  23. Share Insights
  24. Cool Tools
  25. Learning Tips
  26. Productive Tips
  27. VSCode
  28. Book Review
  29. Data Science Portfolio

MLOps

Title Article Repository Video
Stop Hard Coding in a Data Science Project – Use Configuration Files Instead 🔗 🔗 🔗
Poetry: A Better Way to Manage Python Dependencies 🔗 🔗
Git for Data Scientists: Learn Git through Practical Examples 🔗 🔗
Introduction to Weight & Biases: Track and Visualize your Machine Learning Experiments in 3 Lines of Code 🔗 🔗
Kedro – A Python Framework for Reproducible Data Science Project 🔗 🔗
Orchestrate a Data Science Project in Python With Prefect 🔗 🔗
Orchestrate Your Data Science Project with Prefect 2.0 🔗 🔗 🔗
DagsHub: a GitHub Supplement for Data Scientists and ML Engineers 🔗 🔗
4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python 🔗 🔗 🔗
BentoML: Create an ML Powered Prediction Service in Minutes 🔗 🔗 🔗
How to Structure a Data Science Project for Maintainability (with DVC) 🔗 🔗 🔗
How to Structure an ML Project for Reproducibility and Maintainability (with Prefect) 🔗 🔗
GitHub Actions in MLOps: Automatically Check and Deploy Your ML Model 🔗 🔗
Create Robust Data Pipelines with Prefect, Docker, and GitHub 🔗 🔗
Create a Maintainable Data Pipeline with Prefect and DVC 🔗 🔗
Build a Full-Stack ML Application With Pydantic And Prefect 🔗 🔗 🔗
Streamline Code Updates with DVC and GitHub Actions 🔗 🔗 🔗
Create Observable and Reproducible Notebooks with Hex 🔗 🔗 🔗
Build Reliable Machine Learning Pipelines with Continuous Integration 🔗 🔗 🔗
Automate Machine Learning Deployment with GitHub Actions 🔗 🔗 🔗

Data Management Tools

Title Article Repository Video
Introduction to DVC: Data Version Control Tool for Machine Learning Projects 🔗 🔗 🔗
Great Expectations: Always Know What to Expect From Your Data 🔗 🔗
Validate Your pandas DataFrame with Pandera 🔗 🔗 🔗
Introduction to Schema: A Python Library to Validate your Data 🔗 🔗
How to Create Fake Data with Faker 🔗 🔗
Hypothesis and Pandera: Generate Synthesis Pandas DataFrame for Testing 🔗 🔗 🔗
What is dbt (data build tool) and When should you use it? 🔗 🔗 🔗
Streamline dbt Model Development with Notebook-Style Workspace 🔗 🔗 🔗

Testing

Title Article Repository Video
Pytest for Data Scientists 🔗 🔗 🔗
4 Lesser-Known Yet Awesome Tips for Pytest 🔗 🔗
DeepDiff – Recursively Find and Ignore Trivial Differences Using Python 🔗 🔗
Checklist – Behavioral Testing of NLP Models 🔗 🔗
Detect Defects in a Data Pipeline Early with Validation and Notifications 🔗 🔗 🔗
Write Readable Tests for Your Machine Learning Models with Behave 🔗 🔗 🔗

Productive Tools

Title Article Repository
3 Tools to Track and Visualize the Execution of your Python Code 🔗 🔗
2 Tools to Automatically Reload when Python Files Change 🔗 🔗
3 Ways to Get Notified with Python 🔗 🔗
How to Create Reusable Command-Line 🔗
How to Strip Outputs and Execute Interactive Code in a Python Script 🔗 🔗
Sending Slack Notifications in Python with Prefect 🔗 🔗

Python Helper Tools

Title Article Repository Video
Pydash: A Kitchen Sink of Missing Python Utilities 🔗 🔗
Write Clean Python Code Using Pipes 🔗 🔗 🔗
Introducing FugueSQL – SQL for Pandas, Spark, and Dask DataFrames 🔗 🔗
Fugue and DuckDB: Fast SQL Code in Python 🔗 🔗
Simplify Data Science Workflows on BigQuery with Fugue and Python 🔗 🔗

Tools for Deployment

Title Article Repository
How to Effortlessly Publish your Python Package to PyPI Using Poetry 🔗 🔗
Typer: Build Powerful CLIs in One Line of Code using Python 🔗 🔗

Speed-up Tools

Title Article Repository
Cython-A Speed-Up Tool for your Python Function 🔗 🔗
Train your Machine Learning Model 150x Faster with cuML 🔗 🔗

Math Tools

Title Article Repository
SymPy: Symbolic Computation in Python 🔗 🔗

Machine Learning

Title Article Repository Video
How to Monitor And Log your Machine Learning Experiment Remotely with HyperDash 🔗 🔗
How to Efficiently Fine-Tune your Machine Learning Models 🔗 🔗
How to Learn Non-linear Dataset with Support Vector Machines 🔗 🔗
Introduction to IBM Federated Learning: A Collaborative Approach to Train ML Models on Private Data 🔗 🔗
3 Steps to Improve your Efficiency when Hypertuning ML Models 🔗
human-learn: Create a Human Learning Model by Drawing 🔗 🔗
Patsy: Build Powerful Features with Arbitrary Python Code 🔗 🔗
SHAP: Explain Any Machine Learning Model in Python 🔗 🔗
Predict Movie Ratings with User-Based Collaborative Filtering 🔗 🔗
River: Online Machine Learning in Python 🔗 🔗 🔗
Human-Learn: Rule-Based Learning as an Alternative to Machine Learning 🔗 🔗 🔗

Natural Language Processing

Title Article Repository Video
Sentiment Analysis of LinkedIn Messages 🔗 🔗
Find Common Words in Article with Python Module Newspaper and NLTK 🔗 🔗
How to Tokenize Tweets with Python 🔗 🔗
How to Solve Analogies with Word2Vec 🔗 🔗
What is PyTorch 🔗 🔗
Convolutional Neural Network in Natural Language Processing 🔗 🔗
Supercharge your Python String with TextBlob 🔗 🔗 🔗
pyLDAvis: Topic Modelling Exploration Tool That Every NLP Data Scientist Should Know 🔗 🔗
Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge 🔗 🔗
Build a Robust Conversational Assistant with Rasa 🔗 🔗
I Analyzed 2k Data Scientist and Data Engineer Jobs and This is What I Found 🔗 🔗
Checklist – Behavioral Testing of NLP Models 🔗 🔗
PRegEx: Write Human-Readable Regular Expressions in Python 🔗 🔗
Texthero: Text Preprocessing, Representation, and Visualization for a pandas DataFrame 🔗 🔗

Computer Vision

Title Article Repository
How to Create an App to Classify Dogs Using fastai and Streamlit 🔗 🔗

Time Series

Title Article Repository
Kats: a Generalizable Framework to Analyze Time Series Data in Python 🔗 🔗
How to Detect Seasonality, Outliers, and Changepoints in Your Time Series 🔗 🔗
4 Tools to Automatically Extract Data from Datetime in Python 🔗 🔗

Feature Engineering

Title Article Repository Video
3 Ways to Extract Features from Dates with Python 🔗 🔗
Similarity Encoding for Dirty Categories Using dirty_cat 🔗 🔗
Snorkel – A Human-In-The-Loop Platform to Build Training Data 🔗 🔗 🔗

Visualization

Title Article Repository Video
How to Embed Interactive Charts on your Articles and Personal Website 🔗 🔗
What I Learned from Scraping 15k Data Science Articles on Medium 🔗 🔗
How to Create Interactive Plots with Altair 🔗 🔗
How to Create a Drop-Down Menu and a Slide Bar for your Favorite Visualization Tool 🔗 🔗
I Scraped more than 1k Top Machine Learning Github Profiles and this is what I Found 🔗 🔗
Top 6 Python Libraries for Visualization: Which one to Use? 🔗 🔗
Introduction to Yellowbrick: A Python Library to Visualize the Prediction of your Machine Learning Model 🔗 🔗
Visualize Gender-Specific Tweets with Scattertext 🔗 🔗
Visualize Your Team’s Projects Using Python Gantt Chart 🔗 🔗
How to Create Bindings and Conditions Between Multiple Plots Using Altair 🔗 🔗
How to Sketch your Data Science Ideas With Excalidraw 🔗
Pyvis: Visualize Interactive Network Graphs in Python 🔗 🔗 🔗
Build and Analyze Knowledge Graphs with Diffbot 🔗
Observe The Friend Paradox in Facebook Data Using Python 🔗 🔗
What skills and backgrounds do data scientists have in common? 🔗 🔗
Visualize Similarities Between Companies With Graph Database 🔗 🔗
Visualize GitHub Social Network with PyGraphistry 🔗 🔗
Find the Top Bootcamps for Data Professionals From Over 5k Profiles 🔗 🔗
floWeaver – Turn Flow Data Into a Sankey Diagram In Python 🔗 🔗
atoti – Build a BI Platform in Python 🔗 🔗
Analyze and Visualize URLs with Network Graph 🔗 🔗
statsannotations: Add Statistical Significance Annotations on Seaborn Plots 🔗 🔗 🔗

Mathematical Programming

Title Article Repository
How to choose stocks to invest in with Python 🔗 🔗
Maximize your Productivity with Python 🔗 🔗
How to Find a Good Match with Python 🔗 🔗
How to Solve a Staff Scheduling Problem with Python 🔗 🔗
How to Find Best Locations for your Restaurants with Python 🔗 🔗
How to Schedule Flights in Python 🔗 🔗
How to Solve a Production Planning and Inventory Problem in Python 🔗 🔗

Scraping

Title Article Repository
Web Scrape Movie Database with Beautiful Soup 🔗 🔗
top-github-scraper: Scrape Top Github Users and Repositories Based On a Keyword in One Line of Code 🔗 🔗

Python

Title Article Repository Video
Numpy Tricks for your Data Science Projects 🔗 🔗
Timing for Efficient Python Code 🔗 🔗
How to Use Lambda for Efficient Python Code 🔗 🔗
Python Tricks for Keeping Track of Your Data 🔗 🔗
Boost Your Efficiency With Specialized Dictionary Implementations in Python 🔗 🔗
Dictionary as an Alternative to If-Else 🔗 🔗
How to Use Zip to Manipulate a List of Tuples 🔗 🔗
Get the Most out of Your Array With These Four Numpy Methods 🔗 🔗
3 Python Tricks to Read, Create, and Run Multiple Files Automatically 🔗 🔗
How to Exclude the Outliers in Pandas DataFrame 🔗 🔗
Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable 🔗 🔗 🔗
3 Techniques to Effortlessly Import and Execute Python Modules 🔗 🔗
Simplify Your Functions with Functools’ Partial and Singledispatch 🔗 🔗

Terminal

Title Article Repository
How to Create and View Interactive Cheatsheets on the Command-line 🔗
Understand CSV Files from your Terminal with XSV 🔗
Prettify your Terminal Text With Termcolor and Pyfiglet 🔗 🔗
Stop Using Print to Debug in Python. Use Icecream Instead 🔗
Rich: Generate Rich and Beautiful Text in the Terminal with Python 🔗 🔗
Create a Beautiful Dashboard in your Terminal with Wtfutil 🔗 🔗
3 Tools to Monitor and Optimize your Linux System 🔗
Ptpython: A Better Python REPL 🔗 🔗
fd: a Simple but Powerful Tool to Find and Execute Files on the Command Line 🔗
Speed Up your Command-Line Navigation with These 3 Tools 🔗
Python and Data Science Snippets on the Command Line 🔗 🔗

Statistics

Title Article Repository
Can Datasets of a Dinosaur and a Circle have Identical Statistics? 🔗 🔗
Introduction to One-Way ANOVA: A Test to Compare the Means between More than Two Groups 🔗 🔗
Bayes’ Theorem, Clearly Explained with Visualization 🔗 🔗
Detect Change Points with Bayesian Inference and PyMC3 🔗 🔗
Bayesian Linear Regression with Bambi 🔗 🔗
Earn More Salary as a Coder – Higher Degree or More Years of Experience? 🔗 🔗

Linear Algebra

Title Article Repository
How to Build a Matrix Module from Scratch 🔗 🔗
Linear Algebra for Machine Learning: Solve a System of Linear Equations 🔗 🔗

Data Structure

Title Article Repository
Convex Hull: An Innovative Approach to Gift-Wrap your Data 🔗 🔗
How to Visualize Social Network With Graph Theory 🔗 🔗
How to Search Data with KDTree 🔗 🔗
How to Find the Nearest Hospital with a Voronoi Diagram 🔗 🔗

Web Applications

Title Article Repository
How to Create an Interactive Startup Growth Calculator with Python 🔗 🔗
Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge 🔗 🔗
PyWebIO: Write Interactive Web App in Script Way Using Python 🔗 🔗
PyWebIO 1.3.0: Add Tabs, Pin Input, and Update an Input Based on Another Input 🔗 🔗
Create an App to Deal with Boredom Using PyWebIO 🔗 🔗
Build a Robust Workflow to Visualize Trending GitHub Repositories in Python 🔗 🔗

Share Insights

Title Article Repository
Introduction to Datapane: A Python Library to Build Interactive Reports 🔗
Datapane’s New Features: Create a Beautiful Dashboard in Python in a Few Lines of Code 🔗 🔗
Introduction to Datasette: Explore and Publish Your Data in One Line of Code 🔗
How to Share your Python Objects Across Different Environments in One Line of Code 🔗 🔗
How to Share your Jupyter Notebook in 3 Lines of Code with Ngrok 🔗
Introduction to Deepnote: Real-time Collaboration on Jupyter Notebook 🔗

Cool Tools

Title Article Repository
Simulate Real-life Events in Python Using SimPy 🔗 🔗
How to Create Mathematical Animations like 3Blue1Brown Using Python 🔗 🔗

Learning Tips

Title Article Repository
How to Learn Data Science when Life does not Give You a Break 🔗
How to Accelerate your Data Science Career by Putting yourself in the Right Environment 🔗
To become a Better Data Scientist, you need to Think like a Programmer 🔗
How not to be Overwhelmed with Data Science 🔗

Productive Tips

Title Article Repository
How to Organize your Data Science Articles with Github 🔗 🔗
5 Reasons why you should Switch from Jupyter Notebook to Scripts 🔗
7 Reasons Why you Should Start Documenting your Code 🔗

VSCode

Title Article Repository
How to Leverage Visual Studio Code for your Data Science Projects 🔗
Top 4 Code Viewers for Data Scientist in VSCode 🔗
Incorporate the Best Practices for Python with These Top 4 VSCode Extensions 🔗
Boost Your Efficiency with Customized Code Snippets on VSCode 🔗
Top 9 Keyboard Shortcuts in VSCode for Data Scientists 🔗

Book Review

Title Article Repository
Python Machine Learning: A Comprehensive Handbook for Machine Learning 🔗

Data Science Portfolio

Title Article Repository
How to Create an Elegant Website for your Data Science Portfolio in 10 minutes 🔗
Build an Impressive Github Profile in 3 Steps 🔗

Supporters

Special thanks to these supporters for supporting this project!

More Repositories

1

Efficient_Python_tricks_and_tools_for_data_scientists

Efficient Python Tricks and Tools for Data Scientists
Jupyter Notebook
1,391
star
2

data-science-template

Template for a data science project
Python
668
star
3

awesome-Python-data-science-books

Probably the best curated list of data science books in Python
386
star
4

machine-learning-articles

List of interesting articles on different topics of machine learning and deep learning
HTML
163
star
5

reproducible-data-science

Tutorials on creating a reproducible and maintainable data science project
Jupyter Notebook
130
star
6

rich-dataframe

Create animated and pretty Pandas Dataframe
Python
117
star
7

cicd-mlops-demo

Demo for CI/CD in a machine learning project
Python
90
star
8

Machine-learning-pipeline

Example machine learning pipeline with MLflow and Hydra
Python
85
star
9

top-github-scraper

Scrape top GitHub repositories and users based on keywords
HTML
77
star
10

Python-data-science-code-snippet

Useful data science and Python code snippets at Data Science Simplified
Jupyter Notebook
66
star
11

khuyentran1401

49
star
12

prefect2-mlops-demo

Demo on how to use Prefect 2 in an ML project
Python
40
star
13

employee-future-prediction

Demo for Using GitHub Actions in MLOps
Jupyter Notebook
39
star
14

prefect-mlops-recipes

Tutorials/use cases of using Prefect in an ML project.
39
star
15

hydra-demo

Python
31
star
16

analyze_github_feed

Create a local dashboard to visualize and filter your GitHub feed
Python
29
star
17

prefect-docker

Demo on how to use Prefect with Docker
Python
26
star
18

prefect-dvc

Python
24
star
19

python_snippet

Python and data science snippets on the command line
Python
21
star
20

Task-scheduler-problem

Jupyter Notebook
21
star
21

hydra_demo

Demo of Hydra
Python
18
star
22

detect-data-drift-pipeline

A pipeline to detect data drift and retrain the model when there is drift
Python
18
star
23

dog_classifier

A simple app to classify dogs using fastai and streamlit.
Jupyter Notebook
17
star
24

same-stats-different-graphs

Create datasets with different graphs but the same statistics
Python
17
star
25

kedro_demo

A demo of a data science project using Kedro
Python
16
star
26

Numerical-Optimization-Machine-learning

Codes for popular numerical optimization methods and machine learning algorithms
Jupyter Notebook
13
star
27

Applied-Integer-Programming-with-Python

Jupyter Notebook
13
star
28

prefect-alert

A decorator that sends alert when a Prefect flow fails
Python
12
star
29

linear-programming-with-PuLP

Jupyter Notebook
12
star
30

Voronoi-diagram

Implementation of voronoi diagram with incremental algorithm
Jupyter Notebook
12
star
31

iris-prefect

Python
11
star
32

aboutKhuyen

Website showing some of my accomplishments
9
star
33

kdtree-implementation

Python
8
star
34

prefect-course

Python
6
star
35

Data-science-videos

Videos for Data Science Simplified YouTube Channel
Jupyter Notebook
6
star
36

Extract-text-from-article

Jupyter Notebook
6
star
37

dagshub-demo

Demo of DagsHub
Jupyter Notebook
6
star
38

strip_interactive

Strip and execute interactive Python string in a Python script
Python
6
star
39

Web-Scrapping-Wikipedia

Jupyter Notebook
5
star
40

Speed-Dating

Explore the factors that make a yes/no in a fast dating setup
Python
5
star
41

dbt-demo

Demo for dbt
5
star
42

dataset

4
star
43

Cython

HTML
4
star
44

atoti_project

An example data science project using atoti
Python
4
star
45

Author-Profiling

Jupyter Notebook
3
star
46

animated_bar_chart

Jupyter Notebook
3
star
47

google_trend

Jupyter Notebook
3
star
48

deploy_atoti

Python
3
star
49

test-gpt-commit

Python
3
star
50

refactor_function

5 Steps to Transform Messy Functions into Production-Ready Code
Python
3
star
51

Suicide-rates

Jupyter Notebook
2
star
52

MNIST-gradient-descent

Implementation of gradient descent from scratch with binary cross-entropy loss
Jupyter Notebook
2
star
53

nyc_property_sales

Jupyter Notebook
2
star
54

creative-developer-blog

SCSS
2
star
55

Web-scrape-Ghibli-Movie-Database

Jupyter Notebook
2
star
56

github_analysis

Analyze top users on Github
2
star
57

world-population-prediction

Simple prediction of world population using Linear Regression
Jupyter Notebook
2
star
58

code_image_to_text

Python
2
star
59

my-ds

Python
2
star
60

talk_demos

Collections of code for meetups and conferences
Python
2
star
61

dbt-mage

Python
2
star
62

EPS-Y

Jupyter Notebook
2
star
63

Game-of-Thrones-And-Graph

https://towardsdatascience.com/how-to-visualize-social-network-with-graph-theory-4b2dc0c8a99f
Jupyter Notebook
2
star
64

visualize_github

Jupyter Notebook
1
star
65

Flask

Python
1
star
66

mlops-kestra-workflow

Demo of an automated model training workflow triggered by S3
Python
1
star
67

Computational-Geometry

Jupyter Notebook
1
star
68

Web-scrapping

In this repository, I use Beautiful Soup to extract data from websites to gain insights for further analysis
Jupyter Notebook
1
star
69

KNN-and-Bayes-Classifier

The implementation and comparison of Optimal Bayes with symmetric loss and KNN Classifier
Python
1
star
70

Time-Python-Objects

Jupyter Notebook
1
star
71

analyze-happiness-report

Jupyter Notebook
1
star
72

predict-heart-disease

Jupyter Notebook
1
star
73

Non-negative-least-squares

Jupyter Notebook
1
star
74

Recursion-examples

Jupyter Notebook
1
star
75

voyce

Python
1
star
76

pretty-text

A package to create pretty text in 1 command line
Python
1
star
77

Union-algorithms

Jupyter Notebook
1
star
78

Ghibli-scrape-analysis

Jupyter Notebook
1
star
79

python-project

Makefile
1
star
80

article-analysis

Jupyter Notebook
1
star
81

zenml_example

An end-to-end project using ZenML
1
star
82

customed-nitpick

Customized configurations for nitpick
1
star
83

pandas-processors

Python
1
star
84

Sample_datapane_script

This repo shows how to use Datapane to create a simple script that ranks authors or publications by publishing frequency
Jupyter Notebook
1
star
85

gapminder

Jupyter Notebook
1
star