
Template for a data science project


Data Science Cookie Cutter for Prefect

Why Should You Use This Template?

This template is the result of years I spent refining the best way to structure a data science project so that it is reproducible and maintainable.

This template allows you to:

✅ Create a readable structure for your project

✅ Automatically run tests when committing your code

✅ Enforce type hints at runtime

✅ Check issues in your code before committing

✅ Efficiently manage the dependencies in your project

✅ Create short and readable commands for repeatable tasks

✅ Rerun only modified components of a pipeline

✅ Automatically document your code

✅ Observe and automate your code
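The list does not name the tool that enforces type hints at runtime. To illustrate what runtime enforcement means, here is a minimal, standard-library-only sketch. The names enforce_types and split_data are hypothetical, and the check deliberately covers only plain classes such as int, float, or str, skipping generic annotations like list[int].

```python
import functools
import inspect
from typing import get_origin, get_type_hints


def enforce_types(func):
    """Hypothetical decorator: raise TypeError if an argument does not
    match its annotation. Only plain classes (int, float, str, ...) are
    checked; generic annotations such as list[int] are skipped, and
    *args/**kwargs annotations are not handled in this sketch."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments to parameter names, filling defaults
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Skip unannotated parameters and generic annotations
            if expected is None or get_origin(expected) is not None:
                continue
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def split_data(test_size: float, seed: int = 42) -> tuple:
    """Hypothetical pipeline step used only to demonstrate the decorator."""
    return (test_size, seed)
```

Calling split_data(0.2) returns (0.2, 42), while split_data("0.2") raises a TypeError before the function body runs. The check is strict: passing the int 1 where a float is annotated is also rejected.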

Tools used in this project

Project structure

.
├── data
│   ├── final                       # data after training the model
│   ├── processed                   # data after processing
│   └── raw                         # raw data
├── docs                            # documentation for your project
├── .flake8                         # configuration for flake8, a Python linter
├── .gitignore                      # files that Git should not track
├── Makefile                        # useful commands to set up the environment
├── models                          # store models
├── notebooks                       # store notebooks
├── .pre-commit-config.yaml         # configurations for pre-commit
├── pyproject.toml                  # dependencies managed by Poetry
├── README.md                       # describe your project
├── src                             # store source code
│   ├── __init__.py                 # make src a Python package
│   ├── config.py                   # store configs
│   ├── process.py                  # process data before training the model
│   ├── run_notebook.py             # run notebook
│   └── train_model.py              # train model
└── tests                           # store tests
    ├── __init__.py                 # make tests a Python package
    ├── test_process.py             # test functions for process.py
    └── test_train_model.py         # test functions for train_model.py
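To make the layout above concrete, here is a minimal sketch that recreates the same skeleton of empty directories and files with pathlib. This is only an illustration: Cookiecutter builds the real project from the template repository, and the names DIRECTORIES, FILES, and create_skeleton are hypothetical.

```python
from pathlib import Path

# Hypothetical listing of the layout above; the real project is generated
# by Cookiecutter from the template repository, not by this script.
DIRECTORIES = [
    "data/final",
    "data/processed",
    "data/raw",
    "docs",
    "models",
    "notebooks",
    "src",
    "tests",
]
FILES = [
    ".flake8",
    ".gitignore",
    "Makefile",
    ".pre-commit-config.yaml",
    "pyproject.toml",
    "README.md",
    "src/__init__.py",
    "src/config.py",
    "src/process.py",
    "src/run_notebook.py",
    "src/train_model.py",
    "tests/__init__.py",
    "tests/test_process.py",
    "tests/test_train_model.py",
]


def create_skeleton(root: Path) -> None:
    """Create empty directories and files matching the template layout."""
    for directory in DIRECTORIES:
        # parents=True also creates the intermediate "data" directory
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file in FILES:
        # Parent directories already exist because DIRECTORIES runs first
        (root / file).touch(exist_ok=True)
```

Calling create_skeleton(Path("my_project")) produces the six top-level files and six top-level directories shown in the tree. Creating directories before files keeps every touch() call safe without extra checks.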

How to use this project

Install Cookiecutter:

pip install cookiecutter

Create a project based on the template:

cookiecutter https://github.com/khuyentran1401/data-science-template

Resources
