Data Science Cookie Cutter for Prefect

Why Should You Use This Template?

This template is the result of my years refining the best way to structure a data science project so that it is reproducible and maintainable.

This template allows you to:

✅ Create a readable structure for your project

✅ Automatically run tests when committing your code

✅ Enforce type hints at runtime

✅ Check issues in your code before committing

✅ Efficiently manage the dependencies in your project

✅ Create short and readable commands for repeatable tasks

✅ Rerun only modified components of a pipeline

✅ Automatically document your code

✅ Observe and automate your code

Tools used in this project

Poetry: Dependency management - article
Prefect: Orchestrate and observe your data pipeline - article
Pydantic: Data validation using Python type annotations - article
pre-commit plugins: Automate code reviewing formatting - article
Makefile: Create short and readable commands for repeatable tasks - article
GitHub Actions: Automate your workflows, making it faster to build, test, and deploy your code - article
pdoc: Automatically create an API documentation for your project

Project structure

.
├── data            
│   ├── final                       # data after training the model
│   ├── processed                   # data after processing
│   ├── raw                         # raw data
├── docs                            # documentation for your project
├── .flake8                         # configuration for flake8 - a Python formatter tool
├── .gitignore                      # ignore files that cannot commit to Git
├── Makefile                        # store useful commands to set up the environment
├── models                          # store models
├── notebooks                       # store notebooks
├── .pre-commit-config.yaml         # configurations for pre-commit
├── pyproject.toml                  # dependencies for poetry
├── README.md                       # describe your project
├── src                             # store source code
│   ├── __init__.py                 # make src a Python module
│   ├── config.py                   # store configs 
│   ├── process.py                  # process data before training model
│   ├── run_notebook.py             # run notebook
│   └── train_model.py              # train model
└── tests                           # store tests
    ├── __init__.py                 # make tests a Python module 
    ├── test_process.py             # test functions for process.py
    └── test_train_model.py         # test functions for train_model.py

How to use this project

Install Cookiecutter:

pip install cookiecutter

Create a project based on the template:

cookiecutter https://github.com/khuyentran1401/data-science-template

khuyentran1401/data-science-template

khuyentran1401

Reviews

Repository Details

Data Science Cookie Cutter for Prefect

Why Should You Use This Template?

Tools used in this project

Project structure

How to use this project

Resources

More Repositories