Data Science Cookie Cutter for Prefect
Why Should You Use This Template?
This template is the result of years spent refining how to structure a data science project so that it is reproducible and maintainable.
This template allows you to:
Tools used in this project
- Poetry: Dependency management
- Prefect: Orchestrate and observe your data pipeline
- Pydantic: Data validation using Python type annotations
- pre-commit plugins: Automate code review and formatting
- Makefile: Create short and readable commands for repeatable tasks
- GitHub Actions: Automate your workflows, making it faster to build, test, and deploy your code
- pdoc: Automatically create API documentation for your project
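To make the Pydantic entry concrete, here is a minimal sketch of validating pipeline settings through type annotations. The ProcessConfig class, its fields, and the two helper functions are hypothetical and are not taken from the template's config.py. The sketch also assumes Pydantic is installed in your environment.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class ProcessConfig(BaseModel):
    """Hypothetical settings for a processing step (not the template's real config)."""

    raw_path: str = "data/raw/data.csv"
    test_size: float = 0.2
    drop_columns: List[str] = []


def load_config(**overrides) -> ProcessConfig:
    """Build a config, letting Pydantic check each value against its annotation."""
    return ProcessConfig(**overrides)


def is_valid(**overrides) -> bool:
    """Return False instead of raising when a value does not match its annotation."""
    try:
        load_config(**overrides)
    except ValidationError:
        return False
    return True
```

Calling is_valid(test_size="not a number") returns False because the value cannot be read as a float, so a bad setting is caught before any pipeline step runs.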
Project structure
.
├── data
│   ├── final                       # data after training the model
│   ├── processed                   # data after processing
│   └── raw                         # raw data
├── docs                            # documentation for your project
├── .flake8                         # configuration for flake8, a Python linter
├── .gitignore                      # files that should not be committed to Git
├── Makefile                        # store useful commands to set up the environment
├── models                          # store models
├── notebooks                       # store notebooks
├── .pre-commit-config.yaml         # configurations for pre-commit
├── pyproject.toml                  # dependencies for Poetry
├── README.md                       # describe your project
├── src                             # store source code
│   ├── __init__.py                 # make src a Python package
│   ├── config.py                   # store configs
│   ├── process.py                  # process data before training the model
│   ├── run_notebook.py             # run notebook
│   └── train_model.py              # train model
└── tests                           # store tests
    ├── __init__.py                 # make tests a Python package
    ├── test_process.py             # test functions for process.py
    └── test_train_model.py         # test functions for train_model.py
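The pairing of src/process.py with tests/test_process.py can be sketched as follows. The function drop_missing_rows and its test are hypothetical illustrations, not code shipped with the template. Both are shown in one block for brevity, and the test is written so that a runner such as pytest could collect it (the choice of runner is an assumption).

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


# Would live in src/process.py (hypothetical function).
def drop_missing_rows(rows: List[Row]) -> List[Row]:
    """Keep only the rows in which every value is present."""
    return [row for row in rows if all(value is not None for value in row.values())]


# Would live in tests/test_process.py (hypothetical test).
def test_drop_missing_rows() -> None:
    rows = [{"age": 31.0, "income": 52000.0}, {"age": None, "income": 48000.0}]
    assert drop_missing_rows(rows) == [{"age": 31.0, "income": 52000.0}]
```

Keeping processing steps as small functions with plain inputs and outputs is what makes a one-to-one test file like this practical.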
How to use this project
Install Cookiecutter:
pip install cookiecutter
Create a project based on the template:
cookiecutter https://github.com/khuyentran1401/data-science-template
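If you prefer to script this step, Cookiecutter also exposes a Python function, cookiecutter.main.cookiecutter. The sketch below wraps it in a hypothetical helper named create_project; the helper name and its default arguments are illustrative only. Running it requires Cookiecutter to be installed and network access to GitHub.

```python
TEMPLATE_URL = "https://github.com/khuyentran1401/data-science-template"


def create_project(output_dir: str = ".", no_input: bool = False) -> str:
    """Generate a project from the template and return the path of the new directory."""
    # Imported lazily so this file can be imported before Cookiecutter is installed.
    from cookiecutter.main import cookiecutter

    # no_input=True skips the interactive prompts and accepts the template defaults.
    return cookiecutter(TEMPLATE_URL, output_dir=output_dir, no_input=no_input)
```

Calling create_project() with the defaults behaves like the command above: it prompts for the template's options and creates the project in the current directory.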