  • Stars: 129
  • Rank: 279,262 (Top 6%)
  • Language: Python
  • Created: over 4 years ago
  • Updated: over 1 year ago

Repository Details

🛠 Python project template with unit tests, code coverage, linting, type checking, Makefile wrapper, and GitHub Actions.

python-collab-template

Companion repository for the article "How to Set Up a Python Repo for Automation and Collaboration".

Quickstart

# Clone this repo and change directory
git clone git@github.com:eugeneyan/python-collab-template.git
cd python-collab-template

# Build the Docker image
make setup

# Alternatively, you can create a python virtualenv on your local machine (-B might be needed to execute)
make setup-venv -B

# Run the suite of tests and checks on docker
make run-checks

# Alternatively, run the suite of tests and checks on your local machine
make checks

Make a pull request to this repo to see the checks in action 😎

Here's a sample pull request which initially failed ❌ the checks, and then passed ✅.

Running our checks

The article covers the following aspects of setting up a Python project:

Unit Tests

import pandas as pd
import pytest

# Module path assumed from the pylint and coverage output shown in this README.
from src.data_prep.categorical import extract_title


@pytest.fixture
def lowercased_df():
    """Two lowercased passenger names whose titles are 'mme' and 'major'."""
    string_col = ['futrelle, mme. jacques heath (lily may peel)',
                  'backstrom, major. karl alfred (maria mathilda gustafsson)']
    df_dict = {'string': string_col}
    df = pd.DataFrame(df_dict)
    return df


def test_extract_title(lowercased_df):
    # One expected title per fixture row.
    result = extract_title(lowercased_df, col='string')
    assert result['title'].tolist() == ['mme', 'major']


def test_extract_title_with_replacement(lowercased_df):
    title_replacement = {'mme': 'mrs', 'ms': 'miss', 'lady': 'rare', 'major': 'rare'}
    result = extract_title(lowercased_df, col='string', replace_dict=title_replacement)
    assert result['title'].tolist() == ['mrs', 'rare']

$ pytest
============================= test session starts ==============================
platform darwin -- Python 3.8.2, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
rootdir: /Users/eugene/projects/python-collaboration-template/tests/data_prep
collected 2 items

test_categorical.py::test_extract_title PASSED                           [ 50%]
test_categorical.py::test_extract_title_with_replacement PASSED          [100%]

============================== 2 passed in 0.30s ===============================
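
The tests above call `extract_title`, whose source is not shown here. As a rough illustration only, the sketch below shows the kind of logic such tests exercise, working on plain strings rather than a DataFrame. The function name `extract_title_from_name`, the regular expression, and the pure-Python approach are all assumptions made for this example, not the repository's implementation.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: the title is the word between the comma and the first period.
TITLE_PATTERN = re.compile(r",\s*([a-z]+)\.")


def extract_title_from_name(name: str,
                            replace_dict: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the title found in a lowercased name, or None if there is none."""
    match = TITLE_PATTERN.search(name)
    if match is None:
        return None
    title = match.group(1)
    if replace_dict:
        # Titles missing from the mapping are kept unchanged.
        title = replace_dict.get(title, title)
    return title


names = ['futrelle, mme. jacques heath (lily may peel)',
         'backstrom, major. karl alfred (maria mathilda gustafsson)']
titles = [extract_title_from_name(n) for n in names]
replaced = [extract_title_from_name(n, {'mme': 'mrs', 'major': 'rare'}) for n in names]
```

Here `titles` holds the raw titles and `replaced` holds them after the mapping is applied, mirroring the two test cases.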

Code Coverage

$ pytest --cov=src
============================= test session starts ==============================
platform darwin -- Python 3.8.2, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
rootdir: /Users/eugene/projects/python-collaboration-template
plugins: cov-2.10.0
collected 9 items

tests/data_prep/test_categorical.py ....                                 [ 44%]
tests/data_prep/test_continuous.py .....                                 [100%]

---------- coverage: platform darwin, python 3.8.2-final-0 -----------
Name                           Stmts   Miss  Cover
--------------------------------------------------
src/__init__.py                    0      0   100%
src/data_prep/__init__.py          0      0   100%
src/data_prep/categorical.py      12      0   100%
src/data_prep/continuous.py       11      0   100%
--------------------------------------------------
TOTAL                             23      0   100%

============================== 9 passed in 0.49s ===============================
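
The Cover column in the report can be reproduced by hand. The sketch below assumes plain statement coverage (no branch measurement), where the percentage is the share of statements that were executed; the helper name `coverage_percent` is made up for this example.

```python
def coverage_percent(statements: int, missed: int) -> float:
    """Statement coverage as a percentage; an empty file counts as fully covered."""
    if statements == 0:
        return 100.0
    return 100.0 * (statements - missed) / statements


# (Stmts, Miss) pairs copied from the report above.
report = {
    "src/data_prep/categorical.py": (12, 0),
    "src/data_prep/continuous.py": (11, 0),
}
total_stmts = sum(stmts for stmts, _ in report.values())
total_miss = sum(miss for _, miss in report.values())
total = coverage_percent(total_stmts, total_miss)

# With two of the 23 statements missed, coverage would drop to roughly 91%.
two_missed = coverage_percent(total_stmts, 2)
```

A total below a chosen threshold (for example the 95 passed to `--cov-fail-under` in the Makefile) makes the test run fail.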

Linting

$ pylint src.data_prep.categorical --reports=y
************* Module src.data_prep.categorical
src/data_prep/categorical.py:20:0: C0330: Wrong continued indentation (add 9 spaces).
                        df[title_col].map(replace_dict),
                        ^        | (bad-continuation)
src/data_prep/categorical.py:21:0: C0330: Wrong continued indentation (add 9 spaces).
                        df[title_col])
                        ^        | (bad-continuation)
src/data_prep/categorical.py:16:12: W1401: Anomalous backslash in string: '\.'. String constant might be missing an r prefix. (anomalous-backslash-in-string)
src/data_prep/categorical.py:1:0: C0114: Missing module docstring (missing-module-docstring)
src/data_prep/categorical.py:5:0: C0116: Missing function or method docstring (missing-function-docstring)
src/data_prep/categorical.py:9:0: C0116: Missing function or method docstring (missing-function-docstring)
src/data_prep/categorical.py:14:0: C0116: Missing function or method docstring (missing-function-docstring)

Report
======
12 statements analysed.

...

Messages
--------
+------------------------------+------------+
|message id                    |occurrences |
+==============================+============+
|missing-function-docstring    |3           |
+------------------------------+------------+
|bad-continuation              |2           |
+------------------------------+------------+
|missing-module-docstring      |1           |
+------------------------------+------------+
|anomalous-backslash-in-string |1           |
+------------------------------+------------+

-----------------------------------
Your code has been rated at 4.17/10
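
As a sketch of how the docstring and backslash messages above might be addressed, the hypothetical module below adds a module docstring (C0114), a function docstring (C0116), and a raw-string regular expression (W1401). The names `TITLE_REGEX` and `find_title` are invented for illustration and are not taken from `categorical.py`.

```python
"""Helpers for pulling titles out of passenger names (hypothetical module)."""
import re

# A raw string keeps the backslashes literal, which avoids W1401.
TITLE_REGEX = r",\s*([a-z]+)\."

# The escaped spelling yields the same pattern text; the raw form is easier to read.
assert TITLE_REGEX == ",\\s*([a-z]+)\\."


def find_title(name: str) -> str:
    """Return the title in a lowercased name, or an empty string if absent."""
    match = re.search(TITLE_REGEX, name)
    return match.group(1) if match else ""
```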

Type Checking

$ mypy src
src/data_prep/continuous.py:23: error: Incompatible types in assignment (expression has type "str", variable has type "float")
Found 1 error in 1 file (checked 4 source files)

# After fixing the assignment on line 23
$ mypy src
Success: no issues found in 4 source files
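
To show the kind of code that triggers this error, here is a minimal sketch. Both functions are hypothetical and are not the contents of `continuous.py`; they only reproduce the same class of mistake (assigning a `str` to a name declared as `float`) and one possible fix.

```python
def describe_fare_flagged(fare: float) -> str:
    # mypy reports an incompatible assignment here: `fare` is declared as float,
    # but the right-hand side is a str. The code still runs, because Python
    # does not enforce annotations at runtime.
    fare = f"{fare:.2f}"
    return fare


def describe_fare_fixed(fare: float) -> str:
    # A separate, correctly typed name keeps each variable to a single type.
    fare_text: str = f"{fare:.2f}"
    return fare_text
```

Both functions return the same value at runtime; only the second should pass the type checker.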

Wrapping it in a Makefile

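.PHONY: clean-pyc clean-test clean test

# Remove compiled bytecode, editor backups, and cache directories
clean-pyc:
	find . -name '*.pyc' -exec rm -f {} +
	find . -name '*.pyo' -exec rm -f {} +
	find . -name '*~' -exec rm -f {} +
	find . -name '__pycache__' -exec rm -fr {} +

# Remove coverage data files
clean-test:
	rm -f .coverage
	rm -f .coverage.*

clean: clean-pyc clean-test

# Run the tests inside the virtualenv; fail if coverage drops below 95%
test: clean
	. .venv/bin/activate && pytest tests --cov=src --cov-report=term-missing --cov-fail-under 95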
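
For environments where `make` or `find` is unavailable, the clean targets could be approximated in Python. This is only a sketch under that assumption; the `clean` function is not part of the repository.

```python
import shutil
from pathlib import Path


def clean(root: Path) -> None:
    """Remove bytecode, editor backups, __pycache__ folders, and coverage data under root."""
    # Mirrors clean-pyc: search the whole tree for each pattern.
    for pattern in ("*.pyc", "*.pyo", "*~"):
        for path in list(root.rglob(pattern)):
            if path.is_file():
                path.unlink()
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
    # Mirrors clean-test: coverage data is removed from the top level only.
    for pattern in (".coverage", ".coverage.*"):
        for path in list(root.glob(pattern)):
            if path.is_file():
                path.unlink()
```

Calling `clean(Path("."))` from the project root would then play the role of `make clean`.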

GitHub Actions with each git push

# .github/workflows/tests.yml
name: Tests
on: push
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v1
      with:
        python-version: 3.8
        architecture: x64
    - run: make setup
    - run: make check
    - run: bash <(curl -s https://codecov.io/bash)

👉 View the article for the walkthrough.

Todo

  • Update requirements.txt to use poetry
