
Open Solution to the Mapping Challenge Competition


Note

Unfortunately, we can no longer provide support for this repo. It should still work, but if it does not, we cannot really help.

More competitions 🎇

Check our collection of public projects 🎁, where you can find multiple Kaggle competitions with code, experiments, and outputs.

Poster 🌍

A poster that summarizes our project is available here.

Intro

Open solution to the CrowdAI Mapping Challenge competition.

  1. Check the live preview of our work on the public projects page: Mapping Challenge 📈.
  2. Source code and issues are publicly available.

Results

0.943 Average Precision 🚀

0.954 Average Recall 🚀

No cherry-picking here, I promise 😉. The results exceeded our expectations. The output from the network is so good that not a lot of morphological shenanigans are needed. Happy days :)

Average Precision and Average Recall were calculated on stage 1 data using pycocotools. Check this blog post for an explanation of average precision.
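
For reference, evaluation with pycocotools boils down to the standard COCOeval flow. The sketch below assumes ground truth and predictions stored in COCO-format JSON files; the file names are placeholders.

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    # Placeholder paths: COCO-format ground-truth annotations and predicted instances.
    coco_gt = COCO("ground_truth_annotations.json")
    coco_dt = coco_gt.loadRes("predicted_annotations.json")

    coco_eval = COCOeval(coco_gt, coco_dt, iouType="segm")  # instance segmentation masks
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()  # prints the Average Precision / Average Recall table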

Disclaimer

In this open-source solution you will find references to neptune.ai. It is a free platform for community users, which we use daily to keep track of our experiments. Please note that using neptune.ai is not necessary to proceed with this solution. You may run it as a plain Python script 😉.

Reproduce it!

Check REPRODUCE_RESULTS

Solution write-up

Pipeline diagram

Preprocessing

βœ”οΈ What Worked

  • Overlay binary masks are produced for each image (code 💻).
  • Distances to the two closest objects are calculated, creating the distance map that is used for weighting (code 💻).
  • Size masks are produced for each image (code 💻).
  • Small masks on the image edges are dropped (code 💻).
  • We load training and validation data in batches: using torch.utils.data.Dataset and torch.utils.data.DataLoader makes it easy and clean (code 💻); a minimal sketch follows this list.
  • Only some basic augmentations from the imgaug package are applied to images, due to speed constraints (code 💻).
  • Images are resized before being fed to the network. Surprisingly, this worked better than cropping (code 💻 and config 📑).
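
As a minimal illustration of the batched loading mentioned above, here is a sketch of a torch Dataset/DataLoader pair. The paths, transform hook, and normalization are placeholders, not the repo's actual loader.

    import numpy as np
    import torch
    from PIL import Image
    from torch.utils.data import DataLoader, Dataset

    class MappingDataset(Dataset):
        """Hypothetical image/mask dataset; paths and transform are placeholders."""

        def __init__(self, image_paths, mask_paths, transform=None):
            self.image_paths = image_paths
            self.mask_paths = mask_paths
            self.transform = transform  # e.g. an imgaug-based augmentation callable

        def __len__(self):
            return len(self.image_paths)

        def __getitem__(self, idx):
            image = np.array(Image.open(self.image_paths[idx]).convert("RGB"))
            mask = np.array(Image.open(self.mask_paths[idx]))
            if self.transform is not None:
                image, mask = self.transform(image, mask)
            image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
            mask = torch.from_numpy(mask).long()
            return image, mask

    # Usage sketch (image/mask path lists come from the dataset preparation step):
    # loader = DataLoader(MappingDataset(train_images, train_masks),
    #                     batch_size=32, shuffle=True, num_workers=4)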

βœ–οΈ What didn't Work

  • Ground-truth masks are prepared by first eroding each mask to make them non-overlapping, and only after that are the distances calculated (code 💻).
  • Small objects are dilated to increase the signal (code 💻).
  • The network is fed with random crops (code 💻 and config 📑).

🤔 What could have worked but we haven't tried it

Network

βœ”οΈ What Worked

  • U-Net with ResNet34, ResNet101, and ResNet152 as the encoder, where ResNet101 gave us the best results. This approach is explained in the TernausNetV2 paper (our code 💻 and config 📑). Also take a look at our parametrizable implementation of the U-Net; a simplified sketch of the idea follows below.
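
For intuition, here is a simplified sketch of the idea (not the repo's parametrizable implementation): a U-Net-style decoder with skip connections on top of a torchvision ResNet34 encoder, assuming input sides divisible by 32 and pretrained weights loaded separately.

    import torch
    import torch.nn as nn
    import torchvision

    class UNetResNet34(nn.Module):
        """Minimal U-Net decoder over a ResNet34 encoder (illustrative only)."""

        def __init__(self, num_classes=2):
            super().__init__()
            resnet = torchvision.models.resnet34()  # load pretrained weights in practice
            self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu)  # /2, 64ch
            self.pool = resnet.maxpool                                        # /4
            self.enc1, self.enc2 = resnet.layer1, resnet.layer2               # 64ch, 128ch
            self.enc3, self.enc4 = resnet.layer3, resnet.layer4               # 256ch, 512ch
            self.up3 = self._up(512, 256)
            self.up2 = self._up(256 + 256, 128)
            self.up1 = self._up(128 + 128, 64)
            self.up0 = self._up(64 + 64, 64)
            self.head = nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(64 + 64, num_classes, kernel_size=1),
            )

        @staticmethod
        def _up(in_ch, out_ch):
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            s = self.stem(x)               # /2
            e1 = self.enc1(self.pool(s))   # /4
            e2 = self.enc2(e1)             # /8
            e3 = self.enc3(e2)             # /16
            e4 = self.enc4(e3)             # /32
            d3 = self.up3(e4)                        # /16
            d2 = self.up2(torch.cat([d3, e3], 1))    # /8
            d1 = self.up1(torch.cat([d2, e2], 1))    # /4
            d0 = self.up0(torch.cat([d1, e1], 1))    # /2
            return self.head(torch.cat([d0, s], 1))  # full-resolution logits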

βœ–οΈ What didn't Work

  • A network architecture based on dilated convolutions, described in this paper.

🤔 What could have worked but we haven't tried it

  • U-Net with contextual blocks, explained in this paper.

Loss function

βœ”οΈ What Worked

  • Distance-weighted cross-entropy, explained in the famous U-Net paper (our code 💻 and config 📑).
  • Using a linear combination of soft Dice and distance-weighted cross-entropy (code 💻 and config 📑); a sketch of this combination follows the list.
  • Adding a component weighted by building size (smaller buildings get greater weight) to the weighted cross-entropy, penalizing misclassification of pixels belonging to small objects (code 💻).
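
A minimal sketch of such a combined loss in PyTorch, assuming the distance and size maps have already been merged into a single per-pixel weight tensor (this is a simplification, not the repo's exact implementation):

    import torch
    import torch.nn.functional as F

    def soft_dice_loss(logits, targets, eps=1e-6):
        # logits: (N, C, H, W); targets: (N, H, W) with class indices
        probs = torch.softmax(logits, dim=1)
        one_hot = F.one_hot(targets, num_classes=probs.shape[1]).permute(0, 3, 1, 2).float()
        intersection = (probs * one_hot).sum(dim=(2, 3))
        union = probs.sum(dim=(2, 3)) + one_hot.sum(dim=(2, 3))
        return 1.0 - ((2.0 * intersection + eps) / (union + eps)).mean()

    def weighted_ce_plus_dice(logits, targets, pixel_weights, dice_weight=0.5):
        # pixel_weights: (N, H, W), combining the distance and size weight maps
        ce = F.cross_entropy(logits, targets, reduction="none")  # per-pixel cross-entropy
        return (ce * pixel_weights).mean() + dice_weight * soft_dice_loss(logits, targets)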

Weights visualization

For both weight maps, the darker the color, the higher the value.

  • distance weights: high values correspond to pixels between buildings (a sketch of the computation follows this list).
  • size weights: high values denote small buildings (the smaller the building, the darker the color). Note that the no-building class is fixed to black.
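
A rough sketch of how distance weights of this kind can be computed with SciPy, following the separation term from the U-Net paper. The w0 and sigma values are illustrative and the class-balancing term is omitted; this is not the repo's exact code.

    import numpy as np
    from scipy import ndimage

    def distance_weights(labeled_mask, w0=10.0, sigma=5.0):
        """labeled_mask: 2-D array, 0 = background, 1..K = building instance ids."""
        ids = np.unique(labeled_mask)
        ids = ids[ids > 0]
        if len(ids) < 2:
            return np.ones(labeled_mask.shape, dtype=np.float32)
        # Distance of every pixel to each building (EDT of that building's complement).
        dists = np.stack([ndimage.distance_transform_edt(labeled_mask != i) for i in ids])
        dists.sort(axis=0)
        d1, d2 = dists[0], dists[1]  # distances to the two closest buildings
        weights = 1.0 + w0 * np.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))
        weights[labeled_mask > 0] = 1.0  # keep building interiors at the base weight (simplification)
        return weights.astype(np.float32)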

Training

βœ”οΈ What Worked

  • Use pretrained models!
  • Our multistage training procedure (sketched in code below):
    1. train on a 50,000-example subset of the dataset with lr=0.0001 and dice_weight=0.5
    2. train on a full dataset with lr=0.0001 and dice_weight=0.5
    3. train with smaller lr=0.00001 and dice_weight=0.5
    4. increase dice weight to dice_weight=5.0 to make results smoother
  • Multi-GPU training
  • Use very simple augmentations

The entire configuration can be tweaked from the config file 📑.
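
The schedule above could be expressed roughly as follows. The model, optimizer, and training call are hypothetical stand-ins; in the repo these values are driven through the config file.

    import torch.nn as nn
    import torch.optim as optim

    model = nn.Conv2d(3, 2, kernel_size=3, padding=1)  # stand-in for the real U-Net
    optimizer = optim.Adam(model.parameters(), lr=1e-4)

    stages = [
        {"data": "50k_subset", "lr": 1e-4, "dice_weight": 0.5},
        {"data": "full",       "lr": 1e-4, "dice_weight": 0.5},
        {"data": "full",       "lr": 1e-5, "dice_weight": 0.5},
        {"data": "full",       "lr": 1e-5, "dice_weight": 5.0},  # lr here is an assumption; the write-up only changes the dice weight
    ]

    for stage in stages:
        for group in optimizer.param_groups:
            group["lr"] = stage["lr"]  # torch optimizers expose lr per parameter group
        dice_weight = stage["dice_weight"]
        # ... run one training stage on stage["data"] with this lr and dice_weight ...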

🤔 What could have worked but we haven't tried it

  • Set different learning rates to different layers.
  • Use cyclic optimizers.
  • Use warm start optimizers.

Postprocessing

βœ”οΈ What Worked

  • Test-time augmentation (TTA). Make predictions on image rotations (90, 180, and 270 degrees) and flips (up-down, left-right) and take the geometric mean of the predictions (code 💻 and config 📑); see the sketch after this list.
  • Simple morphological operations. At the beginning we used erosion followed by labeling and per-label dilation with structuring elements chosen by cross-validation. As the models got better, erosion was removed and a very small dilation was the only operation that still showed improvements (code 💻).
  • Scoring objects. In the beginning we simply used a score of 1.0 for every object, which was a huge mistake. Changing that to the average probability over the object region improved results. What improved scores even more was weighting those probabilities by object size (code 💻).
  • Second-level model. We tried LightGBM and Random Forest models trained on U-Net outputs and features calculated during postprocessing.
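
A minimal sketch of the rotation/flip TTA with a geometric mean. The predict function is a hypothetical stand-in that maps an HxWxC image to an HxW probability map; it is not the repo's API.

    import numpy as np

    def tta_geometric_mean(predict, image):
        transforms = [
            (lambda x: x,              lambda p: p),
            (lambda x: np.rot90(x, 1), lambda p: np.rot90(p, -1)),
            (lambda x: np.rot90(x, 2), lambda p: np.rot90(p, -2)),
            (lambda x: np.rot90(x, 3), lambda p: np.rot90(p, -3)),
            (lambda x: np.flipud(x),   lambda p: np.flipud(p)),
            (lambda x: np.fliplr(x),   lambda p: np.fliplr(p)),
        ]
        # Predict on each augmented view, map the prediction back, then average geometrically.
        preds = [inverse(predict(forward(image))) for forward, inverse in transforms]
        preds = np.clip(np.stack(preds), 1e-6, 1.0)  # avoid log(0)
        return np.exp(np.log(preds).mean(axis=0))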

βœ–οΈ What didn't Work

  • Test-time augmentations using color transformations (config 📑).
  • Inference on reflection-padded images was not the way to go. What worked better (but not for the very best models) was replication padding, where the border pixel value is replicated across the padded region (code 💻).
  • Conditional Random Fields. They were so slow that we didn't check them with our best models (code 💻).

🤔 What could have worked but we haven't tried it

  • Ensembling
  • Recurrent neural networks for postprocessing (instead of our current approach)

Model Weights

Model weights for the winning solution are available here.

You can use those weights and run the pipeline as explained in REPRODUCE_RESULTS.

User support

There are several ways to seek help:

  1. crowdai discussion.
  2. You can submit an issue directly in this repo.
  3. Join us on Gitter.

Contributing

  1. Check CONTRIBUTING for more information.
  2. Check the issues to see if there is something you would like to contribute to.
