  • Stars: 118
  • Rank: 299,923 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created over 4 years ago
  • Updated 8 months ago


Repository Details

Implementation of Hang et al. 2020 "Hyperspectral Image Classification with Attention Aided CNNs" for tree species prediction

DeepTreeAttention

Tree Species Prediction for the National Ecological Observatory Network (NEON)

Model Architecture

Project Organization

├── LICENSE
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── environment.yml   <- Conda requirements
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── Models         <- Model Architectures

Workflow

There are three main parts to this project: 1) a data module, 2) a model module, and 3) a trainer module. Usually the data module is created first to hold the train and test split and to keep data generation reproducible. Then a model architecture is created and passed to the model module along with settings from the data module. Finally, the model module is passed to the trainer.

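from pytorch_lightning import Trainer
# data, main and Hang2020 come from this project's src package; client is defined elsewhere

# 1) Data module: holds the train/test split
data_module = data.TreeData(csv_file="data/raw/neon_vst_data_2021.csv", regenerate=False, client=client)

# 2) Model module: wraps any PyTorch nn.Module architecture, e.g. Hang2020.vanilla_CNN
model = Hang2020.vanilla_CNN
m = main.TreeModel(model=model, bands=data_module.config["bands"], classes=data_module.num_classes, label_dict=data_module.species_label_dict)

# 3) Trainer
trainer = Trainer()
trainer.fit(m, datamodule=data_module)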

Pytorch Lightning Data Module (data.TreeData)

This repo contains a PyTorch Lightning data module for reproducibility. The goal of the project is to make it easy to share with others within our research group, but we welcome contributions from outside the community. While all data is public, it is VERY large (>20TB) and cannot be easily shared. If you want to reproduce this work, you will need to download the majority of NEON's camera, HSI and CHM data and change the paths in the config file. For the 'raw' NEON tree stem data, see data/raw/neon_vst_2021.csv. The data module starts from this file, which holds the x,y location of each tree, and then performs the following actions as an end-to-end workflow.

  1. Filters the data to keep trees over 3m tall that have a sufficient number of training samples.
  2. Extracts the LiDAR-derived canopy height and compares it to the field-measured height. Trees that are below the canopy are excluded based on the min_CHM_diff parameter in the config.
  3. Splits the x,y data into training and test sets such that each field plot is in either training or test, never both.
  4. For each x,y stem location, predicts the crown with the tree detection algorithm (DeepForest - https://deepforest.readthedocs.io/).
  5. Creates crops of each tree crown and divides them into pixel windows for pixel-level prediction.
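
To make steps 1 to 3 concrete, here is a minimal, standard-library sketch. It is not the code in data.TreeData. The field names (species, plot, height, chm_height), the default thresholds, and the exact rule applied with min_chm_diff are assumptions made for illustration.

```python
import random
from collections import Counter


def filter_trees(records, min_height=3.0, min_samples=2, min_chm_diff=4.0):
    """Steps 1-2: drop short trees, trees far below the canopy, and rare species."""
    kept = [
        r for r in records
        if r["height"] > min_height
        # Assumed rule: exclude a tree when the LiDAR canopy height exceeds
        # its field-measured height by more than min_chm_diff metres.
        and r["chm_height"] - r["height"] <= min_chm_diff
    ]
    counts = Counter(r["species"] for r in kept)
    return [r for r in kept if counts[r["species"]] >= min_samples]


def split_by_plot(records, test_fraction=0.2, seed=0):
    """Step 3: assign whole plots to train or test so no plot appears in both."""
    plots = sorted({r["plot"] for r in records})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(plots)
    n_test = max(1, round(len(plots) * test_fraction))
    test_plots = set(plots[:n_test])
    train = [r for r in records if r["plot"] not in test_plots]
    test = [r for r in records if r["plot"] in test_plots]
    return train, test


# Hypothetical stem records
records = [
    {"species": "PIPA", "plot": "A", "height": 12.0, "chm_height": 13.0},
    {"species": "PIPA", "plot": "B", "height": 10.0, "chm_height": 11.5},
    {"species": "QULA", "plot": "A", "height": 2.0, "chm_height": 9.0},   # too short
    {"species": "QULA", "plot": "C", "height": 5.0, "chm_height": 20.0},  # below canopy
    {"species": "ACRU", "plot": "C", "height": 8.0, "chm_height": 8.5},   # too few samples
    {"species": "PIPA", "plot": "C", "height": 9.0, "chm_height": 9.0},
]

filtered = filter_trees(records)
train, test = split_by_plot(filtered)
```

Splitting on the plot rather than on the individual tree keeps neighbouring trees, which tend to share imagery and site conditions, out of both sets at once.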

This workflow does not need to be run for every experiment. If you are satisfied with the current train/test split and data generation process, set regenerate=False.

data_module = data.TreeData(csv_file="data/raw/neon_vst_data_2021.csv", regenerate=False)
data_module.setup()
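
The idea behind the regenerate flag can be sketched as a small caching pattern: reuse a saved split when one exists, and rebuild it only on request. This is an illustration, not the TreeData implementation. The helper name load_or_generate, the CSV layout, and the fallback of generating when no file exists are all assumptions.

```python
import csv
import os
import tempfile


def load_or_generate(path, generate, regenerate=False):
    """Return rows from `path`; call `generate()` only on request or if the file is missing."""
    if regenerate or not os.path.exists(path):
        rows = generate()
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        return rows
    # Reuse the split saved by an earlier run
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def expensive_generation():
    """Stand-in for the full data generation workflow (hypothetical rows)."""
    return [
        {"individual": "tree_1", "split": "train"},
        {"individual": "tree_2", "split": "test"},
    ]


with tempfile.TemporaryDirectory() as tmp:
    split_path = os.path.join(tmp, "train.csv")
    first = load_or_generate(split_path, expensive_generation)                     # generates and saves
    cached = load_or_generate(split_path, expensive_generation, regenerate=False)  # reads the saved file
```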

Pytorch Lightning Training Module (main.TreeModel)

Training is handled by the TreeModel class, which loads a model from the models folder, reads the config file and runs the training. The evaluation metrics and images are computed and pushed to the comet dashboard.

m = main.TreeModel(model=Hang2020.vanilla_CNN, bands=data_module.config["bands"], classes=data_module.num_classes, label_dict=data_module.species_label_dict)

# comet_logger is assumed to be a logger created beforehand, e.g. a pytorch_lightning.loggers.CometLogger
trainer = Trainer(
    gpus=data_module.config["gpus"],
    fast_dev_run=data_module.config["fast_dev_run"],
    max_epochs=data_module.config["epochs"],
    accelerator=data_module.config["accelerator"],
    logger=comet_logger)

trainer.fit(m, datamodule=data_module)

Alive/Dead Filtering

As part of the prediction pipeline, RGB crops are scored as either 'Alive', meaning they have leaves during the presumed leaf-on season, or 'Dead', meaning they do not have leaves. To fine-tune the ResNet-50 model, see src/models/dead.py. The classified data for the Alive/Dead crops can be found in data/raw/dead_train and data/raw/dead_test.
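
As a rough illustration of the filtering step, the sketch below labels each crop from a 'Dead' score and separates the two groups. It does not reproduce src/models/dead.py. The crop_id and dead_score fields, the 0.5 threshold, and the choice to set dead crops aside are assumptions.

```python
def split_alive_dead(crops, dead_threshold=0.5):
    """Label each crop 'Alive' or 'Dead' from its score and return the two groups."""
    alive, dead = [], []
    for crop in crops:
        label = "Dead" if crop["dead_score"] >= dead_threshold else "Alive"
        labeled = {**crop, "label": label}  # copy so the input is not modified
        (dead if label == "Dead" else alive).append(labeled)
    return alive, dead


# Hypothetical scores, e.g. from a fine-tuned RGB classifier
crops = [
    {"crop_id": "a", "dead_score": 0.10},
    {"crop_id": "b", "dead_score": 0.92},
    {"crop_id": "c", "dead_score": 0.35},
]

alive, dead = split_alive_dead(crops)
```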

Dev Guide

In general, major changes or improvements should be made on a new git branch. Only core improvements should be made on the main branch. If a change leads to higher scores, please create a pull request. Any pull requests are expected to have pytest unit tests (see tests/) that cover major use cases.

Model Architectures

The TreeModel class takes a model-creation function:

m = main.TreeModel(model=Hang2020.vanilla_CNN)

Any model can be specified, provided it accepts the following constructor arguments and returns class scores from its forward method:

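import torch.nn.functional as F
from torch.nn import Module

class myModel(Module):
    """
    Model description
    """
    def __init__(self, bands, classes):
        super().__init__()
        # define the model architecture here, using `bands` input channels and `classes` outputs

    def forward(self, x):
        # pass x through the layers defined above to get one score per class
        class_scores = F.softmax(x, dim=1)

        return class_scores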

Extending the model

To create a model that takes in new inputs, I strongly recommend subclassing the existing TreeData and TreeModel classes. For an example, see the MetadataModel in models/metadata.py.

# Subclass of the training model
class MetadataModel(main.TreeModel):
    """Subclass the core model and update the training loop to take two inputs"""
    def __init__(self, model, sites, classes, label_dict, config):
        super().__init__(model=model, classes=classes, label_dict=label_dict, config=config)

    def training_step(self, batch, batch_idx):
        """Train on a loaded dataset"""
        # Each batch holds a dict of inputs plus the labels
        inputs, y = batch
        images = inputs["HSI"]
        metadata = inputs["site"]
        y_hat = self.model(images, metadata)
        loss = F.cross_entropy(y_hat, y)

        return loss

Getting Started (UF - collaboration)

This section is meant solely for members of the idtrees group who have access to the data.

  1. Fork this repo and install the conda environment.
conda env create -f=environment.yml
conda activate DeepTreeAttention
  2. Update the config.yml

Currently, only members of the ewhite group have permission to access the raw NEON data, which the config points to with paths such as:

rgb_sensor_pool: /orange/ewhite/NeonData/*/DP3.30010.001/**/Camera/**/*.tif

This is not a problem. Just set

regenerate: False

and the data module will bypass the data generation steps and use the existing train/test split (e.g. data/processed/train.csv).

You will need to change the crop directory

crop_dir: /blue/ewhite/b.weinstein/DeepTreeAttention/crops/

to wherever the crops are saved. This is currently

/orange/idtrees-collab/DeepTreeAttention/crops/

I highly recommend making a comet login. Change

#Comet dashboard
comet_workspace: bw4sz

to your username and add a .comet.config file to authenticate.

  3. Submit a SLURM job

sbatch SLURM/experiment.sh
  4. Look at the comet repo for results

The metrics tab has the Micro and Macro Accuracy.
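
The two metrics can differ a lot when species are imbalanced. The sketch below uses the common definitions: micro accuracy is the fraction of all trees predicted correctly, and macro accuracy is the unweighted mean of per-species accuracy. That the dashboard computes them exactly this way is an assumption, and the labels are made up.

```python
from collections import defaultdict


def micro_accuracy(y_true, y_pred):
    """Fraction of all samples that are predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy, so rare species count as much as common ones."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical labels: a common species and a rare one
y_true = ["PIPA"] * 8 + ["QULA"] * 2
y_pred = ["PIPA"] * 8 + ["PIPA", "QULA"]

micro = micro_accuracy(y_true, y_pred)  # 9 of 10 trees correct
macro = macro_accuracy(y_true, y_pred)  # mean of 8/8 for PIPA and 1/2 for QULA
```

Here the micro score is 0.9 while the macro score is 0.75, because one of the two rare-species trees is misclassified.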
