  • Language
    Python
  • License
    MIT License
  • Created over 8 years ago
  • Updated 3 months ago

Data Assimilation with Python: a Package for Experimental Research

DAPPER is a set of templates for benchmarking the performance of data assimilation (DA) methods. The tests provide experimental support and guidance for new developments in DA. The typical set-up is a synthetic (twin) experiment, where you specify a dynamic model and an observational model, and use these to generate a synthetic truth (multivariate time series), and then estimate that truth given the models and noisy observations.
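To make the twin-experiment set-up concrete, here is a minimal sketch in plain Python. It does not use DAPPER's API: the scalar linear model, the noise variances, the names `twin_experiment` and `rmse`, and the choice of a Kalman filter as the estimator are all assumptions made for illustration.

```python
import math
import random


def twin_experiment(n_steps=500, a=0.9, q_var=0.1, r_var=1.0, seed=3000):
    """Scalar linear-Gaussian twin experiment (illustrative, not DAPPER code)."""
    rng = random.Random(seed)

    # 1. Generate the synthetic truth and noisy observations of it.
    truth, obs = [], []
    x = rng.gauss(0.0, 1.0)  # initial truth, drawn from the prior N(0, 1)
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, math.sqrt(q_var))  # dynamic model + noise
        y = x + rng.gauss(0.0, math.sqrt(r_var))      # observation model + noise
        truth.append(x)
        obs.append(y)

    # 2. Estimate the truth given the models and the noisy observations.
    mean, var = 0.0, 1.0  # prior mean and variance
    estimates = []
    for y in obs:
        # Forecast step: propagate the estimate through the dynamic model.
        mean = a * mean
        var = a * a * var + q_var
        # Analysis step: correct the forecast with the observation.
        gain = var / (var + r_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        estimates.append(mean)

    return truth, obs, estimates


def rmse(xs, ys):
    """Root-mean-square difference between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


truth, obs, estimates = twin_experiment()
print(f"RMSE of raw observations: {rmse(obs, truth):.3f}")
print(f"RMSE of filter estimates: {rmse(estimates, truth):.3f}")
```

Because the truth is known in a twin experiment, the error of any estimator can be scored directly, which is what makes this set-up convenient for benchmarking.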

Getting started

  • Install.
  • Read, run and try to understand examples/basic_{1,2,3}.py.
  • Some of the examples can also be opened in Jupyter, and thereby run in the cloud with Google Colab (i.e. without installation, but requiring a Google login).
  • This screencast provides an introduction.
  • The documentation includes general guidelines and the API, but for any serious use you will want to read and adapt the code yourself.
  • If you use it in a publication, please cite, e.g., "The experiments used (inspiration from) DAPPER [ref], version 1.2.1", where [ref] points to the DOI.
  • Lastly, for an introduction to DA theory also using Python, see these tutorials.

Highlights

DAPPER enables the numerical investigation of DA methods through a variety of typical test cases and statistics. It (a) reproduces numerical benchmark results reported in the literature, and (b) facilitates comparative studies, thus promoting the (a) reliability and (b) relevance of the results. For example, this figure, generated by examples/basic_3.py using the built-in tools for experiment and result management, reproduces figure 5.7 of these lecture notes.

Comparative benchmarks with Lorenz'96 plotted as a function of the ensemble size (N)

DAPPER is (c) open source, written in Python, and (d) focuses on readability; this promotes the (c) reproduction and (d) dissemination of the underlying science, and makes it easy to adapt and extend.

It also illustrates how to parallelise ensemble forecasts (e.g. the QG model), local analyses (e.g. the LETKF), and independent experiments (e.g. examples/basic_3.py). It comes with a battery of diagnostics and statistics. These all get averaged over subdomains (e.g. "ocean" and "land") and then in time. Confidence intervals are computed, including a correction for auto-correlations, and are used for uncertainty quantification and for printing only the significant digits. Several diagnostics are included in the on-line "liveplotting" illustrated below, which may be paused for further interactive inspection.
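As an illustration of why the auto-correlation correction matters, the sketch below compares a naive standard error of a time average with one based on an effective sample size. This is not DAPPER's implementation: the AR(1)-style formula `n_eff = n * (1 - r1) / (1 + r1)`, the clipping of `r1`, and the name `mean_with_conf` are assumptions chosen for simplicity.

```python
import math
import random


def mean_with_conf(series):
    """Return (mean, naive_se, corrected_se) for a time series (n >= 2).

    The corrected standard error assumes AR(1)-like auto-correlation and
    replaces n by the effective sample size n * (1 - r1) / (1 + r1).
    """
    n = len(series)
    mu = sum(series) / n
    dev = [x - mu for x in series]
    sum_sq = sum(d * d for d in dev)
    var = sum_sq / (n - 1)
    # Lag-1 auto-correlation estimate (zero for a constant series).
    r1 = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / sum_sq if sum_sq > 0 else 0.0
    # Assumed safeguard: never narrow the interval, never divide by ~zero.
    r1 = max(0.0, min(r1, 0.99))
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    return mu, math.sqrt(var / n), math.sqrt(var / n_eff)


# A synthetic, strongly auto-correlated series standing in for a diagnostic.
rng = random.Random(0)
x, series = 0.0, []
for _ in range(2000):
    x = 0.8 * x + rng.gauss(0.0, 1.0)
    series.append(x)

mu, naive_se, corrected_se = mean_with_conf(series)
print(f"mean = {mu:.3f}")
print(f"naive 95% half-width     = {1.96 * naive_se:.3f}")
print(f"corrected 95% half-width = {1.96 * corrected_se:.3f}")
```

The factor 1.96 assumes an approximately normal sampling distribution. For positively correlated series the corrected interval is wider, since successive values carry less independent information.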

EnKF - Lorenz'63

In summary, DAPPER is well suited for teaching and fundamental DA research. Also see its drawbacks.

Installation

Successfully tested on Linux/Mac/Windows.

Prerequisite: Python>=3.9

If you're an expert, set up a Python environment however you like. Otherwise: Install Anaconda, then open the Anaconda terminal and run the following commands:

conda create --yes --name dapper-env python=3.9
conda activate dapper-env
python --version

Ensure the printed version is 3.9 or more.
Keep using the same terminal for the commands below.
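The version requirement can also be checked from within Python itself. The snippet below is a small sketch; the helper name `python_is_recent_enough` is hypothetical, and only the standard `sys.version_info` is used.

```python
import sys

MIN_PYTHON = (3, 9)  # the prerequisite stated above


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if python_is_recent_enough():
    print("Python version is OK:", sys.version.split()[0])
else:
    print("Python is too old:", sys.version.split()[0])
```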

Install

Either: Install for development (recommended)

Do you want the DAPPER code available to play around with? Then

  • Download and unzip (or git clone) DAPPER.
  • Move the resulting folder wherever you like,
    and cd into it (ensure you're in the folder with a setup.py file).
  • pip install -e '.[dev]'
    You can omit [dev] if you don't need to do serious development.

Or: Install as library

Do you just want to run a script that requires DAPPER? Then

  • If the script comes with a requirements.txt file, then do
    pip install -r path/to/requirements.txt.
  • If not, hopefully you know the version of DAPPER needed. Run
    pip install dapper==1.5.1 to get version 1.5.1 (as an example).

Finally: Test the installation

You should now be able to run your script with python path/to/script.py.
For example, if you are in the DAPPER dir,

python examples/basic_1.py

PS: If you closed the terminal (or shut down your computer), you'll first need to run conda activate dapper-env again.
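If a script expects a particular DAPPER version, it can help to see which one (if any) is installed in the active environment. The sketch below uses the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `dapper` is taken from the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("dapper")
if version is None:
    print("DAPPER is not installed in this environment")
else:
    print(f"DAPPER {version} is installed")
```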

DA methods

| Method | Literature reproduced |
|---|---|
| EnKF [1] | Sakov08, Hoteit15, Grudzien2020 |
| EnKF-N | Bocquet12, Bocquet15 |
| EnKS, EnRTS | Raanes2016 |
| iEnKS / iEnKF / EnRML / ES-MDA [2] | Sakov12, Bocquet12, Bocquet14 |
| LETKF, local & serial EAKF | Bocquet11 |
| Sqrt. model noise methods | Raanes2014 |
| Particle filter (bootstrap) [3] | Bocquet10 |
| Optimal/implicit Particle filter [3] | Bocquet10 |
| NETF | Tödter15, Wiljes16 |
| Rank histogram filter (RHF) | Anderson10 |
| 4D-Var | |
| 3D-Var | |
| Extended KF | |
| Optimal interpolation | |
| Climatology | |

1: Stochastic, DEnKF (i.e. half-update), ETKF (i.e. sym. sqrt.). Serial forms are also available.
Tuned with inflation and "random, orthogonal rotations".
2: Also supports the bundle version, and "EnKF-N"-type inflation.
3: Resampling: multinomial (including systematic/universal and residual).
The particle filter is tuned with "effective-N monitoring", "regularization/jittering" strength, and more.
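For readers unfamiliar with the particle-filter terms in note 3, the following sketch shows one common way to compute the effective ensemble size and to perform systematic resampling. It is written independently of DAPPER's code; the function names and the N/2 resampling threshold are assumptions for illustration.

```python
import itertools
import random


def effective_sample_size(weights):
    """N_eff = 1 / sum(w_i^2), for weights that sum to one."""
    return 1.0 / sum(w * w for w in weights)


def systematic_resample(weights, rng):
    """Return resampled particle indices using a single uniform draw."""
    n = len(weights)
    u0 = rng.random() / n                       # one offset in [0, 1/n)
    cumsum = list(itertools.accumulate(weights))
    cumsum[-1] = 1.0                            # guard against round-off
    indices, j = [], 0
    for i in range(n):
        position = u0 + i / n                   # evenly spaced pointers
        while j < n - 1 and cumsum[j] <= position:
            j += 1
        indices.append(j)
    return indices


rng = random.Random(42)
weights = [0.7, 0.1, 0.1, 0.05, 0.05]
n_eff = effective_sample_size(weights)
print(f"N_eff = {n_eff:.2f} out of N = {len(weights)}")

# Assumed rule of thumb: resample when N_eff falls below N / 2.
if n_eff < len(weights) / 2:
    print("resampled indices:", systematic_resample(weights, rng))
```

Monitoring N_eff in this way avoids resampling at every step, which would add sampling noise even when the weights are still well balanced.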

For a list of ready-made experiments with suitable, tuned settings for a given method (e.g. the iEnKS), use:

grep -r "xp.*iEnKS" dapper/mods

Test cases (models)

| Model | Lin | TLM** | PDE? | Phys.dim. | State len | Lyap≥0 | Implementer |
|---|---|---|---|---|---|---|---|
| Id | Yes | Yes | No | N/A | * | 0 | Raanes |
| Linear Advect. (LA) | Yes | Yes | Yes | 1d | 1000 * | 51 | Evensen/Raanes |
| DoublePendulum | No | Yes | No | 0d | 4 | 2 | Matplotlib/Raanes |
| Ikeda | No | Yes | No | 0d | 2 | 1 | Raanes |
| LotkaVolterra | No | Yes | No | 0d | 5 * | 1 | Wikipedia/Raanes |
| Lorenz63 | No | Yes | "Yes" | 0d | 3 | 2 | Sakov |
| Lorenz84 | No | Yes | No | 0d | 3 | 2 | Raanes |
| Lorenz96 | No | Yes | No | 1d | 40 * | 13 | Raanes |
| Lorenz96s | No | Yes | No | 1d | 10 * | 4 | Grudzien |
| LorenzUV | No | Yes | No | 2x 1d | 256 + 8 * | ≈60 | Raanes |
| LorenzIII | No | No | No | 1d | 960 * | ≈164 | Raanes |
| Vissio-Lucarini 20 | No | Yes | No | 1d | 36 * | 10 | Yumeng |
| Kuramoto-Sivashinsky | No | Yes | Yes | 1d | 128 * | 11 | Kassam/Raanes |
| Quasi-Geost (QG) | No | No | Yes | 2d | 129²≈17k | ≈140 | Sakov |
  • *: Flexible; set as necessary
  • **: Tangent Linear Model included?
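To give a feel for what such a test case looks like, here is a standalone sketch of the Lorenz-96 model with a fourth-order Runge-Kutta step. It is not the code in dapper/mods: the function names are hypothetical, and the forcing F = 8 and time step 0.05 are the conventional choices, assumed here rather than taken from the table.

```python
def lorenz96_tendency(x, forcing=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, with periodic indices."""
    n = len(x)
    # Negative indices wrap around in Python, giving the periodic boundary.
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
            for i in range(n)]


def rk4_step(f, x, dt):
    """Advance the state x by one step of length dt using classical RK4."""
    k1 = f(x)
    k2 = f([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
    k3 = f([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
    k4 = f([xi + dt * k for xi, k in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


# Start from the (unstable) equilibrium x_i = F and perturb one component.
x = [8.0] * 40  # state length 40, as listed for Lorenz96 in the table
x[0] += 0.01
for _ in range(100):
    x = rk4_step(lorenz96_tendency, x, dt=0.05)
print("state length:", len(x))
print("first components:", [round(v, 3) for v in x[:3]])
```

The state length is simply the length of the list, which mirrors the "flexible" (*) entry in the table.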

The models are found as subdirectories within dapper/mods. A model should be defined in a file named __init__.py, and illustrated by a file named demo.py. Most other files within a model subdirectory are named authorYEAR.py and define an HMM object, which holds the settings of a specific twin experiment, using that model, as detailed in the corresponding author/year's paper. A list of these files can be obtained using

find dapper/mods -iname '[a-z]*[0-9]*.py'

Some files contain settings used by several papers. Moreover, at the bottom of each such file should be (in comments) a list of suitable, tuned settings for various DA methods, along with their expected, average rmse.a score for that experiment. As mentioned above, DAPPER reproduces literature results. You will also find results that were not reproduced by DAPPER.
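The find command above can be mirrored in Python, which may be handy on systems without a Unix shell. This is a sketch using only the standard library; the function name `list_experiment_files` is hypothetical, and the pattern is copied from the find command.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def list_experiment_files(mods_dir):
    """List files named like authorYEAR.py anywhere below mods_dir.

    Mirrors `find mods_dir -iname '[a-z]*[0-9]*.py'`: the file name must
    start with a letter and contain a digit, ignoring case.
    """
    pattern = "[a-z]*[0-9]*.py"
    return sorted(
        path
        for path in Path(mods_dir).rglob("*.py")
        if fnmatchcase(path.name.lower(), pattern)
    )
```

Calling `list_experiment_files("dapper/mods")` from the DAPPER directory should then list the same files as the find command, while skipping __init__.py and demo.py.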

Similar projects

DAPPER is aimed at research and teaching (see discussion up top). Examples of limitations:

  • It is not suited for very big models (>60k unknowns).
  • It does not support a time-dependent length of the state vector (but this can be emulated).
  • It does not support non-uniform time sequences (TODO).

The scope of DAPPER is restricted because

[Figure: framework_to_language]

Moreover, even straying beyond basic configurability appears unrewarding when already building on a high-level language such as Python. Indeed, you may freely fork and modify the code of DAPPER, which should be seen as a set of templates, and not a framework.

Also, DAPPER comes with no guarantees/support. Therefore, if you have an operational or real-world application, such as WRF, you should look into one of the alternatives, sorted by approximate project size.

| Name | Developers | Purpose (approximately) |
|---|---|---|
| DART | NCAR | General |
| PDAF | AWI | General |
| JEDI | JCSDA (NOAA, NASA, ++) | General |
| OpenDA | TU Delft | General |
| EMPIRE | Reading (Met) | General |
| ERT | Statoil | History matching (Petroleum DA) |
| PIPT | CIPR | History matching (Petroleum DA) |
| MIKE | DHI | Oceanographic |
| OAK | Liège | Oceanographic |
| Siroco | OMP | Oceanographic |
| Verdandi | INRIA | Biophysical DA |
| PyOSSE | Edinburgh, Reading | Earth-observation DA |

Below is a list of projects with a purpose more similar to DAPPER's (research in DA, and not so much using DA):

| Name | Developers | Notes |
|---|---|---|
| DAPPER | Raanes, Chen, Grudzien | Python |
| SANGOMA | Conglomerate* | Fortran, Matlab |
| hIPPYlib | Villa, Petra, Ghattas | Python, adjoint-based PDE methods |
| FilterPy | R. Labbe | Python. Engineering oriented. |
| DASoftware | Yue Li, Stanford | Matlab. Large inverse probs. |
| Pomp | U of Michigan | R |
| EnKF-Matlab | Sakov | Matlab |
| EnKF-C | Sakov | C. Light-weight, off-line DA |
| pyda | Hickman | Python |
| PyDA | Shady-Ahmed | Python |
| DasPy | Xujun Han | Python |
| DataAssim.jl | Alexander-Barth | Julia |
| DataAssimilationBenchmarks.jl | Grudzien | Julia, Python |
| EnsembleKalmanProcesses.jl | Clim. Modl. Alliance | Julia, EKI (optim) |
| Datum | Raanes | Matlab |
| IEnKS code | Bocquet | Python |

The EnKF-Matlab and IEnKS codes have been inspirational in the development of DAPPER.

*: AWI/Liege/CNRS/NERSC/Reading/Delft

Contributors

Patrick N. Raanes, Yumeng Chen, Colin Grudzien, Maxime Tondeur, Remy Dubois

DAPPER is developed and maintained at NORCE (Norwegian Research Institute) and the Nansen Environmental and Remote Sensing Center (NERSC), in collaboration with the University of Reading, the UK National Centre for Earth Observation (NCEO), and the Center for Western Weather and Water Extremes (CW3E).

Publication list
