WeatherBench: A benchmark dataset for data-driven weather forecasting

If you are using this dataset, please cite:

Stephan Rasp, Peter D. Dueben, Sebastian Scher, Jonathan A. Weyn, Soukayna Mouatadid, and Nils Thuerey, 2020. WeatherBench: A benchmark dataset for data-driven weather forecasting. arXiv: https://arxiv.org/abs/2002.00469

This repository contains all the code for downloading and processing the data, as well as the code for the baseline models in the paper.


Note! The data has been changed from the original release. Here is a list of changes:

  • New vertical levels. Used to be [1, 10, 100, 200, 300, 400, 500, 600, 700, 850, 1000], now is [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]. This is to be compatible with CMIP output. The new levels include all of the old ones with the exception of [1, 10].
  • CMIP data. Regridded CMIP data of some variables was added. This is the historical simulation of the MPI-ESM-HR model.
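The level change can be checked with a few lines of Python. This is only a worked comparison of the two lists quoted above; the variable names are illustrative and are not taken from the repository code.

```python
# Pressure levels copied from the change list above.
OLD_LEVELS = [1, 10, 100, 200, 300, 400, 500, 600, 700, 850, 1000]
NEW_LEVELS = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]

# Levels that existed in the original release but are gone now.
dropped = sorted(set(OLD_LEVELS) - set(NEW_LEVELS))
# Levels that were introduced with the new release.
added = sorted(set(NEW_LEVELS) - set(OLD_LEVELS))

print("dropped:", dropped)
print("added:", added)
```

Any code written against the original release that selects the 1 or 10 level will need to be adapted.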

If you have any questions about this dataset, please use the GitHub Issues feature on this page!

Leaderboard

| Model | Z500 RMSE (3 / 5 days) [m²/s²] | T850 RMSE (3 / 5 days) [K] | Notes | Reference |
|---|---|---|---|---|
| Operational IFS | 154 / 334 | 1.36 / 2.03 | ECMWF physical model (10 km) | Rasp et al. 2020 |
| Rasp and Thuerey 2020 (direct/continuous) | 268 / 499 | 1.65 / 2.41 | Resnet with CMIP pretraining (5.625 deg) | Rasp and Thuerey 2020 |
| IFS T63 | 268 / 463 | 1.85 / 2.52 | Lower resolution physical model (approx. 1.9 deg) | Rasp et al. 2020 |
| Weyn et al. 2020 (iterative) | 373 / 611 | 1.98 / 2.87 | UNet with cube-sphere mapping (2 deg) | Weyn et al. 2020 |
| Clare et al. 2021 (direct) | 375 / 627 | 2.11 / 2.91 | Stacked ResNets with probabilistic output (5.625 deg) | Clare et al. 2021 |
| IFS T42 | 489 / 743 | 3.09 / 3.83 | Lower resolution physical model (approx. 2.8 deg) | Rasp et al. 2020 |
| Weekly climatology | 816 | 3.50 | Climatology for each calendar week | Rasp et al. 2020 |
| Persistence | 936 / 1033 | 4.23 / 4.56 | | Rasp et al. 2020 |
| Climatology | 1075 | 5.51 | | Rasp et al. 2020 |

Quick start

You can follow the quickstart guide in this notebook or launch it directly from Binder.

Download the data

The data is hosted here with the following directory structure:

.
|-- 1.40625deg
|   |-- 10m_u_component_of_wind
|   |-- 10m_v_component_of_wind
|   |-- 2m_temperature
|   |-- constants
|   |-- geopotential
|   |-- old
|   |   `-- temperature
|   |-- potential_vorticity
|   |-- relative_humidity
|   |-- specific_humidity
|   |-- temperature
|   |-- toa_incident_solar_radiation
|   |-- total_cloud_cover
|   |-- total_precipitation
|   |-- u_component_of_wind
|   |-- v_component_of_wind
|   `-- vorticity
|-- 2.8125deg
|   |-- 10m_u_component_of_wind
|   |-- 10m_v_component_of_wind
|   |-- 2m_temperature
|   |-- constants
|   |-- geopotential
|   |-- potential_vorticity
|   |-- relative_humidity
|   |-- specific_humidity
|   |-- temperature
|   |-- toa_incident_solar_radiation
|   |-- total_cloud_cover
|   |-- total_precipitation
|   |-- u_component_of_wind
|   |-- v_component_of_wind
|   `-- vorticity
|-- 5.625deg
|   |-- 10m_u_component_of_wind
|   |-- 10m_v_component_of_wind
|   |-- 2m_temperature
|   |-- constants
|   |-- geopotential
|   |-- geopotential_500
|   |-- potential_vorticity
|   |-- relative_humidity
|   |-- specific_humidity
|   |-- temperature
|   |-- temperature_850
|   |-- toa_incident_solar_radiation
|   |-- total_cloud_cover
|   |-- total_precipitation
|   |-- u_component_of_wind
|   |-- v_component_of_wind
|   `-- vorticity
|-- baselines
|   `-- saved_models
|-- CMIP
|   `-- MPI-ESM
|       |-- 2.8125deg
|       |   |-- geopotential
|       |   |-- specific_humidity
|       |   |-- temperature
|       |   |-- u_component_of_wind
|       |   `-- v_component_of_wind
|       `-- 5.625deg
|           |-- geopotential
|           |-- specific_humidity
|           |-- temperature
|           |-- u_component_of_wind
|           `-- v_component_of_wind
|-- IFS_T42
|   `-- raw
|-- IFS_T63
|   `-- raw
`-- tigge
    |-- 1.40625deg
    |   |-- geopotential_500
    |   `-- temperature_850
    |-- 2.8125deg
    |   |-- geopotential_500
    |   `-- temperature_850
    `-- 5.625deg
        |-- 2m_temperature
        |-- geopotential_500
        |-- temperature_850
        `-- total_precipitation

To get started, download either the entire 5.625 degree dataset (175 GB) using

wget "https://dataserv.ub.tum.de/s/m1524895/download?path=%2F5.625deg&files=all_5.625deg.zip" -O all_5.625deg.zip

or simply the single level (500 hPa) geopotential data using

wget "https://dataserv.ub.tum.de/s/m1524895/download?path=%2F5.625deg%2Fgeopotential_500&files=geopotential_500_5.625deg.zip" -O geopotential_500_5.625deg.zip

and then unzip the files using unzip <file>.zip. You can also use ftp or rsync to download the data. For instructions, follow the download link.
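For scripted downloads, the two wget calls above can be mirrored in Python with the standard library. The following is a minimal sketch: the helper names build_url and download_and_unzip are hypothetical, and it assumes that other folders on the server follow the same path=...&files=... URL pattern as the two examples, which is not guaranteed.

```python
import zipfile
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

# Base URL taken from the wget commands above.
BASE_URL = "https://dataserv.ub.tum.de/s/m1524895/download"


def build_url(remote_dir, zip_name):
    """Build a download URL; remote_dir is percent-encoded ('/' becomes %2F)."""
    return f"{BASE_URL}?path={quote(remote_dir, safe='')}&files={zip_name}"


def download_and_unzip(remote_dir, zip_name, out_dir):
    """Download one zip archive into out_dir and extract it there."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    zip_path = out_dir / zip_name
    urlretrieve(build_url(remote_dir, zip_name), zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    return zip_path


# Example (commented out because it triggers a real download):
# download_and_unzip("/5.625deg/geopotential_500",
#                    "geopotential_500_5.625deg.zip", "data/geopotential_500")
print(build_url("/5.625deg/geopotential_500", "geopotential_500_5.625deg.zip"))
```

For the full 175 GB archive, wget or rsync is probably the more robust choice, since this sketch has no resume or retry logic.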

Baselines and evaluation

IMPORTANT: The format of the predictions file is a NetCDF dataset with dimensions [init_time, lead_time, lat, lon]. Consult the notebooks for examples. You are strongly encouraged to format your predictions in the same way and then use the same evaluation functions to ensure consistent evaluation.
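As a rough illustration of that layout, the sketch below builds an xarray dataset with the four dimensions named above. Everything else is an assumption made for the example: the variable name z, lead times given in hours, the 32 x 64 grid coordinates, and the helper name make_predictions. The notebooks remain the reference for the exact conventions.

```python
import numpy as np
import xarray as xr


def make_predictions(values, init_times, lead_hours, lats, lons, name="z"):
    """Wrap a 4D array in a Dataset with dims [init_time, lead_time, lat, lon]."""
    return xr.Dataset(
        {name: (("init_time", "lead_time", "lat", "lon"), values)},
        coords={
            "init_time": init_times,
            "lead_time": lead_hours,  # assumed here to be hours
            "lat": lats,
            "lon": lons,
        },
    )


# Illustrative coordinates for a 5.625 degree grid (32 x 64 points).
lats = np.linspace(-87.1875, 87.1875, 32)
lons = np.arange(0, 360, 5.625)
init_times = np.array(["2017-01-01T00", "2017-01-01T12"], dtype="datetime64[ns]")
lead_hours = np.array([72, 120])  # 3 and 5 days

values = np.zeros((len(init_times), len(lead_hours), len(lats), len(lons)))
preds = make_predictions(values, init_times, lead_hours, lats, lons)
print(preds["z"].dims)

# Writing to disk needs a NetCDF backend such as netCDF4:
# preds.to_netcdf("my_predictions.nc")
```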

Baselines

The baselines are created using Jupyter notebooks in notebooks/. In all notebooks, the forecasts are saved as a NetCDF file in the predictions directory of the dataset.

CNN baselines

An example of how to load the data and train a CNN using Keras is given in notebooks/3-cnn-example.ipynb. In addition, a command-line script for training CNNs is provided in src/train_nn.py. For the baseline CNNs in the paper, the config files are given in src/nn_configs/. To reproduce the results in the paper, run e.g.

python -m src.train_nn -c src/nn_configs/fccnn_3d.yml

Evaluation

Evaluation and comparison of the different baselines is done in notebooks/4-evaluation.ipynb. The scoring is done using the functions in src/score.py. The RMSE values for the baseline models are also saved in the predictions directory of the dataset. This is useful for plotting your own models alongside the baselines.
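To give a feel for what such a score computes, here is a minimal NumPy sketch of an RMSE on a latitude-longitude grid. It assumes a latitude weighting proportional to cos(latitude), and it assumes the RMSE is computed per forecast and then averaged over initialization times. The function name weighted_rmse is hypothetical. The functions in src/score.py are the reference and may differ in detail, so use those for any numbers you report.

```python
import numpy as np


def weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE per lead time.

    forecast, truth: arrays shaped (init_time, lead_time, lat, lon).
    lats_deg: 1D array of latitudes in degrees.
    """
    # Grid cells shrink towards the poles, so weight by cos(latitude),
    # normalised so that the weights average to one.
    weights = np.cos(np.deg2rad(lats_deg))
    weights = weights / weights.mean()
    sq_err = (forecast - truth) ** 2 * weights[None, None, :, None]
    # Spatial mean and square root per forecast, then mean over init times.
    rmse_per_forecast = np.sqrt(sq_err.mean(axis=(2, 3)))
    return rmse_per_forecast.mean(axis=0)


lats = np.linspace(-87.1875, 87.1875, 32)
truth = np.zeros((5, 2, 32, 64))
forecast = truth + 2.0  # constant error of 2 everywhere
scores = weighted_rmse(forecast, truth, lats)
print(scores)
```

With a constant error the weighting has no effect, which makes this a convenient sanity check: the score equals the size of the error.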

Data processing

The dataset already contains the most important processed data. If you would like to download a different variable, regrid to a different resolution, or extract single levels from the 3D files, here is how to do that!

Downloading and processing the raw data from the ERA5 archive

The workflow to get to the processed data that ended up in the data repository above is:

  1. Download monthly files from the ERA5 archive (src/download.py)
  2. Regrid the raw data to the required resolutions (src/regrid.py)

The raw data is from the ERA5 reanalysis archive. Information on how to download the data can be found here and here.

Because downloading the data can take a long time (several weeks), the workflow is encoded using Snakemake. See Snakefile and the configuration files for each variable in scripts/config_{variable}.yml. These files can be modified if additional variables are required. To execute Snakemake for a particular variable, run:

snakemake -p -j 4 all --configfile scripts/config_toa_incident_solar_radiation.yml
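If several variables need processing, the same call can be repeated from a small Python driver. This is a sketch only: the helper names are hypothetical, and it assumes a scripts/config_{variable}.yml file exists for every variable you pass in. By default it only prints the commands instead of running them.

```python
import subprocess


def snakemake_command(variable, jobs=4):
    """Build the Snakemake call shown above for one variable."""
    return [
        "snakemake", "-p", "-j", str(jobs), "all",
        "--configfile", f"scripts/config_{variable}.yml",
    ]


def run_variables(variables, jobs=4, dry_run=True):
    """Run (or just print, if dry_run) the workflow for each variable in turn."""
    for variable in variables:
        cmd = snakemake_command(variable, jobs)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop as soon as one workflow fails.
            subprocess.run(cmd, check=True)


run_variables(["toa_incident_solar_radiation"])
```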

In addition to the time-dependent fields, the constant fields were downloaded and processed using scripts/download_and_regrid_constants.sh.

Downloading the TIGGE IFS baseline

To obtain the operational IFS baseline, we use the TIGGE Archive. Downloading the data for Z500 and T850 is done in scripts/download_tigge.py; regridding is done in scripts/convert_and_regrid_tigge.sh.

Regridding the T21 IFS baseline

The T21 baseline was created by Peter Dueben. The raw output can be found in the dataset. To regrid the data, scripts/convert_and_regrid_IFS_TXX.sh was used.

Downloading and regridding CMIP historical climate model data

To download historical climate model data, use the Snakemake file in snakemake_configs_CMIP. Here, we downloaded data from the MPI-ESM-HR model. To download other models, search for the download links on the CMIP website and modify the scripts accordingly.

Extracting single levels from 3D files

If you would like to extract a single level from 3D data, e.g. 850 hPa temperature, you can use src/extract_level.py. This could be useful to reduce the amount of data that needs to be loaded into RAM. An example usage would be:

python extract_level.py --input_fns DATADIR/5.625deg/temperature/*.nc --output_dir OUTDIR --level 850
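For interactive work, the same selection can be sketched directly in xarray. This is not the code of extract_level.py. It is a minimal illustration that assumes the vertical coordinate is called level and the variable is called t, and it uses a small random dataset in place of the real files.

```python
import numpy as np
import xarray as xr


def extract_level(ds, level, level_dim="level"):
    """Select one pressure level; the level dimension is dropped from the result."""
    return ds.sel({level_dim: level})


# Small stand-in for the 3D temperature files (time, level, lat, lon).
demo = xr.Dataset(
    {"t": (("time", "level", "lat", "lon"), np.random.rand(3, 2, 4, 8))},
    coords={"level": [500, 850]},
)
t850 = extract_level(demo, 850)
print(t850["t"].dims)

# With the real data, the input would typically be opened lazily, e.g.
# ds = xr.open_mfdataset("DATADIR/5.625deg/temperature/*.nc", combine="by_coords")
```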
