
A comprehensive, global, open source database of power plants

Global Power Plant Database

This project is not currently maintained by WRI. There are no planned updates as of this time (early 2022). The last version of this database is version 1.3.0. If we learn of active forks or maintained versions of the code and database we will attempt to provide links in the future.

This project aims to build an open database of all the power plants in the world. It is the result of a large collaboration involving many partners, coordinated by the World Resources Institute and Google Earth Outreach. If you would like to get involved, please email the team or fork the repo and code! To learn more about how to contribute to this repository, read the CONTRIBUTING document.

The latest database release (v1.3.0) is available in CSV format here under a Creative Commons Attribution 4.0 (CC BY 4.0) license. A bleeding-edge version is in the output_database directory of this repo.

All Python source code is available under an MIT license.

This work is made possible and supported by Google, among other organizations.

Database description

The Global Power Plant Database is built in several steps.

  • The first step involves gathering and processing country-level data. In some cases, these data are read automatically from official government websites; the code to implement this is in the build_databases directory.
  • In other cases we gather country-level data manually. These data are saved in raw_source_files/WRI and processed with the build_database_WRI.py script in the build_databases directory.
  • The second step is to integrate data from different sources, particularly for geolocation of power plants and annual total electricity generation. Some of these sources are multi-national databases. For this step, we rely on offline work to match records; the concordance table mapping record IDs across databases is saved in resources/master_plant_concordance.csv.
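The concordance table can be consumed as a simple ID lookup. A minimal stdlib-only sketch (the column names here are illustrative, not necessarily those of the real resources/master_plant_concordance.csv):

```python
import csv
import io

# Illustrative stand-in for resources/master_plant_concordance.csv;
# the real file's columns may differ.
concordance_csv = """gppd_idnr,geo_id,carma_id
USA0000001,GEO40123,CARMA55512
IND0000002,GEO40999,
"""

# Build a lookup from GPPD plant ID to matched record IDs in other databases.
concordance = {
    row["gppd_idnr"]: {"geo": row["geo_id"] or None,
                       "carma": row["carma_id"] or None}
    for row in csv.DictReader(io.StringIO(concordance_csv))
}
```

A missing match is stored as an empty field and surfaces here as `None`, which keeps "no match found" distinct from "not yet attempted" at the application level.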

Throughout the processing, we represent power plants as instances of the PowerPlant class, defined in powerplant_database.py. The final database is in a flat-file CSV format.
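As a sketch of that representation (the attribute names below are illustrative; the real PowerPlant class in powerplant_database.py may differ):

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

@dataclass
class PowerPlant:
    # Illustrative subset of attributes; see powerplant_database.py
    # for the project's actual class definition.
    idnr: str
    name: str
    country_iso3: str
    capacity_mw: float
    latitude: Optional[float] = None
    longitude: Optional[float] = None

    def to_row(self):
        """Serialize this plant as one flat CSV row."""
        return [self.idnr, self.name, self.country_iso3,
                self.capacity_mw, self.latitude, self.longitude]

# Write two example plants to an in-memory flat CSV file.
plants = [
    PowerPlant("USA0000001", "Example Dam", "USA", 120.5, 45.0, -121.9),
    PowerPlant("IND0000002", "Example Solar Park", "IND", 648.0),
]
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["gppd_idnr", "name", "country",
                 "capacity_mw", "latitude", "longitude"])
for p in plants:
    writer.writerow(p.to_row())
```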

Key attributes of the database

The database includes the following indicators:

  • Plant name
  • Fuel type(s)
  • Generation capacity
  • Country
  • Ownership
  • Latitude/longitude of plant
  • Data source & URL
  • Data source year
  • Annual generation

We will expand this list in the future as we extend the database.
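Because the release is a flat CSV, these attributes can be queried with the standard library alone. A minimal sketch (the column names are assumed to resemble the release schema; verify against the actual file):

```python
import csv
import io

# Tiny stand-in for the released CSV; column names are assumptions
# modeled on the attribute list above, not guaranteed to match v1.3.0.
data = """country,name,capacity_mw,primary_fuel,latitude,longitude
USA,Example Dam,120.5,Hydro,45.0,-121.9
IND,Example Solar Park,648.0,Solar,14.2,77.4
USA,Example Gas Plant,500.0,Gas,30.0,-95.0
"""

rows = list(csv.DictReader(io.StringIO(data)))

# Total generation capacity per country.
totals = {}
for r in rows:
    totals[r["country"]] = totals.get(r["country"], 0.0) + float(r["capacity_mw"])
print(totals)  # {'USA': 620.5, 'IND': 648.0}
```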

Fuel Type Aggregation

We define the "Fuel Type" attribute of our database based on common fuel categories. In order to parse the different fuel types used in our various data sources, we map fuel name synonyms to our fuel categories here. We plan to expand the database in the future to report more disaggregated fuel types.
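A synonym map of this kind can be as simple as a dictionary from source-specific names to common categories. The entries below are invented examples, not the project's actual table:

```python
# Illustrative synonym-to-category map; the project's real mapping
# lives in its resources and covers many more source vocabularies.
FUEL_CATEGORIES = {
    "hard coal": "Coal",
    "lignite": "Coal",
    "anthracite": "Coal",
    "natural gas": "Gas",
    "ccgt": "Gas",
    "photovoltaic": "Solar",
    "solar pv": "Solar",
    "run-of-river": "Hydro",
}

def standardize_fuel(raw_name):
    """Map a source-specific fuel name to a common category, or None."""
    return FUEL_CATEGORIES.get(raw_name.strip().lower())

assert standardize_fuel("Lignite") == "Coal"
```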

Combining Multiple Data Sources

A major challenge for this project is that data come from a variety of sources, including government ministries, utility companies, equipment manufacturers, crowd-sourced databases, financial reports, and more. The reliability of the data varies, and in many cases there are conflicting values for the same attribute of the same power plant from different data sources. To handle this, we match and de-duplicate records and then develop rules for which data sources to report for each indicator. We provide a clear data lineage for each datum in the database. We plan to ultimately allow users to choose alternative rules for which data sources to draw on.
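One way to implement such a reporting rule is a fixed source-priority list per indicator: when sources conflict, report the value from the most trusted source and record its name for data lineage. The priorities below are invented for illustration:

```python
# Hypothetical precedence order for one indicator; the project's
# actual rules differ and vary by attribute.
SOURCE_PRIORITY = ["national ministry", "utility report", "crowd-sourced"]

def resolve(candidates):
    """candidates: list of (source, value) pairs for one attribute.
    Return (value, source) from the highest-priority source with data."""
    for source in SOURCE_PRIORITY:
        for s, v in candidates:
            if s == source and v is not None:
                return v, s
    return None, None

value, lineage = resolve([
    ("crowd-sourced", 118.0),
    ("national ministry", 120.5),
])
# value == 120.5, lineage == "national ministry"
```

Keeping the lineage alongside the value is what makes it possible to later let users swap in alternative precedence rules without rebuilding the matched records.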

To the maximum extent possible, we read data automatically from trusted sources, and integrate it into the database. Our current strategy involves these steps:

  • Automate data collection from machine-readable national data sources where possible.
  • For countries where machine-readable data are not available, gather and curate power plant data by hand, and then match these power plants to plants in other databases, including GEO and CARMA (see below) to determine their geolocation.
  • For a limited number of countries with small total power-generation capacity, use data directly from Global Energy Observatory (GEO).

A table describing the data source(s) for each country is listed below.

Finally, we are examining ways to automatically incorporate data from additional supra-national data sources.

ID numbers

We assign a unique ID to each line of data that we read from each source. In some cases, these represent plant-level data, while in other cases they represent unit-level data. In the case of unit-level data, we commonly perform an aggregation step and assign a new, unique plant-level ID to the result. For plants drawn from machine-readable national data sources, the reference ID is formed from a three-letter ISO 3166-1 alpha-3 country code and a seven-digit number. For plants drawn from other databases (including the manually maintained dataset by WRI), the reference ID is formed from a variable-size prefix code and a seven-digit number.
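The ID format described above can be sketched with a small helper (the helper itself is hypothetical; only the prefix-plus-seven-digit format comes from the text):

```python
def make_reference_id(prefix, serial):
    """Form a reference ID: a prefix (an ISO 3166-1 alpha-3 country code
    for national sources, a variable-size code otherwise) followed by a
    zero-padded seven-digit number. Illustrative helper, not project code."""
    if not 0 <= serial <= 9_999_999:
        raise ValueError("serial must fit in seven digits")
    return "{}{:07d}".format(prefix, serial)

assert make_reference_id("USA", 6) == "USA0000006"
assert make_reference_id("WRI", 1023456) == "WRI1023456"
```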

Power plant matching

In many cases our data sources do not include power plant geolocation information. To address this, we attempt to match these plants against the GEO and CARMA databases in order to use their geolocation data. We use an Elasticsearch-based matching technique developed by Enipedia, matching on plant name, country, capacity, and location; confirmed matches are stored in a concordance file. This matching procedure is complex, and the algorithm can sometimes wrongly match two distinct power plants or fail to match two entries for the same plant. We are investigating the Duke framework for matching, which would allow us to perform the matching offline.
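As a rough offline stand-in for that matching logic (thresholds and field names are invented for illustration; the real pipeline uses the Elasticsearch-based technique described above):

```python
from difflib import SequenceMatcher

def plausible_match(a, b, name_threshold=0.85, capacity_tolerance=0.2):
    """Crude candidate-match test between two plant records: require the
    same country, similar names, and capacities within a relative
    tolerance. Thresholds here are invented, not the project's values."""
    if a["country"] != b["country"]:
        return False
    name_sim = SequenceMatcher(None, a["name"].lower(),
                               b["name"].lower()).ratio()
    if name_sim < name_threshold:
        return False
    larger = max(a["capacity_mw"], b["capacity_mw"])
    return abs(a["capacity_mw"] - b["capacity_mw"]) <= capacity_tolerance * larger

rec = {"name": "Example Dam", "country": "USA", "capacity_mw": 120.5}
geo = {"name": "Example Dam 1", "country": "USA", "capacity_mw": 118.0}
```

Mismatches in either direction (false positives and missed matches) are exactly the failure modes noted above, which is why confirmed matches are frozen into the concordance file rather than recomputed on every build.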

Build Instructions

The build system works as follows:

  • Create a virtual environment with Python 2.7 and the third-party packages in requirements.txt.
  • cd into build_databases/.
  • When making a database update, run each build_database_*.py file whose data source or processing method changed.
  • Run build_global_power_plant_database.py, which reads from the pickled sub-databases.
  • cd into ../utils.
  • Run database_country_summary.py to produce the summary table.
  • cd into ../output_database.
  • Copy global_power_plant_database.csv to the gppd-ai4earth-api repository. Look at the Makefile in that repo to understand where it should be located.
  • Build new generation estimations as needed, based on plant changes and updates relative to the stored and calculated values. This is not automatic, but there are helper scripts for making the estimates.
  • Run the make_gppd.py script in gppd-ai4earth-api to construct a new version of the database with the full estimation data.
  • Copy the new merged dataset back to this repo, increment the DATABASE_VERSION file, commit, etc.

Related repos
