  • Stars: 183
  • Rank: 210,154 (Top 5%)
  • Language: Jupyter Notebook
  • License: Other
  • Created over 4 years ago
  • Updated about 1 year ago

Repository Details

Whole building non-residential hourly energy meter data from the Great Energy Predictor III competition

The Building Data Genome 2 (BDG2) Data-Set

Data-set description

BDG2 is an open data set made up of 3,053 energy meters from 1,636 buildings. The time-series data covers two full years (2016 and 2017) at an hourly frequency, with measurements from electricity, heating and cooling water, steam, and irrigation meters. A subset of the data was used in the Great Energy Predictor III (GEPIII) competition hosted by ASHRAE in late 2019. A full overview of the GEPIII competition can be found in an article in the journal Science and Technology for the Built Environment; a preprint is available on arXiv.
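The hourly resolution and the two-year span fix the number of readings per meter. As a quick sanity check (a sketch only, not part of the repository's notebooks; the variable names are illustrative), the count can be derived from the calendar, keeping in mind that 2016 is a leap year:

```python
from datetime import datetime

# Hours between the start of 2016 and the start of 2018 (2016 has 366 days).
start = datetime(2016, 1, 1)
end = datetime(2018, 1, 1)
hours_per_meter = int((end - start).total_seconds() // 3600)

n_meters = 3053  # meter count stated in the data-set description
total_measurements = n_meters * hours_per_meter

print(hours_per_meter, total_measurements)  # prints "17544 53561832"
```

This agrees with the 17,544 measurements per meter and the roughly 53.6 million total measurements quoted in the paper's abstract.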

The GEPIII sub-set includes hourly data from 2,380 meters from 1,449 buildings that were used in a machine learning competition for long-term prediction with an application to measurement and verification in the building energy analysis domain. This data set can be used to benchmark various statistical learning algorithms and other data science techniques. It can also be used simply as a teaching or learning tool to practice dealing with measured performance data from large numbers of non-residential buildings. The charts below illustrate the breakdown of the buildings according to primary use category and subcategory, industry and subindustry, timezone and meter type.

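[Figure: breakdown of buildings by primary use category and subcategory, industry and subindustry, timezone, and meter type]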

Getting Started

We recommend you download the Anaconda Python Distribution and use Jupyter to get an understanding of the data.

  • Time-series meter data is found in data/meters/
  • Metadata is found in data/metadata/
  • To join all raw meter data into one data set, follow this notebook
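The linked notebook is the authoritative way to join the raw meter files. As a rough illustration of the idea only, the sketch below reshapes one wide table into long rows. It assumes each raw file has a `timestamp` column followed by one column per building. That layout, the function name `wide_to_long`, the output field names, and the sample values are all assumptions made for illustration and are not confirmed by this README.

```python
import csv
import io


def wide_to_long(csv_text, meter):
    """Turn a wide meter table (one column per building) into long rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        timestamp = record.pop("timestamp")  # assumed name of the time column
        for building_id, value in record.items():
            rows.append({
                "timestamp": timestamp,
                "building_id": building_id,
                "meter": meter,
                # Empty cells are treated as missing readings.
                "meter_reading": float(value) if value else None,
            })
    return rows


# Made-up sample standing in for one raw meter file.
sample = (
    "timestamp,Building_A,Building_B\n"
    "2016-01-01 00:00:00,10.5,\n"
    "2016-01-01 01:00:00,11.0,3.2\n"
)

long_rows = wide_to_long(sample, meter="electricity")
```

Joining several meter types would then amount to calling the function once per file and concatenating the resulting row lists.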

Example notebooks are found in notebooks/. A few good overview examples:

Detailed Documentation

The detailed documentation of how this data set was created can be found in the repository's wiki and in the following publication:

Citation of BDG2 Data-Set

Miller, C., Kathirgamanathan, A., Picchetti, B. et al. The Building Data Genome Project 2, energy meter data from the ASHRAE Great Energy Predictor III competition. Sci Data 7, 368 (2020). https://doi.org/10.1038/s41597-020-00712-x



@ARTICLE{Miller2020-yc,
  title     = "The Building Data Genome Project 2, energy meter data from the
               {ASHRAE} Great Energy Predictor {III} competition",
  author    = "Miller, Clayton and Kathirgamanathan, Anjukan and Picchetti,
               Bianca and Arjunan, Pandarasamy and Park, June Young and Nagy,
               Zoltan and Raftery, Paul and Hobson, Brodie W and Shi, Zixiao
               and Meggers, Forrest",
  abstract  = "This paper describes an open data set of 3,053 energy meters
               from 1,636 non-residential buildings with a range of two full
               years (2016 and 2017) at an hourly frequency (17,544
               measurements per meter resulting in approximately 53.6 million
               measurements). These meters were collected from 19 sites across
               North America and Europe, with one or more meters per building
               measuring whole building electrical, heating and cooling water,
               steam, and solar energy as well as water and irrigation meters.
               Part of these data was used in the Great Energy Predictor III
               (GEPIII) competition hosted by the American Society of Heating,
               Refrigeration, and Air-Conditioning Engineers (ASHRAE) in
               October-December 2019. GEPIII was a machine learning competition
               for long-term prediction with an application to measurement and
               verification. This paper describes the process of data
               collection, cleaning, and convergence of time-series meter data,
               the meta-data about the buildings, and complementary weather
               data. This data set can be used for further prediction
               benchmarking and prototyping as well as anomaly detection,
               energy analysis, and building type classification.
               Machine-accessible metadata file describing the reported data:
               https://doi.org/10.6084/m9.figshare.13033847",
  journal   = "Scientific Data",
  publisher = "Nature Publishing Group",
  volume    =  7,
  pages     = "368",
  month     =  oct,
  year      =  2020,
  language  = "en"
}


Preprints

Publications or Projects that use BDG2 data-set

Please update this list if you add notebooks or R-Markdown files to the notebooks folder. The naming convention is a number (for ordering), the creator's initials, and a short hyphen-delimited description, e.g. 1.0-jqp-initial-data-exploration.

  • (publication here)
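The naming convention above can be checked mechanically. The following is a minimal sketch, assuming lowercase initials and a description made of lowercase letters and digits separated by hyphens; the pattern and the helper name `parse_notebook_name` are hypothetical and not part of the repository.

```python
import re

# Hypothetical pattern for "<number>-<initials>-<hyphen-delimited description>".
NOTEBOOK_NAME = re.compile(
    r"^(?P<order>\d+(?:\.\d+)?)"
    r"-(?P<initials>[a-z]+)"
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)


def parse_notebook_name(stem):
    """Return (order, initials, description), or None if the stem does not match."""
    match = NOTEBOOK_NAME.match(stem)
    if match is None:
        return None
    return match.group("order"), match.group("initials"), match.group("description")


parsed = parse_notebook_name("1.0-jqp-initial-data-exploration")
print(parsed)  # prints ('1.0', 'jqp', 'initial-data-exploration')
```

The function expects the file name without its extension; a name that lacks the leading number, such as `initial-data-exploration`, yields `None`.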

Repository structure

building-data-genome-project-2
├─ README.md              <- BDG2 README for developers using this data-set
├─ data
│   ├─ metadata           <- buildings metadata
│   ├─ weather            <- weather data
│   └─ meters
│       ├─ raw            <- all meter reading datasets
│       ├─ cleaned        <- cleaned meter data based on several filtering steps
│       └─ kaggle         <- the 2017 meter data that aligns with the Kaggle competition
├─ notebooks              <- Jupyter notebooks, named after the naming convention
└─ figures                <- figures created during exploration of BDG 2.0 Data-set
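After cloning, it can be useful to confirm that the local copy has the directories listed in the tree above. The sketch below is one possible check; the helper name `missing_dirs` and the idea of passing the clone location as `repo_root` are illustrative assumptions, while the directory names are taken from the tree.

```python
from pathlib import Path

# Directories taken from the repository structure shown above.
EXPECTED_DIRS = [
    "data/metadata",
    "data/weather",
    "data/meters/raw",
    "data/meters/cleaned",
    "data/meters/kaggle",
    "notebooks",
    "figures",
]


def missing_dirs(repo_root):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs(".")` from the root of a complete clone should return an empty list; any names it returns point to folders that were not downloaded.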
