  • Language
    Jupyter Notebook
  • License
    MIT License
  • Created over 8 years ago
  • Updated over 3 years ago

Repository Details

A collection of non-residential buildings for performance analysis and algorithm benchmarking

Check out the Building Data Genome 2, the latest version that supersedes this one: https://github.com/buds-lab/building-data-genome-project-2

  • Does your data science technique actually scale across hundreds of buildings?
  • Is it actually faster or more accurate?

These are questions that researchers should ask when developing data-driven methods. Building performance prediction, classification, and clustering algorithms are becoming an essential part of analysis for anomaly detection, control optimization, and demand response. But how do we actually compare each individual technique against previously created methods?

The time-series data mining community identified this problem as early as 2003: “Much of this work has very little utility because the contribution made”...“offer an amount of improvement that would have been completely dwarfed by the variance that would have been observed by testing on many real world datasets, or the variance that would have been observed by changing minor (unstated) implementation details.” (Keogh, E. and Kasetty, S.: On the need for time series data mining benchmarks: A survey and empirical demonstration. Data Mining and Knowledge Discovery, 7(4):349–371, Oct. 2003.)

They created the time-series data benchmarking set. This data set enables testing of new techniques on an assortment of real-world data sets. For commercial buildings data, we are doing the same!

The Need for a Benchmarking Data Set for Non-residential Building Data Analytics

Most of the existing building performance data science studies rely on each individual researcher creating their own methods, finding a case study data set, and determining efficacy on their own. Not surprisingly, most of those researchers find positive, yet questionably meaningful, results.

old way

Using a large, consistent benchmark data set from hundreds (or thousands) of buildings, a researcher can determine how well their methods actually perform across a heterogeneous data set. If multiple researchers use the same data set, then there can be meaningful comparisons of accuracy, speed, and ease of use.

new way
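
As a rough illustration of what benchmarking across many buildings can look like, the sketch below scores a naive persistence baseline (predict each hour with the value 24 hours earlier) on every building and summarizes the spread of errors. It is a minimal sketch and not the project's own evaluation code: the function names, the choice of MAPE as the metric, and the toy readings are all assumptions made for illustration.

```python
from statistics import mean, median


def mape(actual, predicted):
    """Mean absolute percentage error, skipping hours with zero actual load."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to score")
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in pairs)


def persistence_forecast(series, lag=24):
    """Naive baseline: the prediction for hour t is the reading at hour t - lag."""
    return series[:-lag]


def benchmark(buildings, lag=24):
    """Score the baseline on every building; returns {building_name: MAPE}."""
    scores = {}
    for name, series in buildings.items():
        predicted = persistence_forecast(series, lag)
        actual = series[lag:]
        scores[name] = mape(actual, predicted)
    return scores


# Hypothetical toy data: two buildings with 48 hourly readings each.
daily_shape = [10.0 + hour for hour in range(24)]
buildings = {
    "office_a": daily_shape + daily_shape,                    # day 2 repeats day 1
    "lab_b": daily_shape + [v + 5.0 for v in daily_shape],    # day 2 is 5 units higher
}

scores = benchmark(buildings)
summary = {"median": median(scores.values()), "worst": max(scores.values())}
```

Reporting the distribution of scores over all buildings (here just a median and a worst case) rather than a single case-study number is the point of the exercise: a method that looks good on one building may look quite different across a heterogeneous set.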

Introducing the Building Data Genome Project

It is an open data set from 507 non-residential buildings that includes hourly whole-building electrical meter data for one year. Each of the buildings has meta data such as area, weather, and primary use type. This data set can be used to benchmark various statistical learning algorithms and other data science techniques. It can also be used simply as a teaching or learning tool to practice dealing with measured performance data from large numbers of non-residential buildings. The charts below illustrate the breakdown of the buildings according to location, building industry, sub-industry, and primary use type.

meta data
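
A breakdown like the one charted above can be reproduced by counting buildings per category in the meta data. The following is a minimal sketch; the field names (`primary_use`, `timezone`) and the sample records are hypothetical and may not match the column names in the actual meta data file.

```python
from collections import Counter


def breakdown(records, field):
    """Count buildings per value of `field`, most common first.

    Missing or empty values are grouped under "unknown".
    """
    counts = Counter((record.get(field) or "unknown") for record in records)
    return counts.most_common()


# Hypothetical meta data rows; real field names may differ.
records = [
    {"building": "bldg_1", "primary_use": "Office", "timezone": "America/New_York"},
    {"building": "bldg_2", "primary_use": "Office", "timezone": "Europe/London"},
    {"building": "bldg_3", "primary_use": "Classroom", "timezone": "Europe/London"},
    {"building": "bldg_4", "primary_use": "", "timezone": "Europe/London"},
]

by_use = breakdown(records, "primary_use")
by_zone = breakdown(records, "timezone")
```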

Please contribute new data sets or provide analysis examples in Jupyter or R Markdown using the data.

Citation of Data-Set

Clayton Miller, Forrest Meggers, The Building Data Genome Project: An open, public data set from non-residential building electrical meters, Energy Procedia, Volume 122, September 2017, Pages 439-444, ISSN 1876-6102, https://doi.org/10.1016/j.egypro.2017.07.400.

ResearchGate

BibTex:
@article{Miller2017439,
title = "The Building Data Genome Project: An open, public data set from non-residential building electrical meters",
journal = "Energy Procedia",
volume = "122",
pages = "439--444",
year = "2017",
note = "{CISBAT} 2017 International Conference Future Buildings \& Districts -- Energy Efficiency from Nano to Urban Scale",
issn = "1876-6102",
doi = "10.1016/j.egypro.2017.07.400",
url = "http://www.sciencedirect.com/science/article/pii/S1876610217330047",
author = "Clayton Miller and Forrest Meggers",
keywords = "Open Data; Non-Residential Building Meter Data; Benchmark Data Set; Big Data; Machine Learning",
abstract = "As of 2015, there are over 60 million smart meters installed in the United States; these meters are at the forefront of big data analytics in the building industry. However, only a few public data sources of hourly non-residential meter data exist for the purpose of testing algorithms. This paper describes the collection, cleaning, and compilation of several such data sets found publicly on-line, in addition to several collected by the authors. There are 507 whole building electrical meters in this collection, and a majority are from buildings on university campuses. This group serves as a primary repository of open, non-residential data sources that can be built upon by other researchers. An overview of the data sources, subset selection criteria, and details of access to the repository are included. Future uses include the application of new, proposed prediction and classification models to compare performance to previously generated techniques."
}

Getting Started

We recommend you download the Anaconda Python Distribution and use Jupyter to get an understanding of the data.

  • Raw temporal and meta data are found in /data/raw/
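
As a starting point for exploring the raw files, the sketch below reads a wide CSV (one timestamp column plus one column per building) into plain Python lists. This is a minimal sketch under stated assumptions: the file layout, the `timestamp` column name, and the sample file name are hypothetical, so check the actual files in /data/raw/ before relying on it. It uses only the standard library so it runs anywhere; in a Jupyter notebook a data-frame library would likely be more convenient.

```python
import csv
import tempfile
from pathlib import Path


def load_wide_csv(path, index_column="timestamp"):
    """Read a wide CSV into (index, data).

    index is the list of values from `index_column`; data maps every other
    column name to a list of floats, with None for blank (missing) cells.
    """
    with Path(path).open(newline="") as handle:
        reader = csv.DictReader(handle)
        columns = [c for c in reader.fieldnames if c != index_column]
        index = []
        data = {c: [] for c in columns}
        for row in reader:
            index.append(row[index_column])
            for c in columns:
                cell = (row[c] or "").strip()
                data[c].append(float(cell) if cell else None)
    return index, data


# Demonstration on a tiny stand-in file (hypothetical name and layout).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "meters_sample.csv"
    sample.write_text(
        "timestamp,bldg_1,bldg_2\n"
        "2015-01-01 00:00,12.5,\n"
        "2015-01-01 01:00,13.0,7.25\n"
    )
    index, data = load_wide_csv(sample)
```

Keeping missing readings as `None` instead of dropping them preserves the hourly alignment between buildings, which matters when comparing methods on the same time span.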

Example notebooks are found in /notebooks/ -- a few good overview examples:

Publications or Projects that use this data-set:

Please update this list if you add notebooks or R-Markdown files to the notebook folder.

Contact -- (Add yours if you contribute to the data set)

Dr. Clayton Miller, Building and Urban Data Science (BUDS) Group, National University of Singapore, http://budslab.org/

Dr. Forrest Meggers, Cooling and Heating for Architecturally Optimized System (CHAOS) Lab, Princeton University, http://chaos.princeton.edu/

Anjukan Kathirgamanathan, PhD Student, Energy Institute, University College Dublin, https://energyinstitute.ucd.ie/

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
                          generated with `pip freeze > requirements.txt`

License

The MIT License (MIT) Copyright (c) 2016, Clayton Miller

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
