
HeavyDB (formerly OmniSciDB)

HeavyDB is an open source SQL-based, relational, columnar database engine that leverages the full performance and parallelism of modern hardware (both CPUs and GPUs) to enable querying of multi-billion row datasets in milliseconds, without the need for indexing, pre-aggregation, or downsampling. HeavyDB can be run on hybrid CPU/GPU systems (Nvidia GPUs are currently supported), as well as on CPU-only systems featuring X86, Power, and ARM (experimental support) architectures. To achieve maximum performance, HeavyDB features multi-tiered caching of data between storage, CPU memory, and GPU memory, and an innovative Just-In-Time (JIT) query compilation framework.

For usage info, see the product documentation, and for more details about the system's internal architecture, check out the developer documentation. Further technical discussion can be found on the HEAVY.AI Community Forum.

The repository includes a number of third-party packages provided under separate licenses. Details about these packages and their respective licenses are available at ThirdParty/licenses/index.md.

Downloads and Installation Instructions

HEAVY.AI provides pre-built binaries for Linux for stable releases of the project:

  • CentOS, RPM, CPU: repository https://releases.heavy.ai/os/yum/stable/cpu (docs: https://docs.heavy.ai/installation-and-configuration/installation/installing-on-centos/centos-yum-gpu-ee)
  • CentOS, RPM, GPU: repository https://releases.heavy.ai/os/yum/stable/cuda (docs: https://docs.heavy.ai/installation-and-configuration/installation/installing-on-centos/centos-yum-gpu-ee)
  • Ubuntu, DEB, CPU: repository https://releases.heavy.ai/os/apt/dists/stable/cpu (docs: https://docs.heavy.ai/installation-and-configuration/installation/installing-on-ubuntu/centos-yum-gpu-ee)
  • Ubuntu, DEB, GPU: repository https://releases.heavy.ai/os/apt/dists/stable/cuda (docs: https://docs.heavy.ai/installation-and-configuration/installation/installing-on-ubuntu/centos-yum-gpu-ee)
  • Any distro, tarball, CPU: https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64-cpu.tar.gz
  • Any distro, tarball, GPU: https://releases.heavy.ai/os/tar/heavyai-os-latest-Linux-x86_64.tar.gz

Developing HeavyDB

License

This project is licensed under the Apache License, Version 2.0.

The repository includes a number of third-party packages provided under separate licenses. Details about these packages and their respective licenses are available at ThirdParty/licenses/index.md.

Contributing

In order to clarify the intellectual property license granted with Contributions from any person or entity, HEAVY.AI must have a Contributor License Agreement ("CLA") on file that has been signed by each Contributor, indicating agreement to the Contributor License Agreement. After making a pull request, a bot will notify you if a signed CLA is required and provide instructions for how to sign it. Please read the agreement carefully before signing and keep a copy for your records.

Building

If this is your first time building HeavyDB, install the dependencies mentioned in the Dependencies section below.

HeavyDB uses CMake for its build system.

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=debug ..
make -j 4

The following cmake/ccmake options can enable/disable different features:

  • -DCMAKE_BUILD_TYPE=release - Build type and compiler options to use. Options are Debug, Release, RelWithDebInfo, MinSizeRel, and unset.
  • -DENABLE_ASAN=off - Enable address sanitizer. Default is off.
  • -DENABLE_AWS_S3=on - Enable AWS S3 support, if available. Default is on.
  • -DENABLE_CUDA=off - Disable CUDA. Default is on.
  • -DENABLE_CUDA_KERNEL_DEBUG=off - Enable debugging symbols for CUDA kernels. Will dramatically reduce kernel performance. Default is off.
  • -DENABLE_DECODERS_BOUNDS_CHECKING=off - Enable bounds checking for column decoding. Default is off.
  • -DENABLE_FOLLY=on - Use Folly. Default is on.
  • -DENABLE_IWYU=off - Enable include-what-you-use. Default is off.
  • -DENABLE_JIT_DEBUG=off - Enable debugging symbols for the JIT. Default is off.
  • -DENABLE_ONLY_ONE_ARCH=off - Compile GPU code only for the host machine's architecture, speeding up compilation. Default is off.
  • -DENABLE_PROFILER=off - Enable google perftools. Default is off.
  • -DENABLE_STANDALONE_CALCITE=off - Require standalone Calcite server. Default is off.
  • -DENABLE_TESTS=on - Build unit tests. Default is on.
  • -DENABLE_TSAN=off - Enable thread sanitizer. Default is off.
  • -DENABLE_CODE_COVERAGE=off - Enable code coverage symbols (clang only). Default is off.
  • -DPREFER_STATIC_LIBS=off - Static link dependencies, if available. Default is off. Only works on CentOS.
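
The options above are combined on a single cmake command line. As a sketch, a CPU-only release build with tests enabled might be configured like this (directory name and job count are illustrative):

```shell
# Sketch: configure and build a CPU-only release binary with unit tests
# (run from the top level of the heavydb repository; names are illustrative)
mkdir -p build-cpu && cd build-cpu
cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_CUDA=off -DENABLE_TESTS=on ..
make -j "$(nproc)"
```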

Testing

HeavyDB uses Google Test as its main testing framework. Tests reside under the Tests directory.

The sanity_tests target runs the most common tests. If using Makefiles to build, the tests may be run using:

make sanity_tests

AddressSanitizer

AddressSanitizer can be activated by setting the ENABLE_ASAN CMake flag in a fresh build directory. At this time CUDA must also be disabled. In an empty build directory run CMake and compile:

mkdir build && cd build
cmake -DENABLE_ASAN=on -DENABLE_CUDA=off ..
make -j 4

Finally run the tests:

export ASAN_OPTIONS=alloc_dealloc_mismatch=0:handle_segv=0
make sanity_tests

ThreadSanitizer

ThreadSanitizer can be activated by setting the ENABLE_TSAN CMake flag in a fresh build directory. At this time CUDA must also be disabled. In an empty build directory run CMake and compile:

mkdir build && cd build
cmake -DENABLE_TSAN=on -DENABLE_CUDA=off ..
make -j 4

We use a TSAN suppressions file to ignore warnings in third party libraries. Source the suppressions file by adding it to your TSAN_OPTIONS env:

export TSAN_OPTIONS="suppressions=/path/to/heavydb/config/tsan.suppressions"

Finally run the tests:

make sanity_tests

Generating Packages

HeavyDB uses CPack to generate packages for distribution. Packages generated on CentOS with static linking enabled can be used on most other recent Linux distributions.

To generate packages on CentOS (assuming starting from top level of the heavydb repository):

mkdir build-package && cd build-package
cmake -DPREFER_STATIC_LIBS=on -DCMAKE_BUILD_TYPE=release ..
make -j 4
cpack -G TGZ

The first command creates a fresh build directory, to ensure there is nothing left over from a previous build.

The second command configures the build to prefer linking to the dependencies' static libraries instead of the (default) shared libraries, and to build using CMake's release configuration (which enables compiler optimizations). Linking to the static versions of the libraries reduces the number of dependencies that must be installed on target systems.

The last command generates a .tar.gz package. The TGZ can be replaced with, for example, RPM or DEB to generate a .rpm or .deb, respectively.

Using

The startheavy wrapper script may be used to start HeavyDB in a testing environment. This script performs the following tasks:

  • initializes the data storage directory via initdb, if required
  • starts the main HeavyDB server, heavydb
  • offers to download and import a sample dataset, using the insert_sample_data script

Assuming you are in the build directory, and it is a subdirectory of the heavydb repository, startheavy may be run by:

../startheavy

Starting Manually

It is assumed that the following commands are run from inside the build directory.

Initialize the data storage directory. This command only needs to be run once.

mkdir data && ./bin/initdb data

Start the HeavyDB server:

./bin/heavydb

If desired, insert a sample dataset by running the insert_sample_data script in a new terminal:

../insert_sample_data

You can now start using the database. The heavysql utility may be used to interact with the database from the command line:

./bin/heavysql -p HyperInteractive

where HyperInteractive is the default password. If no user is specified, the default user, admin, is assumed.
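
As a quick smoke test, a query can be piped into heavysql. The table name below assumes the flights sample dataset offered by insert_sample_data was imported; adjust it if you loaded different data:

```shell
# Run a simple aggregate against the server started above
# (flights_2008_10k assumes the sample dataset was imported)
echo "SELECT COUNT(*) FROM flights_2008_10k;" | ./bin/heavysql -u admin -p HyperInteractive
```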

Code Style

Contributed code should compile without generating warnings by recent compilers on most Linux distributions. Changes to the code should follow the C++ Core Guidelines.

clang-format

A .clang-format style configuration, based on the Chromium style guide, is provided at the top level of the repository. Please format your code using a recent version (8.0+ preferred) of ClangFormat before submitting.

To use:

clang-format -i File.cpp

clang-tidy

A .clang-tidy configuration is provided at the top level of the repository. Please lint your code using a recent version (6.0+ preferred) of clang-tidy before submitting.

clang-tidy requires all generated files to exist before running. The easiest way to accomplish this is to simply run a full build before running clang-tidy. A build target which runs clang-tidy is provided. To use:

make run-clang-tidy

Note: clang-tidy may make invalid or overly verbose changes to the source code. It is recommended to first commit your changes, then run clang-tidy and review its recommended changes before amending them to your commit.

Note: the clang-tidy target uses the run-clang-tidy.py script provided with LLVM, which may depend on PyYAML. The target also depends on jq, which is used to filter portions of the compile_commands.json file.
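
The jq filtering mentioned above can be illustrated on a toy compile_commands.json. The "ThirdParty" filter below is an illustrative guess at the kind of entry such a step excludes, not the repository's actual expression:

```shell
# Build a tiny compile_commands.json (sample data, not from the repo)
cat > compile_commands.json <<'EOF'
[
  {"directory": "/src", "command": "c++ -c Tests/Foo.cpp", "file": "/src/Tests/Foo.cpp"},
  {"directory": "/src", "command": "c++ -c ThirdParty/bar.cpp", "file": "/src/ThirdParty/bar.cpp"}
]
EOF
# Keep only first-party entries, dropping anything under ThirdParty
jq '[ .[] | select(.file | contains("ThirdParty") | not) ]' compile_commands.json
```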

Dependencies

HeavyDB has the following dependencies:

Package      Min Version   Required
CMake        3.16          yes
LLVM         9.0           yes
GCC          8.4.0         no, if building with clang
Go           1.12          yes
Boost        1.72.0        yes
OpenJDK      1.7           yes
CUDA         11.0          yes, if compiling with GPU support
gperftools                 yes
gdal         2.4.2         yes
Arrow        3.0.0         yes

CentOS 7

HeavyDB requires a number of dependencies which are not provided in the common CentOS/RHEL package repositories. A prebuilt package containing all these dependencies is provided for CentOS 7 (x86_64).

Use the scripts/mapd-deps-prebuilt.sh build script to install prebuilt dependencies.

These dependencies will be installed to a directory under /usr/local/mapd-deps. The mapd-deps-prebuilt.sh script also installs Environment Modules in order to simplify managing the required environment variables. Log out and log back in after running the mapd-deps-prebuilt.sh script in order to activate the Environment Modules command, module.

The mapd-deps environment module is disabled by default. To activate for your current session, run:

module load mapd-deps

To disable the mapd-deps module:

module unload mapd-deps

WARNING: The mapd-deps package contains newer versions of packages such as GCC and ncurses which might not be compatible with the rest of your environment. Make sure to disable the mapd-deps module before compiling other packages.

Instructions for installing CUDA are below.

CUDA

It is preferred, but not necessary, to install CUDA and the NVIDIA drivers using the .rpm packages, following the instructions provided by NVIDIA. The rpm (network) method (preferred) ensures you always have the latest stable drivers, while the rpm (local) method does not require Internet access during installation.

The .rpm method requires DKMS to be installed, which is available from the Extra Packages for Enterprise Linux repository:

sudo yum install epel-release

Be sure to reboot after installing in order to activate the NVIDIA drivers.

Environment Variables

The mapd-deps-prebuilt.sh script includes two files with the appropriate environment variables: mapd-deps-<date>.sh (for sourcing from your shell config) and mapd-deps-<date>.modulefile (for use with Environment Modules, yum package environment-modules). These files are placed in the mapd-deps install directory, usually /usr/local/mapd-deps/<date>. Either may be used to configure your environment: the .sh file may be sourced in your shell config, while the .modulefile needs to be moved to a directory on your modules path.
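
A minimal sketch of what such an environment file sets (the dated prefix is hypothetical; the real file is generated by the script):

```shell
# Sketch of a mapd-deps environment file (hypothetical install prefix)
PREFIX=/usr/local/mapd-deps/2021-01-01
export PATH="$PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$PREFIX/lib64:$PREFIX/lib:$LD_LIBRARY_PATH"
```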

Building Dependencies

The scripts/mapd-deps-centos.sh script is used to build the dependencies. Modify and run this script if you would like to change dependency versions or build on alternative CPU architectures.

cd scripts
module unload mapd-deps
./mapd-deps-centos.sh --compress

macOS

The scripts/mapd-deps-osx.sh script will automatically install and/or update Homebrew and then use it to install all dependencies. Please make sure macOS is completely up to date and Xcode is installed before running; Xcode can be installed from the App Store.

CUDA

mapd-deps-osx.sh will automatically install CUDA via Homebrew and add the correct environment variables to ~/.bash_profile.

Java

mapd-deps-osx.sh will automatically install Java and Maven via Homebrew and add the correct environment variables to ~/.bash_profile.

Ubuntu

Most build dependencies required by HeavyDB are available via APT. Certain dependencies such as Thrift, Blosc, and Folly must be built as they either do not exist in the default repositories or have outdated versions. A prebuilt package containing all these dependencies is provided for Ubuntu 18.04 (x86_64). The dependencies will be installed to /usr/local/mapd-deps/ by default; see the Environment Variables section below for how to add these dependencies to your environment.

Ubuntu 16.04

HeavyDB requires a newer version of Boost than the version which is provided by Ubuntu 16.04. The scripts/mapd-deps-ubuntu1604.sh build script will compile and install a newer version of Boost into the /usr/local/mapd-deps/ directory.

Ubuntu 18.04

Use the scripts/mapd-deps-prebuilt.sh build script to install prebuilt dependencies.

These dependencies will be installed to a directory under /usr/local/mapd-deps. The mapd-deps-prebuilt.sh script above will generate a script named mapd-deps.sh containing the environment variables which need to be set. Simply source this file in your current session (or symlink it to /etc/profile.d/mapd-deps.sh) in order to activate it:

source /usr/local/mapd-deps/mapd-deps.sh

Environment Variables

The CUDA and mapd-deps lib directories need to be added to LD_LIBRARY_PATH; the CUDA and mapd-deps bin directories need to be added to PATH. The mapd-deps-ubuntu.sh and mapd-deps-prebuilt.sh scripts will generate a script named mapd-deps.sh containing the environment variables which need to be set. Simply source this file in your current session (or symlink it to /etc/profile.d/mapd-deps.sh) in order to activate it:

source /usr/local/mapd-deps/mapd-deps.sh

CUDA

Recent versions of Ubuntu provide the NVIDIA CUDA Toolkit and drivers in the standard repositories. To install:

sudo apt install -y \
    nvidia-cuda-toolkit

Be sure to reboot after installing in order to activate the NVIDIA drivers.

Building Dependencies

The scripts/mapd-deps-ubuntu.sh and scripts/mapd-deps-ubuntu1604.sh scripts build the dependencies for Ubuntu 18.04 and 16.04, respectively. They install all required packages (except CUDA) and build from source those dependencies that require it. Modify and run the appropriate script if you would like to change dependency versions or build on alternative CPU architectures.

cd scripts
./mapd-deps-ubuntu.sh --compress

Arch

The scripts/mapd-deps-arch.sh script uses yay to install packages from the Arch User Repository, along with a custom PKGBUILD script for Apache Arrow. If you don't have yay yet, install it first: https://github.com/Jguer/yay#installation

Note: Apache Arrow, while available in the AUR, requires a few custom build flags in order to be used with Core. A custom PKGBUILD for it is included.

CUDA

CUDA and the NVIDIA drivers may be installed using the following.

yay -S \
    linux-headers \
    cuda \
    nvidia

Be sure to reboot after installing in order to activate the NVIDIA drivers.

Environment Variables

The cuda package should set up the environment variables required to use CUDA. If you receive errors saying nvcc is not found, then CUDA bin directories need to be added to PATH: the easiest way to do so is by creating a new file named /etc/profile.d/mapd-deps.sh containing the following:

PATH=/opt/cuda/bin:$PATH
export PATH
