  • Language: C++
  • License: Apache License 2.0
  • Created about 7 years ago
  • Updated 2 months ago

H2Oai GPU Edition

H2O4GPU

Join the chat at https://gitter.im/h2oai/h2o4gpu

H2O4GPU is a collection of GPU solvers by H2Oai with APIs in Python and R. The Python API builds upon the easy-to-use scikit-learn API and its well-tested CPU-based algorithms. It can be used as a drop-in replacement for scikit-learn (i.e. import h2o4gpu as sklearn) with support for GPUs on selected (and ever-growing) algorithms. H2O4GPU inherits all the existing scikit-learn algorithms and falls back to CPU algorithms when the GPU algorithm does not support an important existing scikit-learn class option. The R package is a wrapper around the H2O4GPU Python package, and the interface follows standard R conventions for modeling.
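The drop-in usage described above can be sketched as a guarded import. This is a minimal sketch rather than part of the H2O4GPU API: the helper name `pick_backend` is hypothetical, and treating `OSError` as a possible import failure (for example, a missing CUDA shared library) is an assumption.

```python
import importlib


def pick_backend(candidates=("h2o4gpu", "sklearn")):
    """Return (name, module) for the first importable candidate.

    pick_backend is a hypothetical helper used for illustration only.
    Returns (None, None) when no candidate can be imported.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except (ImportError, OSError):
            # ImportError: the package is not installed.
            # OSError: assumed failure mode when a shared library is missing.
            continue
    return None, None


if __name__ == "__main__":
    name, backend = pick_backend()
    print("backend in use:", name)
```

Code written against the returned module can then stay the same whether the GPU package or plain scikit-learn was found.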

The DAAL library has been added for CPU; currently only the x86_64 architecture is supported.

Requirements

  • PC running Linux with glibc 2.17+

  • Install CUDA with bundled display drivers (CUDA 8, CUDA 9, CUDA 9.2, or CUDA 10)

  • Python shared libraries (e.g. on Ubuntu: sudo apt-get install libpython3.6-dev)

When installing, choose to link the CUDA install to /usr/local/cuda. Be sure to reboot after installing the new NVIDIA drivers.

  • Nvidia GPU with Compute Capability >= 3.5 (Capability Lookup).

  • For advanced features, like handling rows/32 > 2^16 (i.e., rows > 2,097,152) in K-means, Compute Capability >= 5.2 is needed.

  • For building the R package, libcurl4-openssl-dev, libssl-dev, and libxml2-dev are needed.
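The capability thresholds in the list above reduce to a small piece of arithmetic: rows/32 > 2^16 is the same as rows > 32 * 65,536 = 2,097,152. The sketch below encodes only what the requirements state; the function names are hypothetical and nothing here queries a real GPU.

```python
MIN_CAPABILITY = (3, 5)           # baseline requirement from the list above
LARGE_KMEANS_CAPABILITY = (5, 2)  # needed when rows/32 > 2^16 in K-means
ROW_THRESHOLD = 32 * 2**16        # 2,097,152 rows


def required_capability(kmeans_rows=0):
    """Return the minimum (major, minor) compute capability for a K-means run."""
    if kmeans_rows > ROW_THRESHOLD:
        return LARGE_KMEANS_CAPABILITY
    return MIN_CAPABILITY


def is_supported(capability, kmeans_rows=0):
    """Compare a (major, minor) capability tuple against the requirement."""
    return tuple(capability) >= required_capability(kmeans_rows)


if __name__ == "__main__":
    print(ROW_THRESHOLD)
    print(is_supported((3, 7), kmeans_rows=3_000_000))
```

Tuples compare element by element, so (5, 2) correctly ranks above (3, 7).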

User Installation

Note: Installation steps mentioned below are for users planning to use H2O4GPU. See DEVEL.md for developer installation.

H2O4GPU can be installed using either pip or Conda.

Prerequisites

Add to ~/.bashrc or environment (set appropriate paths for your OS):

export CUDA_HOME=/usr/local/cuda # or choose /usr/local/cuda9 for cuda9 and /usr/local/cuda8 for cuda8
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_HOME/lib64/:$CUDA_HOME/lib/:$CUDA_HOME/extras/CUPTI/lib64

  • Install OpenBLAS dev environment:

sudo apt-get install libopenblas-dev pbzip2

If you are building the h2o4gpu R package, it is necessary to install the following dependencies:

sudo apt-get -y install libcurl4-openssl-dev libssl-dev libxml2-dev
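Before installing, it can help to confirm that the environment variables from the Prerequisites are actually visible to the process. The following is a minimal sketch, assuming the same three library directories as the export lines above; `check_cuda_env` is a hypothetical helper, and it only inspects strings without touching the filesystem.

```python
import os


def check_cuda_env(env=None):
    """Return a list of human-readable problems with the CUDA variables."""
    env = os.environ if env is None else env
    problems = []
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
        return problems
    # Normalise trailing slashes so ".../lib64/" and ".../lib64" compare equal.
    entries = [p.rstrip("/") for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    for sub in ("lib64", "lib", "extras/CUPTI/lib64"):
        expected = cuda_home.rstrip("/") + "/" + sub
        if expected not in entries:
            problems.append("LD_LIBRARY_PATH is missing " + expected)
    return problems


if __name__ == "__main__":
    for problem in check_cuda_env():
        print(problem)
```

An empty result means the variables match the layout shown above; it does not prove that CUDA itself is installed correctly.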

PIP install

Download the Python wheel file (For Python 3.6):

Start a fresh pyenv or virtualenv session.

Install the Python wheel file. NOTE: If you don't use a fresh environment, this will overwrite your py3nvml and xgboost installations to use our validated versions.

pip install h2o4gpu-0.3.0-cp36-cp36m-linux_x86_64.whl
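The wheel file name above encodes which interpreter it targets: `cp36` is the Python tag, `cp36m` the ABI tag, and `linux_x86_64` the platform tag, following the standard wheel naming convention. The sketch below splits those fields and compares only the Python tag with the running interpreter. That is a simplification of what pip actually checks, and both function names are hypothetical.

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel file name into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # Five fields, or six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel name: " + filename)
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def matches_interpreter(filename, version_info=None):
    """Check only the CPython tag, e.g. cp36 for Python 3.6 (simplified)."""
    version_info = sys.version_info if version_info is None else version_info
    tag = "cp{}{}".format(version_info[0], version_info[1])
    return parse_wheel_name(filename)["python"] == tag


if __name__ == "__main__":
    name = "h2o4gpu-0.3.0-cp36-cp36m-linux_x86_64.whl"
    print(parse_wheel_name(name))
    print(matches_interpreter(name))
```

Running this under an interpreter other than Python 3.6 reports a mismatch, which is the usual reason pip refuses a wheel as unsupported on the current platform.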

Conda installation

Ensure you meet the Requirements and have installed the Prerequisites.

If you have not already done so, install the conda package manager and test your conda installation.

H2O4GPU packages for CUDA 8, CUDA 9 and CUDA 9.2 are available from the h2oai channel in Anaconda Cloud.

Create a new conda environment with H2O4GPU and all its dependencies using the following command, which installs the CUDA 10 package (h2o4gpu-cuda10). For other CUDA versions, substitute the package name as needed. Note that the command requires the h2oai, conda-forge, and rapidsai channels.

conda create -n h2o4gpuenv -c h2oai -c conda-forge -c rapidsai h2o4gpu-cuda10

Once the environment is created, activate it with source activate h2o4gpuenv.

To test, start an interactive python session in the environment and follow the steps in the Test Installation section below.

h2o4gpu R package

At this point, you should have installed the H2O4GPU Python package successfully. You can then go ahead and install the h2o4gpu R package via the following:

if (!require(devtools)) install.packages("devtools")
devtools::install_github("h2oai/h2o4gpu", subdir = "src/interface_r")

Detailed instructions can be found here.

Test Installation

To test your installation of the Python package, the following code:

import h2o4gpu
import numpy as np

X = np.array([[1.,1.], [1.,4.], [1.,0.]])
model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)
model.cluster_centers_

should give input/output of:

>>> import h2o4gpu
>>> import numpy as np
>>>
>>> X = np.array([[1.,1.], [1.,4.], [1.,0.]])
>>> model = h2o4gpu.KMeans(n_clusters=2,random_state=1234).fit(X)
>>> model.cluster_centers_
array([[ 1.,  1.  ],
       [ 1.,  4.  ]])
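When scripting this check, comparing the centers as an unordered set with a tolerance is more robust than comparing printed output. The helper below is a minimal sketch in plain Python. `same_centers` is a hypothetical name, and the idea that row order could differ between runs or versions is an assumption, not something stated by the project.

```python
def same_centers(found, expected, tol=1e-6):
    """Compare two collections of cluster centers, ignoring row order."""
    found = sorted(list(map(float, row)) for row in found)
    expected = sorted(list(map(float, row)) for row in expected)
    if len(found) != len(expected):
        return False
    return all(
        len(f) == len(e) and all(abs(a - b) <= tol for a, b in zip(f, e))
        for f, e in zip(found, expected)
    )


if __name__ == "__main__":
    # Values copied from the expected output shown above.
    expected = [[1.0, 1.0], [1.0, 4.0]]
    print(same_centers([[1.0, 4.0], [1.0, 1.0]], expected))
```

With a fitted model, the first argument would be `model.cluster_centers_`; a NumPy array works because the helper only iterates over rows.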

To test your installation of the R package, try the following example that builds a simple XGBoost random forest classifier:

library(h2o4gpu)

# Setup dataset
x <- iris[1:4]
y <- as.integer(iris$Species) - 1

# Initialize and train the classifier
model <- h2o4gpu.random_forest_classifier() %>% fit(x, y)

# Make predictions
predictions <- model %>% predict(x)

Next Steps

For more examples using the Python API, please check out our Jupyter notebook demos. To run the demos with a locally installed wheel, download at least src/interface_py/requirements_runtime_demos.txt from the GitHub repo and run:

pip install -r src/interface_py/requirements_runtime_demos.txt

and then run the jupyter notebook demos.

For more examples using R API, please visit the vignettes.

Running Jupyter Notebooks

You can run Jupyter Notebooks with H2O4GPU in either of the following two ways.

Creating a Conda Environment

Ensure you have a machine that meets the Requirements and Prerequisites mentioned above.

Next, follow the Conda installation instructions above. Once you have activated the environment, you will need to downgrade tornado to version 4.5.3 (see issue #680). Then start Jupyter Notebook and, in your browser, navigate to the URL shown in the log output.

source activate h2o4gpuenv
conda install tornado==4.5.3
jupyter notebook --ip='*' --no-browser

Start a Python 3 kernel and try the code in the example notebooks.

Using precompiled docker image

Requirements:

Download the Docker file (for linux_x86_64):

  • Bleeding edge (changes with every successful master branch build):

Load and run docker file (e.g. for bleeding-edge of cuda92):

jupyter notebook --generate-config
echo "c.NotebookApp.allow_remote_access = False" >> ~/.jupyter/jupyter_notebook_config.py # Choose True if you want to allow remote access
pbzip2 -dc h2o4gpu-0.3.0.10000-cuda92-runtime.tar.bz2 | nvidia-docker load
mkdir -p log ; nvidia-docker run --name localhost --rm -p 8888:8888 -u `id -u`:`id -g` -v `pwd`/log:/log -v /home/$USER/.jupyter:/jupyter --entrypoint=./run.sh opsh2oai/h2o4gpu-0.3.0.10000-cuda92-runtime &
find log -name 'jupyter*' -type f -printf '%T@ %p\n' | sort -k1 -n | awk '{print $2}' | tail -1 | xargs cat | grep token | grep http | grep -v NotebookApp

Copy/paste the http link shown into your browser. If the "find" command doesn't work, look for the latest jupyter.log file and look at contents for the http link and token.

If the link shows no token or shows ... for token, try a token of "h2o" (without quotes). If running on your own host, the weblink will look like http://localhost:8888:token with token replaced by the actual token.
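If the `find` pipeline is not available (for example, when `-printf` is unsupported), the same lookup can be sketched in Python: pick the most recently modified file whose name starts with `jupyter` under the log directory, then keep lines that mention both `token` and `http` but not `NotebookApp`. The function names are hypothetical, and the log directory defaults to the `log` folder created by the commands above.

```python
import glob
import os


def latest_jupyter_log(log_dir="log"):
    """Return the most recently modified jupyter* file under log_dir, or None."""
    pattern = os.path.join(log_dir, "**", "jupyter*")
    paths = [p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p)]
    return max(paths, key=os.path.getmtime) if paths else None


def token_lines(path):
    """Return lines mentioning 'token' and 'http' but not 'NotebookApp'."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [
            line.rstrip("\n")
            for line in handle
            if "token" in line and "http" in line and "NotebookApp" not in line
        ]


if __name__ == "__main__":
    newest = latest_jupyter_log()
    if newest is None:
        print("no jupyter log found under ./log")
    else:
        print("\n".join(token_lines(newest)))
```

This mirrors the filters of the shell pipeline; it does not validate the token itself.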

This container has a /demos directory which contains Jupyter notebooks and some data.

Plans

The vision is to develop fast GPU algorithms to complement the CPU algorithms in scikit-learn while keeping full scikit-learn API compatibility and scikit-learn CPU algorithm capability. The h2o4gpu Python module is to be used as a drop-in replacement for scikit-learn that has the full functionality of scikit-learn's CPU algorithms.

Functions and classes will be gradually overridden by GPU-enabled algorithms (unless n_gpu=0 is set and we have no CPU algorithm except scikit-learn's). The CPU algorithms and code initially will be sklearn, but gradually those may be replaced by faster open-source codes like those in Intel DAAL.

This vision is currently accomplished by using the open-source scikit-learn and xgboost and overriding scikit-learn calls with our own GPU versions. In cases when our GPU class is currently incapable of an important scikit-learn feature, we revert to the scikit-learn class.
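The fallback behaviour described above can be illustrated with a small dispatch pattern: use the GPU class when every requested option is supported, otherwise hand the request to the CPU class. This is only a sketch of the idea. `FallbackFactory`, `GpuKMeans`, and `CpuKMeans` are hypothetical stand-ins, and the actual dispatch logic inside H2O4GPU may differ.

```python
class FallbackFactory:
    """Choose between a GPU class and a CPU class based on requested options.

    All names here are hypothetical; this is not H2O4GPU's internal code.
    """

    def __init__(self, gpu_cls, cpu_cls, gpu_supported_options):
        self.gpu_cls = gpu_cls
        self.cpu_cls = cpu_cls
        self.gpu_supported_options = frozenset(gpu_supported_options)

    def create(self, **options):
        # Any option the GPU class does not understand triggers the fallback.
        unsupported = set(options) - self.gpu_supported_options
        cls = self.cpu_cls if unsupported else self.gpu_cls
        return cls(**options)


class GpuKMeans:
    """Stand-in for a GPU implementation with a limited option set."""

    def __init__(self, n_clusters=8):
        self.n_clusters = n_clusters


class CpuKMeans:
    """Stand-in for the scikit-learn implementation with more options."""

    def __init__(self, n_clusters=8, algorithm="auto"):
        self.n_clusters = n_clusters
        self.algorithm = algorithm


if __name__ == "__main__":
    factory = FallbackFactory(GpuKMeans, CpuKMeans, {"n_clusters"})
    print(type(factory.create(n_clusters=2)).__name__)
    print(type(factory.create(n_clusters=2, algorithm="full")).__name__)
```

Keeping the decision in one place means callers use a single constructor call regardless of which implementation ends up serving the request.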

As noted above, there is an R API in development, which will be released as a stand-alone R package. All algorithms supported by H2O4GPU will be exposed in both Python and R in the future.

Another primary goal is to support all operations on the GPU via the GOAI initiative. This involves ensuring the GPU algorithms can take and return GPU pointers to data instead of going back to the host. In scikit-learn API language these are called fit_ptr, predict_ptr, transform_ptr, etc., where ptr stands for memory pointer.

RoadMap

2019 Q2:

  • A new processing engine that allows scaling beyond GPU memory limits
  • k-Nearest Neighbors
  • Matrix Factorization
  • Factorization Machines
  • API Support: GOAI API support
  • Data.table support

More precise information can be found in the milestones list.

Solver Classes

Among others, the solver can be used for the following classes of problems:

  • GLM: Lasso, Ridge Regression, Logistic Regression, Elastic Net Regularization
  • KMeans
  • Gradient Boosting Machine (GBM) via XGBoost
  • Singular Value Decomposition (SVD) + Truncated Singular Value Decomposition
  • Principal Components Analysis (PCA)

Benchmarks

Our benchmarking plan is to clearly highlight when modeling benefits from the GPU (usually complex models) or does not (e.g. one-shot simple models dominated by data transfer).

We have benchmarked h2o4gpu, scikit-learn, and h2o-3 on a variety of solvers. Some benchmarks have been performed for a few selected cases that highlight the GPU capabilities (i.e. compute or on-GPU memory operations dominate data transfer to GPU from host):
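The trade-off described above can be made concrete with a back-of-the-envelope model: the GPU path pays for host-to-GPU transfer plus GPU compute, so it only wins when that sum is below the CPU time. The function below is an illustrative sketch with made-up timings, not a result from the project's benchmarks.

```python
def gpu_speedup(cpu_seconds, gpu_compute_seconds, transfer_seconds):
    """Return the end-to-end speedup of the GPU path over the CPU path."""
    return cpu_seconds / (gpu_compute_seconds + transfer_seconds)


if __name__ == "__main__":
    # Hypothetical complex model: compute dominates, so the GPU helps.
    print(gpu_speedup(cpu_seconds=100.0, gpu_compute_seconds=5.0, transfer_seconds=1.0))
    # Hypothetical one-shot simple model: transfer dominates, so it does not.
    print(gpu_speedup(cpu_seconds=0.5, gpu_compute_seconds=0.05, transfer_seconds=1.0))
```

A value above 1 means the GPU path is faster end to end; a value below 1 means data transfer has eaten the benefit.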

Benchmarks for GLM, KMeans, and XGBoost for CPU vs. GPU.

A suite of benchmarks is computed when running "make testperf" from a build directory. The suite takes all of our tests and benchmarks h2o4gpu against h2o-3. The results will soon be presented as live commit-by-commit streaming plots on a website.

Contributing

Please refer to our CONTRIBUTING.md and DEVEL.md for instructions on how to build and test the project and how to contribute. The h2o4gpu Gitter chatroom can be used for discussion related to open source development.

GitHub issues are used for bugs, feature and enhancement discussion/tracking.

Questions

References

  1. Parameter Selection and Pre-Conditioning for a Graph Form Solver -- C. Fougner and S. Boyd
  2. Block Splitting for Distributed Optimization -- N. Parikh and S. Boyd
  3. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers -- S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein
  4. Proximal Algorithms -- N. Parikh and S. Boyd

Copyright

Copyright (c) 2017, H2O.ai, Inc., Mountain View, CA
Apache License Version 2.0 (see LICENSE file)


This software is based on original work under BSD-3 license by:

Copyright (c) 2015, Christopher Fougner, Stephen Boyd, Stanford University
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * Neither the name of the <organization> nor the
      names of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
