• This repository has been archived on 21/Dec/2018
• Stars: 138
• Rank: 264,508 (Top 6%)
• Language: Cuda
• License: Apache License 2.0
• Created over 7 years ago
• Updated about 6 years ago

Repository Details

[ARCHIVED] C GPU DataFrame Library

⚠️ [ARCHIVED] libgdf: GPU Dataframes

All development has moved to the cuDF repo effective October 28th 2018

The contents of this repo and the README have been archived for reference. Future development for libgdf will take place in the /libgdf folder of the cuDF repo.

Outstanding PRs

With the refactoring that moved all files into the /libgdf folder of this repo, updating an outstanding PR branch to master should reduce the merge conflicts when merging against master on cuDF. The entire commit history of libgdf has also been merged into cuDF to assist in this transition; a sketch of the update workflow is shown below.
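
A rough sketch of that update, assuming the PR branch is checked out locally; the cuDF remote URL used here is an assumption and should be adjusted to the actual repository location:

# add cuDF as a remote and merge its master into the PR branch
# (the URL below is an assumption)
$ git remote add cudf https://github.com/rapidsai/cudf.git
$ git fetch cudf
$ git merge cudf/master   # resolve any remaining conflicts under /libgdf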

Outstanding Issues

All issues will be copied and migrated to the cuDF repo.


libgdf is a C library for implementing common functionality for a GPU Data Frame. For more project details, see the wiki.

Development Setup

The following instructions are tested on Linux and OSX systems.

Compiler requirement:

  • g++ 4.8 or 5.4
  • cmake 3.12+

CUDA requirement:

  • CUDA 9.0+

You can obtain CUDA from https://developer.nvidia.com/cuda-downloads.
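
A quick way to confirm the installed toolkit and driver, assuming nvcc and nvidia-smi are on the PATH:

$ nvcc --version   # should report release 9.0 or newer
$ nvidia-smi       # shows the driver version and visible GPUs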

Get dependencies

Note: This repo uses submodules. Make sure you cloned recursively:

git clone --recurse-submodules git@github.com:gpuopenanalytics/libgdf.git

Or, after cloning:

cd libgdf
git submodule update --init --recursive

Since cmake will download and build Apache Arrow (version 0.7.1 or 0.8+), you may need to install Boost C++ (version 1.58) before running cmake:

# Install Boost C++ 1.58 for Ubuntu 16.04
$ sudo apt-get install libboost-all-dev

or

# Install Boost C++ 1.58 for Conda (you will need a Python 3.3 environment)
$ conda install -c omnia boost=1.58.0=py33_0

libgdf supports Apache Arrow versions 0.7.1 and 0.8+ (0.10.0 is the default), which use different metadata versions for IPC, so it is important to specify which Apache Arrow version will be used when building libgdf. To select the required Apache Arrow version, define the following environment variables (using Arrow version 0.10.0 as an example):

$ export ARROW_VERSION=0.10.0
$ export PARQUET_ARROW_VERSION=apache-arrow-$ARROW_VERSION

The latter variable is used by the libgdf cmake configuration files. Note that these environment variables are only needed when building libgdf; they are not necessary when using it.

You can install Boost C++ 1.58 from sources as well: https://www.boost.org/doc/libs/1_58_0/more/getting_started/unix-variants.html

To run the Python tests, it is recommended to set up a conda environment for the dependencies.

# create the conda environment (assuming in build directory)
$ conda env create --name libgdf_dev --file ../conda_environments/dev_py35.yml
# activate the environment
$ source activate libgdf_dev
# when not using default arrow version 0.10.0, run
$ conda install pyarrow=$ARROW_VERSION -c conda-forge

This installs the required cmake and pyarrow into the libgdf_dev conda environment and activates it.

For reference, the Python cffi wrapper code requires cffi and pytest, and the testing code additionally requires numba and cudatoolkit. All of these are installed by the previous commands.
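
If you maintain your own environment instead of using the yml file, a rough manual equivalent might look like the following; the channel choices are assumptions and may need adjusting:

# approximate manual install of the wrapper and test dependencies
$ conda install -c conda-forge cffi pytest
$ conda install -c numba numba cudatoolkit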

The environment can be updated from ../conda_environments/dev_py35.yml as development adds or changes the dependencies. To do so, run:

conda env update --name libgdf_dev --file ../conda_environments/dev_py35.yml

Note that dev_py35.yml uses the latest version of pyarrow. Reinstall pyarrow if needed using conda install pyarrow=$ARROW_VERSION -c conda-forge.

Configure and build

This project uses cmake for building the C/C++ library. To configure cmake, run:

$ mkdir build   # create build directory for out-of-source build
$ cd build      # enter the build directory
$ cmake ..      # configure cmake (will download and build Apache Arrow and Google Test)

If installing libgdf into the conda environment is desired, replace the last command with

$ cmake -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX ..

To build the C/C++ code, run make. This should produce a shared library named libgdf.so or libgdf.dylib.
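
For example, from the build directory (the parallel-jobs flag is optional, and make install only applies if an install prefix was configured as described above):

$ make -j4          # build libgdf and its bindings
$ make install      # optional: install into CMAKE_INSTALL_PREFIX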

If you run into compile errors about missing header files:

cub/device/device_segmented_radix_sort.cuh: No such file or directory

See the note about submodules in the Get dependencies section above.

Link python files into the build directory

To make development and testing more seamless, the python files and tests can be symlinked into the build directory by running make copy_python. With that, any changes to the python files are reflected in the build directory. To rebuild libgdf, run make again.
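
For example, from the build directory:

$ make copy_python   # symlink the python files and tests into the build directory
$ make               # rebuild libgdf after C/C++ changes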

Run tests

Currently, all tests are written in Python with py.test. A make target is available to trigger the test execution. In the build directory (and with the conda environment activated), run the following to execute the tests:

$ make pytest   # this automatically triggers the "copy_python" target

More Repositories

1. cudf: cuDF - GPU DataFrame Library (C++, 8,319 stars)
2. cuml: cuML - RAPIDS Machine Learning Library (C++, 3,864 stars)
3. cugraph: cuGraph - RAPIDS Graph Analytics Library (Cuda, 1,668 stars)
4. cusignal: cuSignal - RAPIDS Signal Processing Library (Python, 703 stars)
5. raft: RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications. (Cuda, 586 stars)
6. jupyterlab-nvdashboard: A JupyterLab extension for displaying dashboards of GPU usage. (TypeScript, 582 stars)
7. notebooks: RAPIDS Sample Notebooks (Shell, 577 stars)
8. cuspatial: CUDA-accelerated GIS and spatiotemporal algorithms (Jupyter Notebook, 543 stars)
9. rmm: RAPIDS Memory Manager (C++, 420 stars)
10. deeplearning (Jupyter Notebook, 336 stars)
11. cucim: cuCIM - RAPIDS GPU-accelerated image processing library (Jupyter Notebook, 333 stars)
12. dask-cuda: Utilities for Dask and CUDA interactions (Python, 266 stars)
13. cuxfilter: GPU accelerated cross filtering with cuDF. (Python, 261 stars)
14. node: GPU-accelerated data science and visualization in node (TypeScript, 170 stars)
15. clx: A collection of RAPIDS examples for security analysts, data scientists, and engineers to quickly get started applying RAPIDS and GPU acceleration to real-world cybersecurity use cases. (Jupyter Notebook, 167 stars)
16. dask-cudf: [ARCHIVED] Dask support for distributed GDF object --> Moved to cudf (Python, 135 stars)
17. cloud-ml-examples: A collection of Machine Learning examples to get started with deploying RAPIDS in the Cloud (Jupyter Notebook, 134 stars)
18. ucx-py: Python bindings for UCX (Python, 118 stars)
19. gpu-bdb: RAPIDS GPU-BDB (Python, 103 stars)
20. kvikio: KvikIO - High Performance File IO (Python, 100 stars)
21. plotly-dash-rapids-census-demo (Jupyter Notebook, 92 stars)
22. gputreeshap (C++, 83 stars)
23. frigate: Frigate is a tool for automatically generating documentation for your Helm charts (Python, 76 stars)
24. wholegraph: WholeGraph - large scale Graph Neural Networks (Cuda, 75 stars)
25. spark-examples: [ARCHIVED] Moved to github.com/NVIDIA/spark-xgboost-examples (Jupyter Notebook, 70 stars)
26. docker: Dockerfile templates for creating RAPIDS Docker Images (Shell, 69 stars)
27. cuvs: cuVS - a library for vector search and clustering on the GPU (Jupyter Notebook, 57 stars)
28. custrings: [ARCHIVED] GPU String Manipulation --> Moved to cudf (Cuda, 46 stars)
29. docs: RAPIDS Documentation Site (HTML, 34 stars)
30. cudf-alpha: [ARCHIVED] cuDF [alpha] - RAPIDS Merge of GoAi into cuDF (34 stars)
31. rapids-examples (Jupyter Notebook, 31 stars)
32. nvgraph (C++, 26 stars)
33. rapids-cmake (CMake, 24 stars)
34. cuhornet (Cuda, 24 stars)
35. cuDataShader (Jupyter Notebook, 22 stars)
36. gpuci-build-environment: Common build environment used by gpuCI for building RAPIDS (Dockerfile, 19 stars)
37. distributed-join (C++, 19 stars)
38. devcontainers (Shell, 18 stars)
39. dask-cuml: [ARCHIVED] Dask support for multi-GPU machine learning algorithms --> Moved to cuml (Python, 16 stars)
40. integration: RAPIDS - combined conda package & integration tests for all of RAPIDS libraries (Shell, 15 stars)
41. xgboost-conda: Conda recipes for xgboost (Jupyter Notebook, 12 stars)
42. benchmark (Python, 11 stars)
43. ucxx (C++, 11 stars)
44. dependency-file-generator (Python, 10 stars)
45. asvdb (Python, 9 stars)
46. helm-chart (Shell, 9 stars)
47. deployment: RAPIDS Deployment Documentation (Jupyter Notebook, 9 stars)
48. miniforge-cuda (Dockerfile, 9 stars)
49. ci-imgs (Dockerfile, 7 stars)
50. dask-cugraph (Python, 7 stars)
51. rapids.ai: rapids.ai web site (HTML, 7 stars)
52. ptxcompiler (Python, 6 stars)
53. GaaS (Python, 5 stars)
54. rvc (Go, 4 stars)
55. scikit-learn-nv (Python, 4 stars)
56. ops-bot: A Probot application used by the Ops team for automation. (TypeScript, 4 stars)
57. workflows (Shell, 4 stars)
58. rapids-triton (C++, 4 stars)
59. dask-build-environment: Build environments for various dask related projects on gpuCI (Dockerfile, 3 stars)
60. roc: GitHub utilities for the RAPIDS Ops team (Go, 3 stars)
61. multi-gpu-tools (Shell, 3 stars)
62. detect-weak-linking (Python, 3 stars)
63. dask-cuda-benchmarks (Python, 2 stars)
64. shared-workflows: Reusable GitHub Actions workflows for RAPIDS CI (Shell, 2 stars)
65. rapids_triton_pca_example (C++, 2 stars)
66. cugunrock (Cuda, 2 stars)
67. dgl-cugraph-build-environment (Dockerfile, 2 stars)
68. projects (Jupyter Notebook, 2 stars)
69. crossfit: Metric calculation library (Python, 2 stars)
70. gpuci-mgmt: Management scripts for gpuCI (Shell, 1 star)
71. ansible-roles (1 star)
72. code-share (C++, 1 star)
73. build-metrics-reporter (Python, 1 star)
74. cibuildwheel-imgs (Dockerfile, 1 star)
75. gpuci-tools: User tools for use within the gpuCI environment (Shell, 1 star)
76. pynvjitlink (Python, 1 star)
77. rapids-dask-dependency (Shell, 1 star)
78. sphinx-theme: This repository contains a Sphinx theme used for RAPIDS documentation (CSS, 1 star)