• Stars
    star
    3,111
  • Rank 13,795 (Top 0.3 %)
  • Language
    C++
  • License
    Apache License 2.0
  • Created almost 3 years ago
  • Updated 11 days ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A C++ vectorized database acceleration library aimed to optimizing query engines and data processing systems.

Velox logo

Velox is a C++ database acceleration library which provides reusable, extensible, and high-performance data processing components. These components can be reused to build compute engines focused on different analytical workloads, including batch, interactive, stream processing, and AI/ML. Velox was created by Facebook and it is currently developed in partnership with Intel, ByteDance, and Ahana.

In common usage scenarios, Velox takes a fully optimized query plan as input and performs the described computation. Considering Velox does not provide a SQL parser, a dataframe layer, or a query optimizer, it is usually not meant to be used directly by end-users; rather, it is mostly used by developers integrating and optimizing their compute engines.

Velox provides the following high-level components:

  • Type: a generic typing system that supports scalar, complex, and nested types, such as structs, maps, arrays, tensors, etc.
  • Vector: an Arrow-compatible columnar memory layout module, which provides multiple encodings, such as Flat, Dictionary, Constant, Sequence/RLE, and Bias, in addition to a lazy materialization pattern and support for out-of-order writes.
  • Expression Eval: a fully vectorized expression evaluation engine that allows expressions to be efficiently executed on top of Vector/Arrow encoded data.
  • Function Packages: sets of vectorized function implementations following the Presto and Spark semantic.
  • Operators: implementation of common data processing operators such as scans, projection, filtering, groupBy, orderBy, shuffle, hash join, unnest, and more.
  • I/O: a generic connector interface that allows different file formats (ORC/DWRF and Parquet) and storage adapters (S3, HDFS, local files) to be used.
  • Network Serializers: an interface where different wire protocols can be implemented, used for network communication, supporting PrestoPage and Spark's UnsafeRow.
  • Resource Management: a collection of primitives for handling computational resources, such as memory arenas and buffer management, tasks, drivers, and thread pools for CPU and thread execution, spilling, and caching.

Velox is extensible and allows developers to define their own engine-specific specializations, including:

  1. Custom types
  2. Simple and vectorized functions
  3. Aggregate functions
  4. Operators
  5. File formats
  6. Storage adapters
  7. Network serializers

Examples

Examples of extensibility and integration with different component APIs can be found here

Documentation

Developer guides detailing many aspects of the library, in addition to the list of available functions can be found here.

Blog posts are available here.

Getting Started

We provide scripts to help developers setup and install Velox dependencies.

Get the Velox Source

git clone --recursive https://github.com/facebookincubator/velox.git
cd velox
# if you are updating an existing checkout
git submodule sync --recursive
git submodule update --init --recursive

Setting up on macOS

Once you have checked out Velox, on an Intel MacOS machine you can setup and then build like so:

$ ./scripts/setup-macos.sh 
$ make

On an M1 MacOS machine you can build like so:

$ CPU_TARGET="arm64" ./scripts/setup-macos.sh
$ CPU_TARGET="arm64" make

You can also produce intel binaries on an M1, use CPU_TARGET="sse" for the above.

Setting up on aarch64 Linux (Ubuntu 20.04 or later)

On an aarch64 based machine, you can build like so:

$ CPU_TARGET="aarch64" ./scripts/setup-ubuntu.sh
$ CPU_TARGET="aarch64" make

Setting up on x86_64 Linux (Ubuntu 20.04 or later)

Once you have checked out Velox, you can setup and build like so:

$ ./scripts/setup-ubuntu.sh 
$ make

Building Velox

Run make in the root directory to compile the sources. For development, use make debug to build a non-optimized debug version, or make release to build an optimized version. Use make unittest to build and run tests.

Note that,

  • Velox requires C++17 , thus minimum supported compiler is GCC 5.0 and Clang 5.0.
  • Velox requires the CPU to support instruction sets:
    • bmi
    • bmi2
    • f16c
  • Velox tries to use the following (or equivalent) instruction sets where available:
    • On Intel CPUs
      • avx
      • avx2
      • sse
    • On ARM
      • Neon
      • Neon64

Building Velox with docker-compose

If you don't want to install the system dependencies required to build Velox, you can also build and run tests for Velox on a docker container using docker-compose. Use the following commands:

$ docker-compose build ubuntu-cpp
$ docker-compose run --rm ubuntu-cpp

If you want to increase or decrease the number of threads used when building Velox you can override the NUM_THREADS environment variable by doing:

$ docker-compose run -e NUM_THREADS=<NUM_THREADS_TO_USE> --rm ubuntu-cpp

Contributing

Check our contributing guide to learn about how to contribute to the project.

Community

The main communication channel with the Velox OSS community is through the the Velox-OSS Slack workspace. Please reach out to [email protected] to get access to Velox Slack Channel.

License

Velox is licensed under the Apache 2.0 License. A copy of the license can be found here.

More Repositories

1

SocketRocket

A conforming Objective-C WebSocket client library.
Objective-C
9,524
star
2

katran

A high performance layer 4 load balancer
C
4,488
star
3

AITemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Python
4,418
star
4

cinder

Cinder is Meta's internal performance-oriented production version of CPython.
Python
3,349
star
5

spectrum

A client-side image transcoding library.
C++
1,985
star
6

FBX2glTF

A command-line tool for the conversion of 3D model assets on the FBX file format to the glTF file format.
C++
1,963
star
7

oomd

A userspace out-of-memory killer
C++
1,745
star
8

xar

executable archive format
Python
1,578
star
9

fastmod

A fast partial replacement for the codemod tool
Rust
1,570
star
10

Bowler

Safe code refactoring for modern Python.
Python
1,506
star
11

gloo

Collective communications library with various primitives for multi-machine training.
C++
1,128
star
12

fizz

C++14 implementation of the TLS-1.3 standard
C++
1,104
star
13

submitit

Python 3.8+ toolbox for submitting jobs to Slurm
Python
1,075
star
14

dhcplb

dhcplb is Facebook's implementation of a load balancer for DHCP.
Go
1,035
star
15

below

A time traveling resource monitor for modern Linux systems
Rust
975
star
16

OnlineSchemaChange

A tool for performing online schema changes on MySQL.
Python
951
star
17

Glean

System for collecting, deriving and working with facts about source code.
Hack
886
star
18

Battery-Metrics

Library that helps in instrumenting battery related system metrics.
Java
720
star
19

retrie

Retrie is a powerful, easy-to-use codemodding tool for Haskell.
Haskell
490
star
20

superconsole

The superconsole crate provides a handler and building blocks for powerful, yet minimally intrusive TUIs. It is cross platform, supporting Windows 7+, Linux, and MacOS. Rustaceans who want to create non-interactive TUIs can use the component composition building block system to quickly deploy their code.
Rust
447
star
21

nvdtools

A set of tools to work with the feeds (vulnerabilities, CPE dictionary etc.) distributed by National Vulnerability Database (NVD)
Go
428
star
22

infima

A UI framework that provides websites with the minimal CSS and JS needed to get started with building a modern responsive beautiful website
HTML
393
star
23

CG-SQL

CG/SQL is a compiler that converts a SQL Stored Procedure like language into C for SQLite. SQLite has no stored procedures of its own. CG/CQL can also generate other useful artifacts for testing and schema maintenance.
HTML
385
star
24

flowtorch

This library would form a permanent home for reusable components for deep probabilistic programming. The library would form and harness a community of users and contributors by focusing initially on complete infra and documentation for how to use and create components.
Jupyter Notebook
297
star
25

ptr

Python Test Runner.
Python
285
star
26

TTPForge

The TTPForge is a Cybersecurity Framework for developing, automating, and executing attacker Tactics, Techniques, and Procedures (TTPs).
Go
280
star
27

fbjni

A library designed to simplify the usage of the Java Native Interface
C++
245
star
28

senpai

Senpai is an automated memory sizing tool for container applications.
Python
213
star
29

gazebo

A Rust library containing a collection of small well-tested primitives.
Rust
210
star
30

dynolog

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the linux kernel, CPU, disks, Intel PT, GPUs etc. Dynolog also integrates with pytorch and can trigger traces for distributed training applications.
C++
161
star
31

reindeer

Reindeer is a tool to transform Rust Cargo dependencies into generated Buck build rules
Rust
157
star
32

FCR

FBNet-Command-Runner: A thrift service to run commands on heterogeneous Network devices with configurable parameters.
Python
154
star
33

GeoLift

GeoLift is an end-to-end geo-experimental methodology based on Synthetic Control Methods used to measure the true incremental effect (Lift) of ad campaign.
R
149
star
34

oculus-linux-kernel

The Linux kernel code for Oculus devices
C
148
star
35

hsthrift

The Haskell Thrift Compiler. This is an implementation of the Thrift spec that generates code in Haskell. It depends on the fbthrift project for the implementation of the underlying transport.
Haskell
143
star
36

dispenso

The project provides high-performance concurrency, enabling highly parallel computation.
C++
141
star
37

FioSynth

Tool which enables the creation of synthetic storage workloads, automates the execution and results collection of synthetic storage benchmarks.
Python
136
star
38

dataclassgenerate

DataClassGenerate (or simply DCG) is a Kotlin compiler plugin that addresses an Android APK size overhead from Kotlin data classes.
Kotlin
134
star
39

meta-code-verify

Code Verify is an open source web browser extension that confirms that your Facebook, Messenger, Instagram, and WhatsApp Web code hasnโ€™t been tampered with or altered, and that the Web experience youโ€™re getting is the same as everyone elseโ€™s.
TypeScript
133
star
40

go-qfext

a fast counting quotient filter implementation in golang
Go
88
star
41

tacquito

Tacquito is an open source TACACs+ server written in Go that implements RFC8907
Go
82
star
42

dcrpm

A tool to detect and correct common issues around RPM database corruption.
Python
72
star
43

ForgeArmory

ForgeArmory provides TTPs that can be used with the TTPForge (https://github.com/facebookincubator/ttpforge).
Swift
67
star
44

antlir

ANother Linux Image buildeR
Rust
63
star
45

ConversionsAPI-Tag-for-GoogleTagManager

This repository will contain the artifacts needed for setting up Conversions API implementation on Google Tag Manager's serverside. Please follow the instructions https://www.facebook.com/business/help/702509907046774
Smarty
63
star
46

InjKit

Injection Kit. It is a java bytecode processing library for bytecode injection and transformation.
Java
56
star
47

sks

Secure Key Storage (SKS) is a library for Go that abstracts Security Hardware on laptops.
Go
55
star
48

obs-plugins

OBS Plugins
C++
54
star
49

glTFVariantMeld

An application that accepts files on the glTF format, interprets them as variants of an over-arching whole, and melds them together.
Rust
47
star
50

later

A framework for python asyncio with batteries included for people writing services in python asyncio
Python
38
star
51

go2chef

A Golang tool to bootstrap a system from zero so that it's able to run Chef to be managed
Go
38
star
52

ConversionsAPI-Client-for-GoogleTagManager

This repository will contain the artifacts needed for setting up Conversions API implementation on Google Tag Manager's serverside. Primarily we will be hosting, - ConversionsAPI(Facebook) Client - listens on the events fired to GTM Server and maps them to common GTM schema. - ConversionsAPI(Facebook) Tag - server tag that fires events to CAPI.For more details on Design here https//fburl.com/uae68vlr
37
star
53

CommutingZones

Commuting zones are geographic areas where people live and work and are useful for understanding local economies, as well as how they differ from traditional boundaries. These zones are a set of boundary shapes built using aggregated estimates of home and work locations. Data used to build commuting zones is aggregated and de-identified.
JavaScript
37
star
54

Facebook-Pixel-for-Wordpress

A plugin for advertisers who use Wordpress to enable them easily setup the facebook pixel.
JavaScript
34
star
55

wordpress-messenger-customer-chat-plugin

Messenger Customer Chat Plugin for WordPress
PHP
26
star
56

CP4M

CP4M is a conversational marketing platform which enables advertisers to integrate their customer-facing chatbots with FB Messenger/WhatsApp, in order to meet customers where they are and drive native conversations on the advertiser's owned infra.
Java
26
star
57

rush

RUSH (Reliable - unreliable - Streaming Protocol)
C++
22
star
58

buck2-change-detector

Given a Buck2 built project and a set of changes (e.g. from source control) compute the targets that may have changed. Sometimes known as a target determinator, useful for optimizing a CI system.
Rust
18
star
59

MY_ENUM

Small c++ macro library to add compile-time introspection to c++ enum classes.
C++
15
star
60

spark-ar-core-libs

Core libraries that can be used in Spark AR. You can import each library depends on your requirements.
TypeScript
15
star
61

SafeC

Library containing safer alternatives/wrappers for insecure C APIs.
C++
14
star
62

go-belt

It is an implementation-agnostic Go(lang) package to generalize observability tooling (logger, metrics, tracer and so on) and provide ability to use any of these tools with a standard context. Essentially it is an attempt to standardize observability API in Go.
Go
14
star
63

Portal-Kernel

Kernel Code for Portal.
C
11
star
64

sado

A macOS signed-app shim for running daemons with reliable capabilities.
Swift
10
star
65

npe-toolkit

Libraries, guides, blueprints, and sample code, to enable rapidly building 0-1 applications on iOS, Android and web.
TypeScript
9
star
66

Eigen-FBPlugins

This is collection of plugins extending Eigen arrays/matrices with main focus on using them for computer vision. In particular, this project should provide support for multichannel arrays (missing in vanilla Eigen) and seamless integration between Eigen types and OpenCV functions.
C++
8
star
67

isometric_pattern_matcher

A new isometric calibration pattern - which should/might lead to higher accuracy calibrations compared to existing solutions (checkerboards, patterns of circles).
C++
8
star
68

dnf-plugin-cow

Code to enable Copy on Write features being upstreamed in rpm and librepo
Shell
8
star
69

wireguard_py

Cython library for Wireguard
C
6
star
70

strobelight

Meta's fleetwide profiler framework
6
star
71

jupyterhub_fb_authenticator

JupyterHub Facebook Authenticator is a Facebook OAuth authenticator built on top of OAuthenticator.
Python
5
star
72

meta-fbvuln

OpenEmbedded meta-layer that allows producing a vulnerability manifest alongside a Yocto build. The produced manifest is suitable for ongoing vulnerability scanning of fielded software.
5
star
73

gazebo_lint

A Rust linter that provides various suggestions based on the new primitives offered in the `gazebo` library.
Rust
4
star
74

kernel-patches-daemon

Sync Patchwork series's with Github pull requests
Python
4
star
75

scrut

Scrut is a testing toolkit for CLI applications. A tool to scrutinize terminal programs without fuss.
Rust
4
star
76

language-capirca

Adds syntax highlighting for Capirca filetypes in Atom. Capirca is an open source standard for writing vendor-neutral firewall policies as originally released by Google: https://github.com/google/capirca
3
star
77

fbc_owrt_feed

Facebook Connectivity OpenWrt Feed. Package feed for OpenWrt router OS by Facebook Connectivity programme.
Lua
2
star
78

cutlass-fork

A Meta fork of NV CUTLASS repo.
C++
2
star