  • Stars: 182
  • Rank: 211,154 (Top 5%)
  • Language: Java
  • License: Apache License 2.0
  • Created over 8 years ago
  • Updated over 2 years ago

Repository Details

Cache File System optimized for columnar formats and object stores

RubiX

RubiX is a lightweight data caching framework for big data engines. It caches data on local disks, giving engines the I/O bandwidth of local storage. RubiX is useful in shared-storage architectures where the execution engine is separate from storage. For example, on public clouds such as AWS or Microsoft Azure, data lives in a cloud store and the engine accesses it over the network. Similarly, in data centers Presto runs on a separate cluster from HDFS and accesses data over the network.
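The caching idea described above can be illustrated with a minimal read-through block cache. This is only a conceptual sketch, not RubiX's implementation or API: `RemoteStore`, `ReadThroughBlockCache`, and the example path are hypothetical names, and an in-memory map stands in for the local disk.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Small demo: repeated reads of the same block reach the remote store only once. */
class CacheSketch {
    public static void main(String[] args) {
        int[] remoteCalls = {0};
        // Fake remote store that counts how often it is contacted.
        RemoteStore store = (path, block) -> {
            remoteCalls[0]++;
            return (path + ":" + block).getBytes(StandardCharsets.UTF_8);
        };
        ReadThroughBlockCache cache = new ReadThroughBlockCache(store);

        cache.read("s3://bucket/table/part-0", 0); // miss: fetched over the "network"
        cache.read("s3://bucket/table/part-0", 0); // hit: served from the local copy
        cache.read("s3://bucket/table/part-0", 1); // miss: a different block

        System.out.println("hits=" + cache.hits()
                + " misses=" + cache.misses()
                + " remoteCalls=" + remoteCalls[0]);
        // prints "hits=1 misses=2 remoteCalls=2"
    }
}

/** Hypothetical stand-in for an object store reached over the network. */
interface RemoteStore {
    byte[] readBlock(String path, long blockIndex);
}

/** Serves a block from the local copy when present; otherwise fetches it and keeps it. */
class ReadThroughBlockCache {
    private final RemoteStore remote;
    // Assumption: a map replaces files on local disk to keep the sketch short.
    private final Map<String, byte[]> localBlocks = new HashMap<>();
    private int hits = 0;
    private int misses = 0;

    ReadThroughBlockCache(RemoteStore remote) {
        this.remote = remote;
    }

    byte[] read(String path, long blockIndex) {
        String key = path + "#" + blockIndex;
        byte[] cached = localBlocks.get(key);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        byte[] fetched = remote.readBlock(path, blockIndex);
        localBlocks.put(key, fetched);
        return fetched;
    }

    int hits() { return hits; }

    int misses() { return misses; }
}
```

Caching at block granularity, rather than whole files, suits columnar formats, where an engine typically reads only some byte ranges of a file. A real cache would also need eviction and invalidation, which this sketch omits.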

RubiX can be extended, via plugins, to support any engine that accesses data through the Hadoop FileSystem interface. There are plugins for AWS S3, Microsoft Azure Blob Store, and HDFS. RubiX can also be extended to work with other storage systems, including other cloud stores.

Check the User and Developer manual for more information on getting started.

Supported Engines and Cloud Stores

  • Presto: Amazon S3
  • Spark: Amazon S3
  • Any engine using hadoop-2 (e.g. Hive): Amazon S3

Resources

Documentation
Getting Started Guide
User Group (Google)

Talks

Talk on RubiX at Strata 2017

Blog Posts

Developers

Slack Channel

The channel is restricted to a few domains. To be added to the Slack channel, send an email to the user group or contact us through GitHub issues.

More Repositories

  1. sparklens (Scala, 562 stars): Qubole Sparklens tool for performance tuning Apache Spark
  2. spark-on-lambda (Scala, 151 stars): Apache Spark on AWS Lambda
  3. kinesis-sql (Scala, 137 stars): Kinesis Connector for Structured Streaming
  4. afctl (Python, 130 stars): afctl helps to manage and deploy Apache Airflow projects faster and smoother.
  5. presto-udfs (Java, 115 stars): Plugin for Presto to allow addition of user functions easily
  6. quark (Java, 98 stars): Quark is a data virtualization engine over analytic databases.
  7. streamx (Java, 97 stars): kafka-connect-s3: Ingest data from Kafka to object stores (S3)
  8. spark-acid (Scala, 96 stars): ACID Data Source for Apache Spark based on Hive ACID
  9. qds-sdk-py (Python, 51 stars): Python SDK for accessing Qubole Data Service
  10. uchit (Python, 29 stars)
  11. streaminglens (Scala, 17 stars): Qubole Streaminglens tool for tuning Spark Structured Streaming Pipelines
  12. s3-sqs-connector (Scala, 17 stars): A library for reading data from Amazon S3 with optimised listing using Amazon SQS, using Spark SQL Streaming (or Structured Streaming).
  13. spark-state-store (Scala, 16 stars): RocksDB state storage implementation for Structured Streaming.
  14. presto-kinesis (Java, 14 stars): Presto connector to Amazon Kinesis service.
  15. kinesis-storage-handler (Java, 11 stars): Hive Storage Handler for Kinesis.
  16. qds-sdk-java (Java, 7 stars): A Java library that provides the tools you need to authenticate with, and use the Qubole Data Service API.
  17. demotrends (Ruby, 6 stars): Code required to set up the demo trends website (http://demotrends.qubole.com)
  18. qubole-terraform (HCL, 6 stars)
  19. space-ui (JavaScript, 5 stars): UI Ember components based on Space design specs
  20. caching-metastore-client (Java, 5 stars): A metastore client that caches objects
  21. rubix-admin (Python, 5 stars): Admin scripts for RubiX
  22. tco (Python, 4 stars)
  23. qds-sdk-R (Python, 4 stars): R extension to execute Hive Commands through Qubole Data Service Python SDK.
  24. docker-images (Dockerfile, 4 stars): Qubole Docker Images
  25. tableau-qubole-connector (JavaScript, 3 stars)
  26. metriks-addons (Ruby, 3 stars): Utilities for collecting metrics in a Rails Application
  27. qds-sdk-ruby (Ruby, 3 stars): Ruby SDK for Qubole API
  28. qubole-log-datasets (3 stars)
  29. hubot-qubole (CoffeeScript, 3 stars): Interaction with Qubole Data Services APIs via Hubot framework
  30. customer-success (HCL, 2 stars)
  31. bootstrap-functions (Shell, 2 stars): Useful functions for Qubole cluster bootstraps
  32. qubole-jar-test (Java, 2 stars): A Maven project to test that Qubole jars can be listed as dependencies
  33. etl-examples (Scala, 2 stars)
  34. perf-kit-queries (2 stars)
  35. tuning-paper (TeX, 2 stars)
  36. blogs (1 star)
  37. jupyter (1 star)
  38. presto-event-listeners (1 star)
  39. qubole-rstudio-example (1 star)
  40. presto (Java, 1 star): Presto
  41. qubole.github.io (1 star): Qubole OSS Page
  42. quboletsdb (Python, 1 star): Set up OpenTSDB using Qubole