• This repository has been archived on 20/Nov/2019
  • Stars
    169
  • Rank 216,351 (Top 5 %)
  • Language
    Scala
  • License
    Apache License 2.0
  • Created about 10 years ago
  • Updated over 4 years ago


Repository Details

DISCONTINUED - Easy access to big things. Library for Apache Spark extending and improving its capabilities

Discontinued

This repository has been discontinued. Stratio Crossdata has moved to a commercial license. Please contact Stratio Big Data Inc. for further info.

Introduction


Crossdata is a distributed framework and a fast, general-purpose computing system powered by Apache Spark. It unifies interaction with different sources and supports multiple datastore technologies through a generic architecture and a custom SQL-like language, with SparkSQL at the core of the project. In addition, Crossdata supports both batch and streaming processing, so you can mix data from both kinds of input. Supporting multiple architectures poses two main challenges: how to normalize access to the datastores, and how to cope with datastore limitations. Crossdata provides connectors that access multiple datastores natively, speeding up queries by avoiding the overhead of the Spark cluster, and the resources it would tie up, when possible. We offer a shell, Java and Scala APIs, and JDBC and ODBC for BI tools.

This project is aimed at those who want to manage a single API to access multiple datastores of different natures, get rid of the drawbacks of Apache Spark, perform analytics from a BI tool, and speed up queries effortlessly.

Crossdata is broken up into the following components:

  • Crossdata Core: A library that you can deploy in any existing system that uses Spark, with no changes other than adding the Crossdata jar file. It is a SparkSQL extension with improvements to the DataSource API and new features. Crossdata expands the functionality of Apache Spark to provide a richer SQL-like language and to improve some aspects (metastore, execution trees, ...).
  • Crossdata Server: Provides a multi-user environment for SparkSQL, giving a reliable architecture with high availability and scalability out of the box.
  • Crossdata Driver: Entry point with an API for both Scala and Java. Crossdata ODBC/JDBC uses this driver.
  • Crossdata Connectors: Take advantage of the Crossdata DataSource API to speed up queries on specific datasources and provide new features.

We include some Spark connectors optimized for access to each datasource, but Crossdata is fully compatible with any connector developed by the Spark community.

  • Apache Cassandra connector powered by Datastax-Spark-Connector
  • MongoDB connector powered by Stratio-Spark-Connector
  • ElasticSearch connector powered by Elastic-Spark-Connector

Moreover, some datasources are already included, so you do not need to import them manually:

  • Spark-CSV
  • Spark-Avro

Crossdata's main advantages over other options:

  • Self-contained JDBC/ODBC. Other solutions require Hive.
  • Faster queries using native access (including subdocuments and array elements).
  • Streaming queries from a SQL-like interface.
  • Metadata discovery.
  • Datasource functions (Spark can only execute its own UDFs).
  • High availability and load balancing.
  • Logical views.
  • Full SQL interface for documents with nested subdocuments and nested arrays.
  • Persistent metadata catalog.
  • Common interface for datasources management.
  • Creation of tables in the datastores.
  • Dropping of tables from the datastores.
  • INSERT ... VALUES queries, as in typical SQL.
  • Service discovery.

Spark Compatibility

Crossdata Version    Spark Version
1.7.X                1.6.X
1.6.X                1.6.X
1.5.X                1.6.X
1.4.X                1.6.X
1.3.X                1.6.X
1.2.X                1.5.X
1.1.X                1.5.X
1.0.X                1.5.X
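
The compatibility table can also be encoded as a small lookup, for example to check a build's dependencies. This is only a sketch: `SparkCompatibility` and `sparkFor` are hypothetical names, not part of Crossdata, and the mapping contains nothing beyond the rows listed above.

```scala
// Hypothetical helper (not part of Crossdata): looks up the Spark release line
// that a Crossdata release targets, using only the compatibility table.
object SparkCompatibility {

  // Crossdata minor version (the N in 1.N.X) -> Spark release line.
  private val sparkByCrossdataMinor: Map[Int, String] = Map(
    7 -> "1.6.X", 6 -> "1.6.X", 5 -> "1.6.X", 4 -> "1.6.X", 3 -> "1.6.X",
    2 -> "1.5.X", 1 -> "1.5.X", 0 -> "1.5.X"
  )

  // Matches versions of the form 1.<minor>.<anything>, e.g. "1.4.2" or "1.7.X".
  private val Version = """1\.(\d{1,2})\..+""".r

  /** Returns the Spark line for a Crossdata version, or None if it is not listed. */
  def sparkFor(crossdataVersion: String): Option[String] =
    crossdataVersion match {
      case Version(minor) => sparkByCrossdataMinor.get(minor.toInt)
      case _              => None
    }
}
```

For instance, `SparkCompatibility.sparkFor("1.4.2")` returns `Some("1.6.X")`, while a version outside the table, such as `"2.0.0"`, returns `None`.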

Get support

You can report issues at https://crossdata.atlassian.net.

You can also find help at https://groups.google.com/forum/#!forum/crossdata-users.

There is also a Gitter channel available: https://gitter.im/Stratio/Crossdata.

Alternatively, you can try to reach us on our IRC channel, #stratio-crossdata. Feel free to ask; if we are available, we'll try to help you.

Release notes

Features and changes are detailed in the changelog.

License

Stratio Crossdata is licensed under the Apache License, Version 2.0.

Licensed to STRATIO (C) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The STRATIO (C) licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

More Repositories

1. cassandra-lucene-index (Java, 597 stars): Lucene based secondary indexes for Cassandra
2. sparta (Scala, 525 stars): Real Time Analytics and Data Pipelines based on Spark Streaming
3. Decision (Java, 315 stars): Powered by Spark Streaming & Siddhi
4. Spark-MongoDB (Scala, 306 stars): Spark library for easy MongoDB access
5. stratio-cassandra (Java, 204 stars): Discontinued in favour of Cassandra Lucene Index
6. spark-rabbitmq (Scala, 201 stars): RabbitMQ Spark Streaming receiver
7. deep-spark (Java, 196 stars): Connecting Apache Spark with different data stores [DEPRECATED]
8. ingestion (Java, 147 stars): Flume - Ingestion, an Apache Flume distribution
9. khermes (Scala, 92 stars): A distributed fake data generator based on Akka.
10. stratio-connector-mongodb (Java, 77 stars): (DEPRECATED) A crossdata connector to MongoDB
11. stratio-connector-decision (Java, 73 stars): (DEPRECATED) A connector for stratio streaming
12. stratio-connector-elasticsearch (Java, 72 stars): (DEPRECATED) noverify
13. stratio-connector-cassandra (Java, 72 stars): (DEPRECATED) Native connector for Cassandra using Crossdata
14. stratio-connector-commons (Java, 72 stars): (DEPRECATED) The common module for the stratio connectors
15. stratio-connector-deep (Java, 70 stars): (DEPRECATED) Deep connector for multiple data sources
16. stratio-connector-sparkSQL (Scala, 67 stars): (DEPRECATED) A crossdata connector to Spark SQL
17. stratio-connector-hdfs (Scala, 66 stars): (DEPRECATED) HDFS
18. crossdata-connector-skeleton (Java, 62 stars): (DEPRECATED) Skeleton project that can be used to implement Crossdata connectors
19. vagrant-ova-plugin (Ruby, 61 stars): Vagrant plugin that exports a box from vbox to VMware
20. datasource-receiver (Scala, 42 stars): Spark Receiver for SQL or NoSQL Databases like Cassandra, MongoDB, Elasticsearch or JDBC
21. egeo-starter (TypeScript, 40 stars): Egeo Starter is a Boilerplate project prepared for work with Egeo 1.x, Angular 2.x, TypeScript, Webpack, Karma, Jasmine and Sass.
22. kafka-elasticsearch-sink (Java, 31 stars)
23. incubator-toree (Scala, 30 stars)
24. valkiria (Go, 29 stars)
25. mesos-universe (HTML, 29 stars): The Mesosphere Universe package repository.
26. sparkstream_ioft (Scala, 29 stars): Code used for "Spark Stream for the Internet of [Flying] Things" Meetup 2016
27. marathon-lb-sec (Python, 23 stars): Marathon-lb is a service discovery & load balancing tool for DC/OS
28. rocket-examples (Scala, 16 stars): Sparta 2.x examples: workflows, plugins, sdk, docker ...
29. etcd4j (Java, 1 star): Java / Netty client for etcd, the highly-available key value store for shared configuration and service discovery.