  • Stars: 525
  • Rank: 80,964 (Top 2 %)
  • Language: Scala
  • License: Apache License 2.0
  • Created about 9 years ago
  • Updated over 4 years ago


Repository Details

Real Time Analytics and Data Pipelines based on Spark Streaming

Discontinued

After around two years of development, we have decided to discontinue this project because of a major refactoring of its structure. In the near future we will launch Sparta 2.0.

We would like to thank the whole open source community for their contributions. You can, of course, continue using this repository as a basis for your own developments: it contains the latest stable version as of today, and minor issues will still be addressed.

If you are interested in the new Sparta 2.0 with pipelines and workflows, please contact us by email at [email protected]

About Stratio Sparta

At Stratio, we have implemented several real-time analytics projects based on Apache Spark, Kafka, Flume, Cassandra, ElasticSearch and MongoDB. These technologies were always a perfect fit, but we soon found ourselves writing the same pieces of integration code over and over again. Stratio Sparta is the easiest way to make use of Apache Spark Streaming and its whole ecosystem: choose your inputs, operations and outputs, and start extracting insights from your data in real time.

Strata Twitter Analytics with Kibana

Main Features

  • Pure Spark
  • No coding needed, only declarative analytical workflows
  • Data continuously streamed in & processed in near real-time
  • Ready to use out-of-the-box
  • Plug & play: flexible workflows (inputs, outputs, transformations, etc…)
  • High performance and fault tolerance
  • Scalability and high availability
  • Big Data OLAP on real-time to small data
  • ETLs
  • Triggers over streaming data
  • Spark SQL language with streaming and batch data
  • Kerberos and CAS compatible


Architecture

Send a workflow as JSON to the Sparta API and execute your own real-time plugins on a Spark cluster.
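The Scala sketch below gives a rough idea of what a declarative workflow looks like. It models a workflow as inputs, transformations and outputs, then serializes it to JSON. The case classes, the field names (`inputs`, `transformations`, `outputs`, `configuration`) and the component types are hypothetical and do not reproduce Sparta's actual workflow schema. The step that sends the result to the API is omitted because this README does not document the endpoint.

```scala
// Hypothetical workflow model: field names are illustrative only and are
// NOT the real Sparta workflow schema.
final case class Component(name: String, kind: String, config: Map[String, String])

final case class Workflow(
    name: String,
    inputs: Seq[Component],
    transformations: Seq[Component],
    outputs: Seq[Component]
)

// Minimal JSON string escaping, sufficient for the plain-text values used here.
def jsonString(s: String): String = {
  val sb = new StringBuilder("\"")
  s.foreach {
    case '"'  => sb.append("\\\"")
    case '\\' => sb.append("\\\\")
    case '\n' => sb.append("\\n")
    case '\t' => sb.append("\\t")
    case c    => sb.append(c)
  }
  sb.append('"').toString
}

// Serializes one component; config keys are sorted for a stable output.
def componentJson(c: Component): String = {
  val cfg = c.config.toSeq
    .sortBy(_._1)
    .map { case (k, v) => s"${jsonString(k)}:${jsonString(v)}" }
    .mkString("{", ",", "}")
  s"""{"name":${jsonString(c.name)},"type":${jsonString(c.kind)},"configuration":$cfg}"""
}

def toJson(w: Workflow): String = {
  def arr(cs: Seq[Component]): String = cs.map(componentJson).mkString("[", ",", "]")
  s"""{"name":${jsonString(w.name)},"inputs":${arr(w.inputs)},"transformations":${arr(w.transformations)},"outputs":${arr(w.outputs)}}"""
}

// Example: Kafka in, JSON parsing, Cassandra out (all names hypothetical).
val example = Workflow(
  name = "twitter-hashtags",
  inputs = Seq(Component("tweets", "Kafka", Map("topic" -> "tweets"))),
  transformations = Seq(Component("parse", "Json", Map("field" -> "raw"))),
  outputs = Seq(Component("store", "Cassandra", Map("keyspace" -> "analytics")))
)
```

Keeping the workflow as plain data means the same description can be validated, stored or versioned before any Spark code runs.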

Sparta as a Job Manager

Submit several streaming jobs to the Spark cluster and manage them from a simple UI.


Run workflows on Mesos, YARN or Spark Standalone.

Job Manager Architecture
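This README does not describe how Sparta launches jobs internally. The sketch below only covers the standard Spark master URL for each cluster manager mentioned above: `spark://host:7077` for Standalone, `mesos://host:5050` for Mesos and `yarn` for YARN, using the default ports. It then assembles a default (client-mode) `spark-submit` command line from that URL. The `ClusterManager` types, `com.example.WorkflowJob` and `workflow.jar` are hypothetical.

```scala
// Standard Spark master URLs for the three cluster managers named above.
sealed trait ClusterManager
final case class Standalone(host: String, port: Int = 7077) extends ClusterManager
final case class Mesos(host: String, port: Int = 5050) extends ClusterManager
case object Yarn extends ClusterManager

def masterUrl(cm: ClusterManager): String = cm match {
  case Standalone(host, port) => s"spark://$host:$port"
  case Mesos(host, port)      => s"mesos://$host:$port"
  case Yarn                   => "yarn" // cluster located via HADOOP_CONF_DIR / YARN_CONF_DIR
}

// Arguments for a spark-submit of a packaged job in the default (client) deploy mode.
def sparkSubmitArgs(cm: ClusterManager, mainClass: String, jar: String): Seq[String] =
  Seq("spark-submit", "--master", masterUrl(cm), "--class", mainClass, jar)

// Example with hypothetical class and jar names.
val yarnCommand: String =
  sparkSubmitArgs(Yarn, "com.example.WorkflowJob", "workflow.jar").mkString(" ")
```

With this approach, the same packaged job can target any of the three cluster managers by changing only the master URL.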

Sparta as an SDK

Modular components, extensible through a simple SDK

  • You can extend several points of the platform to fulfill your needs, such as adding new inputs, outputs, operators and transformations.
  • Add new functions to the Kite SDK to extend the data cleaning, enrichment and normalization capabilities.
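The plain-Scala sketch below makes the idea of extension points more concrete. A custom transformation and a custom output are implemented against small traits and then looked up by name, the way a declarative workflow would reference them. The `Transformation` and `Output` traits, the `Registry` class and the example plugins are hypothetical. The actual Sparta SDK defines its own Spark-based interfaces, which are not reproduced here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical extension points (NOT the real Sparta SDK interfaces).
trait Transformation {
  def name: String
  // Returns None to drop the row.
  def transform(row: Map[String, String]): Option[Map[String, String]]
}

trait Output {
  def name: String
  def save(rows: Seq[Map[String, String]]): Unit
}

// Custom transformation: lower-cases one field and drops rows that lack it.
final class LowerCaseField(field: String) extends Transformation {
  val name: String = s"lowercase-$field"
  def transform(row: Map[String, String]): Option[Map[String, String]] =
    row.get(field).map(v => row.updated(field, v.toLowerCase))
}

// Custom output: keeps rows in memory, handy for local testing.
final class InMemoryOutput extends Output {
  val name: String = "in-memory"
  val saved: ArrayBuffer[Map[String, String]] = ArrayBuffer.empty
  def save(rows: Seq[Map[String, String]]): Unit = saved ++= rows
}

// Plugins are resolved by name, as a declarative workflow would reference them.
final class Registry(transformations: Seq[Transformation], outputs: Seq[Output]) {
  private val tByName = transformations.map(t => t.name -> t).toMap
  private val oByName = outputs.map(o => o.name -> o).toMap

  def run(rows: Seq[Map[String, String]], steps: Seq[String], output: String): Unit = {
    val pipeline = steps.map(tByName)
    // Apply each step in order; a None from any step drops the row.
    val result = rows.flatMap { row =>
      pipeline.foldLeft(Option(row))((acc, t) => acc.flatMap(t.transform))
    }
    oByName(output).save(result)
  }
}

// Example wiring: one custom transformation feeding one custom output.
val memory: InMemoryOutput = {
  val out = new InMemoryOutput
  new Registry(Seq(new LowerCaseField("text")), Seq(out))
    .run(Seq(Map("text" -> "Hello SPARTA"), Map("other" -> "x")), Seq("lowercase-text"), "in-memory")
  out
}
```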

Components

Multiple components can be defined in each workflow, and all of them follow a common architecture.

Core components

Several plugins have been implemented by the Stratio Sparta team.

Trigger component

With Sparta it is possible to run queries over streaming data and to perform ETL, aggregations and simple event processing, mixing streaming data with batch data in the trigger process.
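The trigger idea can be illustrated without Spark. The plain-Scala sketch below treats one micro-batch as a collection of events and joins it with a static reference table, which plays the role of the batch data. It then keeps only the rows that match a condition, roughly what the SQL query in the comment would do. In Sparta this logic would be written as a Spark SQL query. The `Event` and `Alert` types, the `userCountry` table and the threshold are hypothetical.

```scala
// Plain-Scala simulation of a trigger (not Spark code); names are hypothetical.
final case class Event(userId: String, amount: Double)
final case class Alert(userId: String, country: String, amount: Double)

// Batch data: a static reference table loaded once.
val userCountry: Map[String, String] = Map("u1" -> "ES", "u2" -> "US")

// Rough equivalent, applied to each micro-batch, of:
//   SELECT e.userId, u.country, e.amount
//   FROM stream e JOIN users u ON e.userId = u.userId
//   WHERE e.amount > threshold
def trigger(batch: Seq[Event], threshold: Double): Seq[Alert] =
  for {
    e       <- batch
    country <- userCountry.get(e.userId).toSeq // inner join: unknown users are dropped
    if e.amount > threshold
  } yield Alert(e.userId, country, e.amount)
```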

Aggregation component

The aggregation process in Sparta is very powerful because it makes it possible to build efficient OLAP processes over streaming data.

Advanced features have been implemented to optimize stateful operations over Spark Streaming.
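A stateful streaming aggregation can be sketched in two steps. First, each micro-batch is aggregated into partial measures keyed by a dimension value and a minute bucket. Second, those partials are merged into the running state. This is a conceptual plain-Scala illustration, not Sparta's implementation. In Spark Streaming such state would typically be kept with `updateStateByKey` or `mapWithState`. The `Record`, `CubeKey` and `Measure` types are hypothetical.

```scala
// Partial aggregate that can be merged associatively.
final case class Measure(count: Long, sum: Double) {
  def merge(o: Measure): Measure = Measure(count + o.count, sum + o.sum)
}

// Cube key: one dimension value plus a time bucket (epoch millis truncated to the minute).
final case class CubeKey(dimension: String, minute: Long)

final case class Record(dimension: String, timestampMs: Long, value: Double)

// Aggregate a single micro-batch into partial measures.
def aggregateBatch(batch: Seq[Record]): Map[CubeKey, Measure] =
  batch
    .groupBy(r => CubeKey(r.dimension, r.timestampMs / 60000L * 60000L))
    .map { case (k, rs) => k -> Measure(rs.size.toLong, rs.map(_.value).sum) }

// Merge the batch partials into the running state, touching only keys present in the batch.
def updateState(state: Map[CubeKey, Measure], batch: Seq[Record]): Map[CubeKey, Measure] =
  aggregateBatch(batch).foldLeft(state) { case (acc, (k, m)) =>
    acc.updated(k, acc.get(k).fold(m)(_.merge(m)))
  }

// Two consecutive micro-batches.
val afterFirstBatch = updateState(
  Map.empty,
  Seq(Record("ES", 0L, 1.0), Record("ES", 30000L, 2.0), Record("US", 61000L, 5.0))
)
val afterSecondBatch = updateState(afterFirstBatch, Seq(Record("ES", 59000L, 3.0)))
```

Each step updates only the keys present in the current batch, so its cost grows with the batch rather than with the total state. This is also the motivation usually given for preferring `mapWithState` over `updateStateByKey` in Spark Streaming.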

Inputs

  • Twitter
  • Kafka
  • Flume
  • RabbitMQ
  • Socket
  • WebSocket
  • HDFS/S3

Outputs

  • MongoDB
  • Cassandra
  • ElasticSearch
  • Redis
  • JDBC
  • CSV
  • Parquet
  • HTTP
  • Kafka
  • HDFS/S3
  • HTTP REST
  • Avro
  • Logger


Key technologies

Advantages

Sparta provides several advantages to end users.

Build

You can generate rpm and deb packages by running:

mvn clean package -Ppackage

Note: you need the following programs installed in order to build these packages:

On a Debian distribution:

  • fakeroot
  • dpkg-dev
  • rpm
  • jq

On a CentOS distribution:

  • fakeroot
  • dpkg-dev
  • rpmdevtools
  • jq

On all distributions:

  • Java 8
  • Maven 3

License

Licensed to STRATIO (C) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The STRATIO (C) licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
