• This repository has been archived on 27/May/2020
  • Stars: 147
  • Rank 251,347 (Top 5 %)
  • Language: Java
  • License: Apache License 2.0
  • Created over 10 years ago
  • Updated almost 8 years ago

Repository Details

Flume - Ingestion, an Apache Flume distribution

Stratio Ingestion

Contents

  • Introduction
  • Stratio Ingestion components
  • Details about Stratio Ingestion
  • Compile & Package
  • FAQ

Introduction

Stratio Ingestion started as a fork of Apache Flume (1.6). On top of Flume, it provides:

Custom sources and sinks, developed by Stratio

  • SNMP (v1, v2c and v3)
  • Redis, Kafka (0.8.1.1)
  • MongoDB, JDBC, Cassandra and Druid
  • Stratio Decision (Complex Event Processing engine)
  • REST client, Flume agents stats

Several bug fixes

  • Some of them are really important, such as Unicode support

Several enhancements of Flume's sources & sinks

  • ElasticSearch mapper, for example

You can find more documentation about us here

Stratio Ingestion components

  • Data transporter and collector: Apache Flume
  • Data extractor and transformer: Morphlines
  • Custom source types to read data from:
    • REST com.stratio.ingestion.source.rest.RestSource
    • Redis FlumeStats com.stratio.ingestion.source.redis.RedisSource
    • SNMPTraps com.stratio.ingestion.source.snmptraps.SNMPSource
    • IRC com.stratio.ingestion.source.irc.IRCSource
  • Custom sink types to write data to:
    • Cassandra com.stratio.ingestion.sink.cassandra.CassandraSink
    • MongoDB com.stratio.ingestion.sink.mongodb.MongoSink
    • Stratio Decision
    • JDBC com.stratio.ingestion.sink.jdbc.JDBCsink
    • Kafka com.stratio.ingestion.sink.kafka.KafkaSink
    • Druid com.stratio.ingestion.sink.druid.DruidSink
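
As a rough illustration of how the class names above are used, the following sketch assembles the skeleton of a Flume agent configuration (one source, one memory channel, one sink) and prints it as properties lines. The keys follow Flume's standard agent properties layout; the agent and component names (`a1`, `src1`, `ch1`, `sink1`) are hypothetical, and the component-specific settings each Stratio source or sink needs are deliberately left out because they are not stated here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AgentConfigSketch {

    // Builds the wiring for a single-source, single-channel, single-sink agent.
    // Component names (src1, ch1, sink1) are placeholders chosen for this sketch.
    public static Map<String, String> buildConfig(String agent, String sourceType, String sinkType) {
        Map<String, String> p = new LinkedHashMap<>();
        // Declare the components that belong to this agent.
        p.put(agent + ".sources", "src1");
        p.put(agent + ".channels", "ch1");
        p.put(agent + ".sinks", "sink1");
        // Custom components are referenced by their fully qualified class name.
        p.put(agent + ".sources.src1.type", sourceType);
        p.put(agent + ".sources.src1.channels", "ch1");
        p.put(agent + ".channels.ch1.type", "memory");
        p.put(agent + ".sinks.sink1.type", sinkType);
        p.put(agent + ".sinks.sink1.channel", "ch1");
        // Source- and sink-specific settings are omitted (assumption: see the product docs).
        return p;
    }

    // Renders the map as "key = value" lines, the format of a Flume properties file.
    public static String render(Map<String, String> config) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : config.entrySet()) {
            sb.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(buildConfig(
                "a1",
                "com.stratio.ingestion.source.rest.RestSource",
                "com.stratio.ingestion.sink.mongodb.MongoSink")));
    }
}
```

The printed lines could serve as the starting point of an agent properties file; a working agent would still need the settings required by the chosen source and sink.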

Details about Stratio Ingestion

Stratio Ingestion is based on Apache Flume, so the first question is:

What is Apache Flume?

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.

It is not limited to logs: a wide range of sources, sinks and transformations is available.

In addition, a sink can be a big data store or another real-time system (Apache Kafka, Spark Streaming).
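
The source, transformation and sink roles described above can be pictured with a toy in-memory model. This is only a sketch of the idea and does not use the Flume API: the class `PipelineSketch`, its methods and the sample records are all hypothetical, and a plain bounded queue stands in for a Flume channel (real Flume channels are transactional, which this sketch ignores).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

public class PipelineSketch {

    // Source side: apply a transformation to each raw record and put it on the channel.
    public static void ingest(List<String> records,
                              Function<String, String> transform,
                              BlockingQueue<String> channel) {
        for (String record : records) {
            if (!channel.offer(transform.apply(record))) {
                throw new IllegalStateException("channel is full");
            }
        }
    }

    // Sink side: drain everything buffered in the channel into the destination store.
    public static List<String> drain(BlockingQueue<String> channel) {
        List<String> store = new ArrayList<>();
        channel.drainTo(store);
        return store;
    }

    // Wires source -> transformation -> channel -> sink for a batch of records.
    public static List<String> run(List<String> records) {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(100);
        ingest(records, s -> s.trim().toUpperCase(Locale.ROOT), channel);
        return drain(channel);
    }

    public static void main(String[] args) {
        // prints [LOGIN OK, DISK FULL]
        System.out.println(run(Arrays.asList(" login ok ", "disk full")));
    }
}
```

The channel decouples the two sides: the source only needs room in the buffer, and the sink can write to its destination at its own pace.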

Interesting facts about Stratio Ingestion

  • Flume Ingestion is Apache Flume "on steroids" :)

  • We use Kite SDK (morphlines) extensively to improve the T (transform) step of ETL, and we have also developed a number of custom transformations.

  • Stratio Ingestion is fully open source, and we work very closely with the Flume community.

Compile & Package

$ mvn clean compile package -Ppackage

The distribution will be available in the stratio-ingestion-dist/target/ folder. You will find .deb, .rpm and .tar.gz packages ready to use, depending on your environment. The documentation has more details about how to install the product, along with some useful examples to help you understand Stratio Ingestion better.

FAQ

Can I use Stratio Ingestion for aggregating data (time-based rollups, for example)?

In our experience this is not a good idea, but you can use Stratio Sparkta for real-time aggregation.

Is Flume Ingestion multipersistence?

Yes, you can write data to JDBC databases, MongoDB, Apache Cassandra, ElasticSearch and Apache Kafka, among others.

Can I send data to decision-cep-engine?

Of course. We have developed a sink that sends events from Flume to an existing stream in our CEP engine. The sink creates the stream if it does not exist in the engine.

Where can I find more details about Stratio Ingestion?

You can take a look at our documentation on Confluence.

Changelog

See the changelog for changes.
