SparkOnKudu
Based on the design of SparkOnHBase. This repo will support Spark, Spark Streaming, and Spark SQL integration with Kudu.

SparkStreaming.Sessionization
Near-real-time (NRT) sessionization with Spark Streaming, landing data on HDFS and putting live stats in HBase.

SparkUnitTestingExamples
This project is a collection of Spark unit test examples to help new Spark users see how to unit test their code for Spark Core, Spark SQL, and Spark Streaming.

Spark.TableStatsExample
Simple Spark example of generating table stats for use in data quality checks.

SparkOnHBase

SparkOnALog
Examples of integrating Spark Streaming, Flume, and HBase to solve streaming problems.

CopybookInputFormat
Uses JRecord to build mapred and mapreduce InputFormats for HDFS, MapReduce, Pig, Hive, Spark, ...

HBase.MCC
HBase.MCC (HBase Multi Cluster Client). The goal is to support always-up solutions with HBase through multiple clusters.

Hive.Generate.DDL
Tool that generates DDLs and simple data load scripts.

hadcom.utils
Advanced common functionality for Hadoop.

Taxi360
Simple example of HBase, Solr, and Kudu for Entity 360 using NY taxi data.

teraSort-Compressed
The rules of TeraSort say you can't compress the input and output. Well, those rules are out of touch with real use cases on Hadoop.

CleanUpEmptyFilesTool
This tool is designed to look through your HDFS folders and either identify or delete files with no data in them.

FileIngestor
A simple program to put files from a directory into HDFS, with the added functionality of defining how that action will happen.

Spark..Unique.Seq.Generator
This is an example of how to make unique sequences in a distributed way with Spark (no dups, no skips).

Spark.GraphX.Examples
Just some examples of using GraphX.

Flume.NettyAvroAsyncRpcClient
This is a layer on top of the Flume NettyAvroRpcClient that allows for multiple connections to a server.

MRSmallFileCombiner
Tool to read many small files in HDFS with MR while allowing the caller to control the number of mappers.

Spark.ProdictBehaviorBasedOnPastActives
This is an example of how to do window analysis with Spark.

HBase.GetTopNRecords
This is a simple example to show how a single HBase "get" can retrieve the top N {items,amount} in order of decreasing amount.

HBaseMassiveBulkLoadUtils
This is a tool for testing and managing many repeated, large bulk loads on HBase.

AppTrans
Examples for training.

EdgeNodeGraphUi
Connecting the power of the D3 graphing library to CDH (HDFS, HBase and Impala).

HBase-FastTableCopy
This will contain implementations that copy records from a table with fewer regions than the final table.

spark.mergesort.example
An example of how to do a merge sort.

FairSchedulerPlus
An upgraded, extended FairScheduler that takes sub-groups into account.

MapReduce.Unique.Seq.Generator
This is a single MapReduce job that prepends a unique sequence number to every row in a source file.

FixedLengthInputFormat
This is a FixedLengthInputFormat for Hadoop MapReduce.

SparkStreamingSeqSink
Support for writing sequence files with Spark Streaming, with functionality similar to the Flume HDFS Sink with sequence files.

IngestProcessStoreInNRT
This is a demo/training application, used to show how easy it is to do operations like ingestion, aggregation, and change data capture using tools like Kafka, Spark Streaming, Flume, Kudu, Solr, HBase, and HDFS.