
Distributed Stream and Batch Processing


Note on Hazelcast 5

With the release of Hazelcast 5.0, development of Jet has moved to the core Hazelcast repository. Please follow that repository for details on how to use Hazelcast to build data pipelines.

Hazelcast 5 also comes with extensive documentation, replacing the existing Jet docs: https://docs.hazelcast.com/hazelcast/latest/index.html

What Is Jet

Jet is an open-source, in-memory, distributed batch and stream processing engine. You can use it to process large volumes of real-time events or huge batches of static datasets. To give a sense of scale, a single Jet node has been shown to aggregate 10 million events per second with latency under 10 milliseconds.

It provides a Java API for building stream and batch processing applications with a dataflow programming model. After you deploy your application to a Jet cluster, Jet automatically uses all the computational resources on the cluster to run it.

If you add nodes to the cluster while your application is running, Jet automatically scales the application up to run on the new nodes. If you remove nodes, Jet scales it down seamlessly without losing the current computational state, preserving exactly-once processing guarantees.
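One way to picture how keyed state can survive rescaling is to tie it to partitions rather than to nodes: a key always maps to the same partition, and only the partition-to-member mapping changes when members join or leave. The JDK-only sketch below illustrates that idea; it is not Jet's code. It assumes Hazelcast's default of 271 partitions, uses `hashCode()` where Hazelcast actually hashes the serialized key, and assigns partitions to members round-robin. `PartitionSketch`, `partitionOf`, and `assign` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keys map to fixed partitions, and partitions (not
// individual keys) are assigned to cluster members.
class PartitionSketch {
    // Assumption: Hazelcast's default partition count.
    static final int PARTITION_COUNT = 271;

    // Simplified: Hazelcast hashes the serialized key rather than using hashCode().
    static int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    // Round-robin partition ownership, recomputed whenever membership changes.
    static Map<Integer, String> assign(List<String> members) {
        Map<Integer, String> owners = new HashMap<>();
        for (int partition = 0; partition < PARTITION_COUNT; partition++) {
            owners.put(partition, members.get(partition % members.size()));
        }
        return owners;
    }

    public static void main(String[] args) {
        String key = "sensor-42";
        int partition = partitionOf(key);
        Map<Integer, String> twoNodes = assign(Arrays.asList("node-1", "node-2"));
        Map<Integer, String> threeNodes = assign(Arrays.asList("node-1", "node-2", "node-3"));
        // The key's partition is stable; only that partition's owner may change,
        // so its state would be migrated together with the partition.
        System.out.println(key + " -> partition " + partition
                + ", owner with 2 nodes: " + twoNodes.get(partition)
                + ", owner with 3 nodes: " + threeNodes.get(partition));
    }
}
```

Round-robin is used only for brevity; it reassigns more partitions than strictly necessary when a member joins.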

For example, you can express the classic word count problem, which reads some local files and prints the frequency of each word to the console, using the following API:

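import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;
import static com.hazelcast.jet.Traversers.traverseArray;
import static com.hazelcast.jet.aggregate.AggregateOperations.counting;

JetInstance jet = Jet.bootstrappedInstance();

Pipeline p = Pipeline.create();
p.readFrom(Sources.files("/path/to/text-files"))
 .flatMap(line -> traverseArray(line.toLowerCase().split("\\W+")))
 .filter(word -> !word.isEmpty())
 .groupingKey(word -> word)
 .aggregate(counting())
 .writeTo(Sinks.logger());

jet.newJob(p).join();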

and then deploy the application to the cluster:

bin/jet submit word-count.jar
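To check what the word-count pipeline computes, the same flatMap, filter, grouping, and counting steps can be run on a single machine with `java.util.stream`. This is a local illustration using only the JDK, not how Jet executes the job; the class name `LocalWordCount` and its sample input lines are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Single-machine illustration using only the JDK; it does not use Jet.
class LocalWordCount {
    // Same steps as the pipeline: lower-case each line, split on non-word
    // characters, drop empty tokens, group by the word, and count each group.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("To be, or not to be:", "that is the question.");
        // prints {be=2, is=1, not=1, or=1, question=1, that=1, the=1, to=2}
        System.out.println(count(lines));
    }
}
```

The `filter` step matters because `split("\\W+")` yields an empty first token when a line starts with punctuation or whitespace.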

Another application, which aggregates millions of sensor readings per second from Kafka at 10-millisecond resolution, looks like this:

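import com.hazelcast.jet.kafka.KafkaSources;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import static com.hazelcast.jet.aggregate.AggregateOperations.averagingDouble;
import static com.hazelcast.jet.pipeline.WindowDefinition.sliding;

// Reading is an application-defined type; kafkaProperties holds the Kafka consumer settings;
// jet is the JetInstance from the word count example.
Pipeline p = Pipeline.create();

p.readFrom(KafkaSources.<String, Reading>kafka(kafkaProperties, "sensors"))
 .withTimestamps(event -> event.getValue().timestamp(), 10) // use event timestamp, allowed lag in ms
 .map(event -> event.getValue()) // Kafka records arrive as key/value entries; keep the Reading
 .groupingKey(reading -> reading.sensorId())
 .window(sliding(1_000, 10)) // sliding window of 1s by 10ms
 .aggregate(averagingDouble(reading -> reading.temperature()))
 .writeTo(Sinks.logger());

jet.newJob(p).join();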
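The `sliding(1_000, 10)` window definition produces a result covering the last 1,000 ms every 10 ms, so each reading contributes to 1,000 / 10 = 100 overlapping windows. The JDK-only sketch below computes one such window's per-sensor average to show the semantics. It is not Jet's implementation; `SlidingAverageSketch`, its `Reading` class, and the half-open `[end - size, end)` window bounds are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// JDK-only illustration of sliding-window semantics; not Jet's implementation.
class SlidingAverageSketch {
    // Hypothetical stand-in for the application's Reading type.
    static final class Reading {
        final String sensorId;
        final long timestamp;
        final double temperature;

        Reading(String sensorId, long timestamp, double temperature) {
            this.sensorId = sensorId;
            this.timestamp = timestamp;
            this.temperature = temperature;
        }
    }

    // Average temperature per sensor for the half-open window [windowEnd - windowSize, windowEnd).
    static Map<String, Double> averageForWindow(List<Reading> readings, long windowEnd, long windowSize) {
        long windowStart = windowEnd - windowSize;
        return readings.stream()
                .filter(r -> r.timestamp >= windowStart && r.timestamp < windowEnd)
                .collect(Collectors.groupingBy((Reading r) -> r.sensorId, TreeMap::new,
                        Collectors.averagingDouble((Reading r) -> r.temperature)));
    }

    // How many overlapping windows each event belongs to (assumes size is a multiple of the slide).
    static long windowsPerEvent(long windowSize, long slideBy) {
        return windowSize / slideBy;
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
                new Reading("s1", 100, 20.0),
                new Reading("s1", 900, 22.0),
                new Reading("s2", 950, 30.0),
                new Reading("s1", 1_050, 40.0));
        // Window [0, 1000): prints {s1=21.0, s2=30.0}
        System.out.println(averageForWindow(readings, 1_000, 1_000));
        // Window [60, 1060), six 10 ms slides later, also includes the s1 reading at 1,050 ms.
        System.out.println(averageForWindow(readings, 1_060, 1_000));
        System.out.println(windowsPerEvent(1_000, 10)); // prints 100
    }
}
```

Recomputing every window from scratch like this is only for clarity; a streaming engine would maintain the aggregates incrementally.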

Jet comes with out-of-the-box support for many kinds of data sources and sinks, including:

  • Apache Kafka
  • Local Files (Text, Avro, JSON)
  • Apache Hadoop (Azure Data Lake, S3, GCS)
  • Apache Pulsar
  • Debezium
  • Elasticsearch
  • JDBC
  • JMS
  • InfluxDB
  • Hazelcast
  • Redis
  • MongoDB
  • Twitter

When Should You Use Jet

Jet is a good fit when you need to process large amounts of data in a distributed fashion. You can use it to build a variety of data-processing applications, such as:

  • Low-latency stateful stream processing. For example, detecting trends in 100 Hz sensor data from 100,000 devices and sending corrective feedback within 10 milliseconds.
  • High-throughput, large-state stream processing. For example, tracking the GPS locations of millions of users and inferring their velocity vectors.
  • Batch processing of big data volumes. For example, analyzing a day's worth of stock trading data to update the risk exposure of a given portfolio.

Key Features

Predictable Latency Under Load

Jet uses a unique execution model based on cooperative multithreading and can achieve extremely low latencies while processing millions of items per second on just a single node.

The engine is able to run anywhere from tens to thousands of jobs concurrently on a fixed number of threads.
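Cooperative multithreading means each unit of work performs a short, non-blocking step and then hands control back, so one thread can interleave many units. The single-threaded sketch below illustrates that scheduling idea with plain JDK code. It is a hypothetical model, far simpler than Jet's execution engine; `CooperativeSketch`, `CooperativeTask`, and `Counter` are illustrative names, and a production engine would also need an idle strategy for rounds in which no task makes progress.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of cooperative scheduling; not Jet's execution engine.
class CooperativeSketch {
    // A unit of work that does a small, non-blocking step per call and
    // reports whether it has finished.
    interface CooperativeTask {
        boolean step();
    }

    // Example task: "emits" `total` items, at most `batch` per step, logging progress.
    static final class Counter implements CooperativeTask {
        private final String name;
        private final int total;
        private final int batch;
        private final List<String> log;
        private int emitted;

        Counter(String name, int total, int batch, List<String> log) {
            this.name = name;
            this.total = total;
            this.batch = batch;
            this.log = log;
        }

        @Override
        public boolean step() {
            emitted += Math.min(batch, total - emitted);
            log.add(name + ":" + emitted);
            return emitted == total;
        }
    }

    // A single worker thread visits every task in turn until all are done,
    // so many tasks share one thread without any of them blocking it.
    static void runOnOneThread(List<? extends CooperativeTask> tasks) {
        List<CooperativeTask> active = new ArrayList<>(tasks);
        while (!active.isEmpty()) {
            active.removeIf(CooperativeTask::step);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        runOnOneThread(Arrays.asList(new Counter("a", 5, 2, log), new Counter("b", 3, 2, log)));
        System.out.println(log); // prints [a:2, b:2, a:4, b:3, a:5]
    }
}
```

The interleaved log shows both tasks progressing on one thread; the number of tasks is independent of the number of threads.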

Fault Tolerance With No Infrastructure

Jet stores computational state in a distributed, replicated in-memory store and does not require a distributed file system or infrastructure such as ZooKeeper to provide high availability and fault tolerance.

Jet implements a version of the Chandy-Lamport algorithm to provide exactly-once processing in the face of failures. When interfacing with external transactional systems such as databases, it can provide end-to-end processing guarantees using two-phase commit.
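A common variant of Chandy-Lamport used by stream processors injects snapshot barriers into the data streams. An operator with several inputs saves its state only once the barrier for a snapshot has arrived on every input, and it holds back items that arrive behind a barrier on an already-aligned input so they are not counted in that snapshot. The single-threaded sketch below models that alignment step for an operator that sums numbers. It is an illustration under simplifying assumptions, not Jet's implementation; `BarrierAlignmentSketch`, `onItem`, and `onBarrier` are hypothetical names, and storing the snapshot in a replicated store is not shown.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Simplified, single-threaded model of aligned snapshot barriers; not Jet's code.
class BarrierAlignmentSketch {
    private final boolean[] barrierSeen;
    private final List<Deque<Long>> heldBack = new ArrayList<>();
    private final List<Long> snapshots = new ArrayList<>();
    private long sum;                              // the operator's state

    BarrierAlignmentSketch(int inputCount) {
        barrierSeen = new boolean[inputCount];
        for (int i = 0; i < inputCount; i++) {
            heldBack.add(new ArrayDeque<>());
        }
    }

    // Items behind a barrier belong to the next snapshot, so they are held back.
    // (A real engine would stop reading that input instead of buffering it.)
    void onItem(int input, long value) {
        if (barrierSeen[input]) {
            heldBack.get(input).add(value);
        } else {
            sum += value;
        }
    }

    void onBarrier(int input) {
        barrierSeen[input] = true;
        for (boolean seen : barrierSeen) {
            if (!seen) {
                return;                            // still waiting for other inputs
            }
        }
        snapshots.add(sum);                        // state covers exactly the pre-barrier items
        Arrays.fill(barrierSeen, false);
        for (Deque<Long> queue : heldBack) {
            while (!queue.isEmpty()) {
                sum += queue.poll();
            }
        }
    }

    long sum() { return sum; }
    List<Long> snapshots() { return snapshots; }

    public static void main(String[] args) {
        BarrierAlignmentSketch op = new BarrierAlignmentSketch(2);
        op.onItem(0, 1);
        op.onItem(1, 10);
        op.onBarrier(0);
        op.onItem(0, 2);       // behind the barrier on input 0: held back
        op.onItem(1, 20);      // input 1 has not delivered the barrier yet: applied
        op.onBarrier(1);       // aligned: snapshot taken, then held-back items applied
        System.out.println(op.snapshots() + " " + op.sum()); // prints [31] 33
    }
}
```

Because the snapshot (31) excludes the held-back item, restoring from it and replaying the source from the barrier position counts every item exactly once.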

Advanced Event Processing

Event data often arrives out of order, and Jet has first-class support for handling this disorder. Jet implements a technique called distributed watermarks to treat disordered events as if they had arrived in order.
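A watermark is an assertion that no events older than a given timestamp are still expected. A simple lag-based policy, in the spirit of the allowed-lag argument in the Kafka example, keeps the watermark a fixed amount behind the highest timestamp observed; when a processor receives watermarks from several inputs, such as Kafka partitions, it can only advance to the minimum of them. The JDK-only sketch below models those two steps. It is a simplified illustration under these assumptions rather than Jet's implementation, and `WatermarkSketch`, `LagPolicy`, and `coalesce` are hypothetical names.

```java
import java.util.Arrays;

// Simplified watermark model for illustration; not Jet's implementation.
class WatermarkSketch {
    // Per-input policy: the watermark trails the highest timestamp seen so far by a fixed lag.
    static final class LagPolicy {
        private final long allowedLag;
        private long topTimestamp = Long.MIN_VALUE;

        LagPolicy(long allowedLag) {
            this.allowedLag = allowedLag;
        }

        // Records an event and reports whether it was already behind the watermark.
        // How late events are handled is up to the engine; here they are only flagged.
        boolean observe(long timestamp) {
            boolean late = timestamp < watermark();
            topTimestamp = Math.max(topTimestamp, timestamp);
            return late;
        }

        long watermark() {
            return topTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : topTimestamp - allowedLag;
        }
    }

    // A processor fed by several inputs can only advance to the smallest input watermark.
    static long coalesce(long... inputWatermarks) {
        return Arrays.stream(inputWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        LagPolicy partitionA = new LagPolicy(10);
        LagPolicy partitionB = new LagPolicy(10);
        partitionA.observe(100);
        partitionA.observe(95);                    // out of order, but within the lag
        partitionB.observe(60);
        boolean late = partitionA.observe(85);     // 85 < 100 - 10, so it is late
        // prints: true 50 (the slower input B holds the coalesced watermark back)
        System.out.println(late + " " + coalesce(partitionA.watermark(), partitionB.watermark()));
    }
}
```

Taking the minimum is what lets downstream windows close safely: a window is only complete once every input has promised not to send older events.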

How Do I Get Started

Follow the Get Started guide to start using Jet.

Download

You can download Jet from https://jet-start.sh.

Alternatively, you can use the latest Docker image:

docker run -p 5701:5701 hazelcast/hazelcast-jet

Use the following Maven coordinates to add Jet to your application:

<dependency>
    <groupId>com.hazelcast.jet</groupId>
    <artifactId>hazelcast-jet</artifactId>
    <version>4.2</version>
</dependency>

Tutorials

See the tutorials for guides on using Jet.

Reference

Jet supports a variety of transforms and operators.

Community

The Hazelcast Jet team actively answers questions on Stack Overflow and the Hazelcast Community Slack.

You are also encouraged to join the hazelcast-jet mailing list if you are interested in community discussions.

How Can I Contribute

Thanks for your interest in contributing! The easiest way is to send a pull request. Have a look at the issues labeled "good first issue" for guidance.

Building From Source

To build, use:

./mvnw clean package -DskipTests

Use Latest Snapshot Release

You can always use the latest snapshot release if you want to try the features currently under development.

Maven snippet:

<repositories>
    <repository>
        <id>snapshot-repository</id>
        <name>Maven2 Snapshot Repository</name>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <snapshots>
            <enabled>true</enabled>
            <updatePolicy>daily</updatePolicy>
        </snapshots>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.hazelcast.jet</groupId>
        <artifactId>hazelcast-jet</artifactId>
        <version>4.3-SNAPSHOT</version>
    </dependency>
</dependencies>

Trigger Phrases in the Pull Request Conversation

When you create a pull request (PR), it must pass a build-and-test procedure. Maintainers will be notified about your PR, and they can trigger the build using special comments. These are the phrases you may see used in the comments on your PR:

  • verify - run the default PR builder, equivalent to mvn clean install
  • run-nightly-tests - use the settings for the nightly build (mvn clean install -Pnightly). This includes slower tests that we don't normally run on every PR.
  • run-windows - run the tests on a Windows machine (HighFive is not supported here)
  • run-cdc-debezium-tests - run all tests in the extensions/cdc-debezium module
  • run-cdc-mysql-tests - run all tests in the extensions/cdc-mysql module
  • run-cdc-postgres-tests - run all tests in the extensions/cdc-postgres module

Where not indicated, the builds run on a Linux machine with Oracle JDK 8.

License

Source code in this repository is covered by one of two licenses:

  1. Apache License 2.0
  2. Hazelcast Community License

The default license throughout the repository is Apache License 2.0 unless the header specifies another license. Please see the Licensing section for more information.

Credits

We owe (the good parts of) our CLI tool's user experience to picocli.

Copyright

Copyright (c) 2008-2021, Hazelcast, Inc. All Rights Reserved.

Visit www.hazelcast.com for more info.
