  • Stars: 192
  • Rank: 202,019 (Top 4%)
  • Language: Go
  • License: Apache License 2.0
  • Created: over 3 years ago
  • Updated: 3 months ago


Repository Details

Raccoon is a high-throughput, low-latency service to collect events in real-time from your web, mobile apps, and services using multiple network protocols.

Raccoon


Raccoon is a high-throughput, low-latency service that provides an API to ingest clickstream data from mobile apps and websites and publish it to Kafka. Raccoon uses the WebSocket protocol for persistent client-server communication and Protobuf as the serialization format. It provides an event-type-agnostic API that accepts a batch (array) of events in Protobuf format. Refer here for the proto definition format that Raccoon accepts.

Key Features

  • Event Agnostic - The Raccoon API is event agnostic, allowing you to push any event with any schema.
  • Event Distribution - Events are distributed to Kafka topics based on the event metadata.
  • High Performance - Long-running persistent connections reduce connection set-up overhead, and WebSocket reduces battery consumption for mobile apps (based on usage statistics).
  • Guaranteed Event Delivery - The server sends acknowledgements based on delivery; currently it acknowledges failures and successes. The server can be augmented for zero-data-loss or at-least-once guarantees.
  • Reduced Payload Sizes - Protobuf-based serialization keeps payloads compact.
  • Metrics - Built-in monitoring includes latency and active connections.

To learn more, follow the detailed documentation.

Use cases

Raccoon can be used as an event collector, event distributor, and forwarder of events generated from mobile/web/IoT front ends, as it provides high-volume, high-throughput, low-latency, event-agnostic APIs. Raccoon can serve near-real-time data ingestion needs. Some domains where Raccoon could be used are listed below:

  • Adtech streams: where digital marketing data from external sources can be ingested into the organization's backends
  • Clickstream: where user behavior data can be streamed in real-time
  • Edge systems: where devices (say, in the IoT world) need to send data to the cloud
  • Event sourcing: such as stock-update dashboards and autonomous/self-driving use cases

Resources

Explore the following resources to get started with Raccoon:

  • Guides provide guidance on deployment and client samples.
  • Concepts describes all important Raccoon concepts.
  • Reference contains details about configurations, metrics and other aspects of Raccoon.
  • Contribute contains resources for anyone who wants to contribute to Raccoon.

Run with Docker

Prerequisite

  • Docker installed

Run Docker Image

Raccoon provides a Docker image as part of each release. Make sure you have Kafka running locally, then run the following:

# Download docker image from docker hub
$ docker pull raystack/raccoon

# Run the following docker command with minimal config.
$ docker run -p 8080:8080 \
  -e SERVER_WEBSOCKET_PORT=8080 \
  -e SERVER_WEBSOCKET_CONN_ID_HEADER=X-User-ID \
  -e PUBLISHER_KAFKA_CLIENT_BOOTSTRAP_SERVERS=host.docker.internal:9093 \
  -e EVENT_DISTRIBUTION_PUBLISHER_PATTERN=clickstream-%s-log \
  raystack/raccoon
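
The same minimal configuration can be expressed as a Compose service. This is only a hedged sketch mirroring the docker run flags above; the repository ships its own docker-compose (used by make docker-run below), and the service name and Kafka address here are assumptions.

```yaml
# Illustrative Compose sketch, not the repo's docker-compose file.
services:
  raccoon:
    image: raystack/raccoon
    ports:
      - "8080:8080"
    environment:
      SERVER_WEBSOCKET_PORT: "8080"
      SERVER_WEBSOCKET_CONN_ID_HEADER: X-User-ID
      PUBLISHER_KAFKA_CLIENT_BOOTSTRAP_SERVERS: host.docker.internal:9093
      EVENT_DISTRIBUTION_PUBLISHER_PATTERN: clickstream-%s-log
```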

Run with Docker Compose

You can also use the docker-compose setup in this repo, which provides Raccoon along with a Kafka setup. Run the following commands:

# Run raccoon along with kafka setup
$ make docker-run
# Stop the docker compose
$ make docker-stop

You can consume the published events from the host machine by using localhost:9094 as the Kafka bootstrap server. Mind the topic routing when you consume the events.

Running locally

Prerequisite:

  • You need to have Go 1.18 or above installed
  • You need protoc installed

# Clone the repo
$ git clone https://github.com/raystack/raccoon.git

# Build the executable
$ make

# Configure env variables
$ vim .env

# Run Raccoon
$ ./out/raccoon

Note: Read the details of each configuration here.

Running tests

# Running unit tests
$ make test

# Running integration tests
$ cp .env.test .env
$ make docker-run
$ INTEGTEST_BOOTSTRAP_SERVER=localhost:9094 INTEGTEST_HOST=localhost:8080 INTEGTEST_TOPIC_FORMAT="clickstream-%s-log" GRPC_SERVER_ADDR="localhost:8081" go test ./integration -v

Contribute

Development of Raccoon happens in the open on GitHub, and we are grateful to the community for contributing bugfixes and improvements. Read below to learn how you can take part in improving Raccoon.

Read our contributing guide to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to Raccoon.

To help you get your feet wet and get you familiar with our contribution process, we have a list of good first issues that contain bugs which have a relatively limited scope. This is a great place to get started.

This project exists thanks to all the contributors.

License

Raccoon is Apache 2.0 licensed.

More Repositories

  1. optimus (Go, 742 stars): Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
  2. firehose (Java, 319 stars): Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.
  3. dagger (Java, 261 stars): Dagger is an easy-to-use, configuration-over-code, cloud-native framework built on top of Apache Flink for stateful processing of real-time streaming data.
  4. frontier (Go, 252 stars): Frontier is an all-in-one user management platform that provides identity, access, and billing management to help organizations secure their systems and data. (Open-source alternative to Clerk)
  5. stencil (Go, 221 stars): Stencil is a schema registry that provides schema management and validation dynamically, efficiently, and reliably to ensure data compatibility across applications.
  6. meteor (Go, 181 stars): Meteor is an easy-to-use, plugin-driven metadata collection framework to extract data from different sources and sink to any data catalog.
  7. guardian (Go, 137 stars): Guardian is a universal data access management tool with automated access workflows and security controls across data stores, analytical systems, and cloud products.
  8. siren (Go, 77 stars): Siren provides an easy-to-use, universal alert, notification, and channels management framework for the entire observability infrastructure.
  9. compass (Go, 63 stars): Compass is an enterprise data catalog that makes it easy to find, understand, and govern data.
  10. apsara (TypeScript, 56 stars): Apsara is a set of open-source, re-usable UI components built using Radix UI and CSS modules to power Raystack projects.
  11. proton (54 stars): This repository is home to the original protobuf interface definitions used throughout the Raystack ecosystem.
  12. cosmos (TypeScript, 46 stars): Cosmos is an operational analytics server to build custom apps with embedded analytics that deliver data experiences as unique as your business.
  13. charts (Smarty, 41 stars): This repository is home to the original Helm charts for products throughout the open data platform ecosystem.
  14. transformers (Python, 35 stars): This repository is home to the Optimus data transformation plugins for various data processing needs.
  15. homebrew-tap (Ruby, 31 stars): This repository is home to the original Homebrew taps for products throughout the Raystack ecosystem.
  16. platform (30 stars): ODPF is the next-gen collaborative and distributed data platform to power data-driven workflows.
  17. entropy (Go, 19 stars): Entropy is a framework to safely and predictably create, change, and improve modern cloud applications and infrastructure using familiar languages, tools, and engineering practices.
  18. handbook (CSS, 14 stars): Handbook is the central repository for how we build products within the ODPF community.
  19. salt (Go, 13 stars): Salt is a collection of libraries and tools used in the Raystack ecosystem to improve the experience of developing projects with Go.
  20. depot (Java, 9 stars): Depot contains various common sink implementations published as a library, used in Firehose, Dagger, or any other application that wants to send data to destinations.
  21. predator (Go, 3 stars)
  22. dex (Go, 3 stars): Data Experience.
  23. frontier-go (Go, 2 stars)
  24. frontier-python (Python, 2 stars)
  25. .github (1 star): This repository contains the community health files for the @raystack organization.
  26. chronicle (TypeScript, 1 star)
  27. scoop-bucket (1 star): This repository is home to the original Scoop buckets for products throughout the Open DataOps platform ecosystem.