  • Stars: 6,607
  • Rank: 5,355 (Top 0.2%)
  • Language: Scala
  • License: Apache License 2.0
  • Created almost 12 years ago
  • Updated 27 days ago

Repository Details

The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP

Overview

Snowplow is a developer-first engine for collecting behavioral data.

Thousands of organizations around the world generate, enhance, and model behavioral data with Snowplow to fuel advanced analytics, AI/ML initiatives, or composable CDPs.

Why Snowplow?

  • 🏔️ Rock solid architecture capable of processing billions of events per day.
  • 🛠️ Over 20 SDKs to collect data from web, mobile, server-side, and other sources.
  • ✅ A unique approach based on schemas and validation ensures your data is as clean as possible.
  • 🪄 Over 15 enrichments to get the most out of your data.
  • 🏭 Send data to popular warehouses and streams: Snowplow fits nicely within the Modern Data Stack.

➡ Where to start? ⬅️

Snowplow Open Source

Our Open Source solution equips you with everything you need to start creating behavioral data in a high-fidelity, machine-readable way. Head over to the Quick Start Guide to set things up.

Snowplow Behavioral Data Platform

Looking for an enterprise solution with a console, APIs, data governance, workflow tooling? The Behavioral Data Platform is our managed service that runs in your AWS or GCP cloud. Check out Try Snowplow.

The documentation is a great place to learn more, especially:

  • Tracking design: discover how to approach creating your data the Snowplow way.
  • Pipelines: understand what's under the hood of Snowplow.

Would you rather dive into the code? Then you are already in the right place!


Snowplow technology 101

The repository structure follows the conceptual architecture of Snowplow, which consists of six loosely-coupled sub-systems connected by five standardized data protocols/formats.

To briefly explain these six sub-systems:

  • Trackers fire Snowplow events. Currently we have 15 trackers, covering web, mobile, desktop, server and IoT
  • Collector receives Snowplow events from trackers. Currently we have one official collector implementation with different sinks: Amazon Kinesis, Google PubSub, Amazon SQS, Apache Kafka and NSQ
  • Enrich cleans up the raw Snowplow events, enriches them and puts them into storage. Currently we have several implementations, built for different environments (GCP, AWS, Apache Kafka) and one core library
  • Storage is where the Snowplow events live. Currently we store the Snowplow events in a flat file structure on S3, and in the Redshift, Postgres, Snowflake and BigQuery databases
  • Data modeling is where event-level data is joined with other data sets and aggregated into smaller data sets, and business logic is applied. This produces a clean set of tables which make it easier to perform analysis on the data. We officially support data models for Redshift, Snowflake and BigQuery.
  • Analytics are performed on the Snowplow events or on the aggregate tables.
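
As a rough illustration of how the six sub-systems hand data to one another, the sketch below models each stage as a plain Python function. Every name in it (track, collect, enrich, model, analyze, Storage) is hypothetical and none of it is Snowplow's API; a real pipeline uses the trackers, collector, and enrichment applications described in this repository, and the validation shown here is only a stand-in for schema-based validation.

```python
from collections import Counter
from dataclasses import dataclass, field

# All names below are hypothetical; none of them come from a Snowplow library.


@dataclass
class Storage:
    """Stand-in for the storage sub-system (S3 or a warehouse in a real pipeline)."""
    good: list = field(default_factory=list)
    bad: list = field(default_factory=list)


def track(event_name: str, user_id: str) -> dict:
    """Tracker: fires a raw event."""
    return {"event_name": event_name, "user_id": user_id}


def collect(raw_event: dict, collector_tstamp: str) -> dict:
    """Collector: receives the event and stamps it with a receive time."""
    return {**raw_event, "collector_tstamp": collector_tstamp}


def enrich(collected: dict, storage: Storage) -> None:
    """Enrich: cleans up the event, adds a derived field, and writes it to storage."""
    if not collected.get("event_name") or not collected.get("user_id"):
        storage.bad.append(collected)  # events failing the check are set aside
        return
    enriched = {**collected, "event_name": collected["event_name"].strip().lower()}
    enriched["derived_date"] = collected["collector_tstamp"][:10]
    storage.good.append(enriched)


def model(storage: Storage) -> dict:
    """Data modeling: aggregates event-level rows into a smaller per-user table."""
    return dict(Counter(e["user_id"] for e in storage.good))


def analyze(events_per_user: dict) -> str:
    """Analytics: answers a question from the aggregate table."""
    return max(events_per_user, key=events_per_user.get)


storage = Storage()
sample = [("Page_View", "u1"), ("click", "u1"), ("page_view", "u2"), ("", "u3")]
for name, user in sample:
    enrich(collect(track(name, user), "2023-01-01T00:00:00Z"), storage)

events_per_user = model(storage)
most_active_user = analyze(events_per_user)
```

The stages only share plain dictionaries, which loosely mirrors the idea of sub-systems connected by standardized data formats: any one function could be swapped out without touching the others, as long as the shape of the data it passes on stays the same.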

For more information on the current Snowplow architecture, please see the Technical architecture.

Version Compatibility Matrix

To make sure all the components work well together, we strongly recommend that you consult the compatibility matrix when setting up a Snowplow pipeline.


About this repository

This repository is an umbrella repository for all loosely-coupled Snowplow components and is updated on each component release.

Since June 2020, all components have been extracted into their dedicated repositories (more info here). This repository now serves as an entry point for Snowplow users, as the home of our public roadmap, and as a historical artifact.

Components that have been extracted to their own repository are still here as git submodules.

Trackers

A full list of supported trackers can be found on our documentation site. Popular trackers and use cases include:

  • Web: JavaScript, AMP
  • Mobile: Android, iOS, React Native, Flutter
  • Gaming: Unity, C++, Lua
  • TV: Roku, iOS, Android, React Native
  • Desktop & Server: Command line, .NET, Go, Java, Node.js, PHP, Python, Ruby, Scala, C++, Rust, Lua

Collector

Enrich

Loaders

Iglu

Data modeling

Web

Mobile

Media

Retail

Testing

Parsing enriched event

Bad rows

Terraform Modules


Public Roadmap

This repository also contains the Snowplow Public Roadmap. The Public Roadmap lets you stay up to date and find out what's happening on the Snowplow Platform. Help us prioritize our cards: open the issue and leave a 👍 to vote for your favorites. Want us to build a feature or function? Tell us by heading to our Discourse forum 💬.

Community

We want to make it super easy for Snowplow users and contributors to talk to us and connect with one another, to share ideas, solve problems and help make Snowplow awesome. Join the conversation:

  • Meetups. Don't miss your chance to talk to us in person. We are often on the move with meetups in Amsterdam, Berlin, Boston, London, and more.
  • Discourse. Our forum for all Snowplow users: engineers setting up Snowplow, data modelers structuring the data, and data consumers building insights. You can find guides, recipes, questions and answers from Snowplow users and the Snowplow team. All questions and contributions are welcome!
  • Twitter. Follow @Snowplow for official news and @SnowplowLabs for engineering-heavy conversations and release announcements.
  • GitHub. If you spot a bug, please raise an issue in the GitHub repository of the component in question. Likewise, if you have developed a cool new feature or an improvement, please open a pull request; we'll be glad to integrate it into the codebase! For brainstorming a potential new feature, Discourse is the best place to start.
  • Email. If you want to talk to Snowplow directly, email is the easiest way. Get in touch at [email protected].

Copyright and license

Snowplow is copyright 2012-2023 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

More Repositories

1. snowplow-javascript-tracker (TypeScript, 523 stars): Snowplow event tracker for client-side and server-side JavaScript. Add analytics to your websites, web apps and servers.
2. factotum (Rust, 208 stars): A system to programmatically run data pipelines
3. iglu (Shell, 203 stars): Iglu is a machine-readable, open-source schema repository for JSON Schema from the team at Snowplow
4. ansible-playbooks (Shell, 175 stars): Ansible playbooks to install common platforms and tools (e.g. JVM, Ruby, Postgres etc.)
5. schema-guru (Scala, 149 stars): JSONs -> JSON Schema
6. snowplow-mini (Go, 120 stars): An easily-deployable, single-instance version of Snowplow
7. spark-example-project (Scala, 118 stars): A Spark WordCountJob example as a standalone SBT project with Specs2 tests, runnable on Amazon EMR
8. iglu-central (Shell, 111 stars): Contains all JSON Schemas, Avros and Thrifts for Iglu Central
9. aws-lambda-nodejs-example-project (JavaScript, 102 stars): An AWS Lambda function in Node.js reading events from Amazon Kinesis and writing event counts to DynamoDB
10. snowplow-android-tracker (Kotlin, 101 stars): Snowplow event tracker for Android. Add analytics to your Android apps and games
11. spark-streaming-example-project (Scala, 95 stars): A Spark Streaming job reading events from Amazon Kinesis and writing event counts to DynamoDB
12. scala-maxmind-iplookups (Scala, 86 stars): Scala client for MaxMind Geo-IP
13. scalding-example-project (Scala, 82 stars): The Scalding WordCountJob example as a standalone SBT project with Specs2 tests, runnable on Amazon EMR
14. sql-runner (Go, 79 stars): Run templatable playbooks of SQL scripts in series and parallel on Redshift, PostgreSQL, BigQuery and Snowflake
15. snowplow-ios-tracker (Swift, 75 stars): Snowplow event tracker for Swift and Objective-C. Add analytics to your iOS, macOS, tvOS and watchOS apps and games
16. snowplow-docker (Dockerfile, 61 stars): Docker images for Snowplow, Iglu and associated projects
17. snowplow-web-data-model (LookML, 60 stars): SQL data model for working with Snowplow web data. Supports Redshift and Looker. Snowflake and BigQuery coming soon
18. aws-lambda-scala-example-project (Scala, 57 stars): An AWS Lambda function in Scala reading events from Amazon Kinesis and writing event counts to DynamoDB
19. dbt-snowplow-web (Shell, 46 stars): A fully incremental model, that transforms raw web event data generated by the Snowplow JavaScript tracker into a series of derived tables of varying levels of aggregation.
20. scala-forex (Scala, 45 stars): High-performance Scala library for performing exchange rate lookups and currency conversions
21. scala-weather (Scala, 45 stars): High-performance Scala library for looking up the weather
22. snowplow-s3-loader (Scala, 41 stars): Mirrors a Kinesis stream to Amazon S3 using the KCL
23. data-models (PLpgSQL, 40 stars): ⚠️ MAINTENANCE-ONLY MODE: Snowplow maintained SQL data models for working with Snowplow web and mobile behavioral data.
24. snowplow-python-tracker (Python, 39 stars): Snowplow event tracker for Python. Add analytics to your Python and Django apps, webapps and games
25. snowplow-php-tracker (PHP, 32 stars): Snowplow event tracker for PHP. Add analytics into your PHP apps and scripts
26. snowplow-rdb-loader (Scala, 30 stars): Stores Snowplow enriched events in Redshift, Snowflake and Databricks
27. snowplow-react-native-tracker (TypeScript, 29 stars): Snowplow event tracker for react-native apps
28. google-cloud-dataflow-example-project (Scala, 29 stars): Example stream processing job, written in Scala with Apache Beam, for Google Cloud Dataflow
29. snowplow-golang-tracker (Go, 25 stars): Snowplow event tracker for Golang. Add analytics to your Go apps and servers
30. snowplow-nodejs-tracker (TypeScript, 24 stars): Snowplow event tracker for Node.js. Add analytics to your JavaScript apps, node-webkit projects and Node.js servers
31. snowplow-java-tracker (Java, 23 stars): Snowplow event tracker for Java. Add analytics to your Java desktop and server apps, servlets and games. (See also: snowplow-android-tracker)
32. kinesis-example-scala-consumer (Scala, 22 stars): Example Scala/SBT event consumer for Amazon Kinesis
33. kinesis-example-scala-producer (Scala, 21 stars): Example Scala/SBT event producer for Amazon Kinesis
34. snowplow-python-analytics-sdk (Python, 21 stars): Python SDK for working with Snowplow enriched events in Spark, AWS Lambda et al.
35. snowplow-ruby-tracker (Ruby, 21 stars): Snowplow event tracker for Ruby. Add analytics to your Ruby and Rails apps and gems
36. stream-collector (Scala, 20 stars): Collector for cloud-native web, mobile and event analytics, running on AWS and GCP
37. snowplow-scala-analytics-sdk (Scala, 20 stars): Scala SDK for working with Snowplow enriched events in Spark, AWS Lambda, Flink et al.
38. snowplow-dotnet-tracker (C#, 19 stars): Snowplow event tracker for .NET. Add analytics to your ASP.NET, C#, F# and Visual Basic apps, servers and games
39. dataflow-runner (Go, 19 stars): Run templatable playbooks of Hadoop/Spark/et al jobs on Amazon EMR
40. enrich (Scala, 18 stars): Snowplow Enrichment jobs and library
41. cloudfront-log-deserializer (Java, 17 stars): A Hive Deserializer for CloudFront access logs (supports download distribution files only)
42. quickstart-examples (HCL, 15 stars): Examples of how to automate creating a Snowplow Open Source pipeline
43. iglu-example-schema-registry (15 stars): Example static schema registry for Iglu
44. snowplow-unity-tracker (C#, 14 stars): Snowplow event tracker for Unity. Add analytics to your Unity games and apps
45. avalanche (Scala, 13 stars): Load testing for event analytics platforms (Snowplow, more coming soon)
46. kinesis-tee (Scala, 12 stars): Unix tee, but for Kinesis streams
47. snowplowanalytics.com (HTML, 12 stars): The Snowplow website
48. snowbridge (Go, 12 stars): For replicating streams across clouds, accounts and regions
49. iglu-server (Scala, 11 stars): A RESTful schema registry
50. snowplow-elasticsearch-loader (Scala, 11 stars): Writes Snowplow enriched events from Kinesis to Elasticsearch
51. dev-environment (Shell, 11 stars): Vagrant-based Snowplow development environment with Ansible playbooks to install common tools
52. documentation (JavaScript, 10 stars): Snowplow Documentation Website
53. factotum-server (Rust, 10 stars)
54. dbt-snowplow-mobile (Shell, 10 stars): A fully incremental model, that transforms raw mobile event data generated by the Snowplow mobile trackers into a series of derived tables of varying levels of aggregation.
55. dbt-snowplow-fractribution (Python, 9 stars): Snowplow Fractribution (marketing attribution) model for dbt
56. r-data-science-environment (Shell, 9 stars): VM with complete R (RStudio) environment
57. snowplow-tracking-cli (Go, 8 stars): Command-line app for tracking Snowplow events. Add analytics to your shell scripts and terminal sessions
58. snowplow-cpp-tracker (C++, 8 stars): Snowplow event tracker for C++. Add analytics to your C++ applications, games and servers
59. igluctl (Scala, 8 stars): A command-line tool for working with Iglu schema registries
60. dbt-snowplow-utils (PLpgSQL, 8 stars): Snowplow utility functions to be used in conjunction with the snowplow-web dbt package.
61. snowplow-scala-tracker (Scala, 8 stars): Snowplow event tracker for Scala. Add analytics to your Scala, Akka and Play apps and servers
62. release-manager (Python, 7 stars): Uploads zipfiles to Bintray and creates versions
63. snowplow-badrows (Scala, 7 stars)
64. snowplow-gtm-server-side-client (Smarty, 7 stars): A Google Tag Manager Server-side Client template for collecting events using the Snowplow JavaScript Tracker
65. snowplow-rust-tracker (Rust, 7 stars)
66. snowplow-arduino-tracker (C++, 7 stars): Snowplow event tracker for Arduino. Add analytics to sketches on IP-connected Arduino boards
67. snowplow-looker-demo (LookML, 5 stars): LookML for the Snowplow Looker demo
68. snowplow-omniture-ingest (5 stars): Ingests Omniture data (exported as log files) into SnowPlow for more involved analysis
69. schema-ddl (Scala, 5 stars): ASTs and generators for producing various DDL and Schema formats
70. iglu-scala-client (Scala, 5 stars): Scala client for Iglu schema registry
71. iab-spiders-and-robots-java-client (Java, 5 stars): Java 8+ client library for the IAB and ABC International Spiders and Robots list
72. samza-scala-example-project (Scala, 5 stars): An Apache Samza stream processing job written in Scala
73. snowplow-dotnet-analytics-sdk (C#, 4 stars)
74. beam-enrich (Scala, 4 stars): Dataflow job reading tracked events from PubSub, validating and enriching them and writing them back to PubSub
75. dbt-snowplow-media-player (Shell, 4 stars): A fully incremental model, that transforms media player event data generated by the Snowplow JavaScript tracker into derived tables for easier querying
76. dbt-snowplow-ecommerce (Shell, 4 stars): A fully incremental model, that transforms raw ecommerce event data generated by the Snowplow JavaScript tracker into a series of derived tables representing various ecommerce data objects.
77. looker-snowplow-web (LookML, 4 stars): A LookML block, that uses data from the Snowplow JavaScript tracker and Web Data Model derived tables and makes it available for exploration in Looker.
78. snowplow-gtm-server-side-tag (Smarty, 4 stars): A Google Tag Manager Server-side Tag template for sending events to a Snowplow Collector
79. snowplow-golang-analytics-sdk (Go, 4 stars): Golang Analytics SDK for working with Snowplow enriched events in cloud functions and other Go applications.
80. snowplow-aws-lambda-source (4 stars): Sends Amazon S3 object operations into Snowplow, implemented as an AWS Lambda
81. snowplow-gtm-custom-template (Smarty, 4 stars): GTM Custom Template for the Snowplow JavaScript Tracker (v2)
82. snowplow-actionscript3-tracker (ActionScript, 4 stars): Snowplow event tracker for ActionScript 3.0. Add analytics to your Flash Player 9+, Flash Lite 4 and AIR games, apps and widgets
83. iglu-ruby-client (Ruby, 3 stars): Ruby and JRuby client for Iglu
84. spark-data-science-environment (Shell, 3 stars): VM with Spark ready-to-go
85. snowplow-lua-tracker (Lua, 3 stars): Snowplow event tracker for Lua. Add analytics to your Lua apps and Lua-scripted games
86. neo4j-data-science-environment (Shell, 3 stars): VM with Neo4j installed
87. snowplow-gtm-server-side-amplitude-tag (Smarty, 3 stars): A Google Tag Manager Server-side Amplitude Tag template for sending events to the Amplitude HTTP API v2
88. sp-js-assets (JavaScript, 3 stars): Contains all of the Snowplow JavaScript Tracker assets.
89. scala-serf-client (Scala, 3 stars): Minimal wrapper around https://github.com/tv2norge/java-serf-client
90. mobile-hybrid-apps-accelerator (Shell, 3 stars): Tutorial and demo apps showing how to instrument hybrid mobile apps with Snowplow tracking
91. hive-example-udf (Java, 3 stars)
92. advanced-analytics-web-accelerator (Shell, 3 stars): Tutorial and visualisations showing how to instrument web analytics with Snowplow
93. python-data-science-environment (Shell, 3 stars)
94. composable-cdp-with-predictive-ml-modeling-accelerator (HTML, 3 stars): A composable CDP accelerator using Snowplow, Databricks & Hightouch
95. snowplow-scala-project.g8 (Shell, 3 stars)
96. marketing-attribution-accelerator (Shell, 2 stars): A Snowplow accelerator which describes how to do marketing attribution with Snowplow
97. makefile-rs (Rust, 2 stars): WIP Rust crate for parsing extremely simple Makefiles
98. advanced-analytics-mobile-accelerator (Shell, 2 stars): Tutorial and visualisations showing how to instrument mobile analytics with Snowplow
99. scala-util (Scala, 2 stars): Reusable Scala code from Snowplow Analytics
100. event-manifest-cleaner (Scala, 2 stars): A Spark job that takes records straight from the failed enriched good directory and deletes exactly those from DynamoDB