Schema Guru

Schema Guru is a tool (CLI, Spark job and web UI) that allows you to derive JSON Schemas from a set of JSON instances, and to process and transform them into different data-definition formats.

Current primary features include:

  • derivation of a JSON Schema from a set of JSON instances (schema command)
  • generation of Redshift table DDL and a JSONPaths file (ddl command)

Unlike other tools for deriving JSON Schemas, Schema Guru lets you derive a schema from an unlimited set of instances (making schemas much more precise), and supports many more JSON Schema validation properties.

Schema Guru is used heavily in association with Snowplow's own Snowplow, Iglu and Schema DDL projects.

User Quickstart

Download the latest Schema Guru from Bintray:

$ wget http://dl.bintray.com/snowplow/snowplow-generic/schema_guru_0.6.2.zip
$ unzip schema_guru_0.6.2.zip

This assumes you have a recent JVM installed.

CLI

Schema derivation

As input, you can use either a single JSON file or a directory of JSON instances (directories are processed recursively).

The following command will print the derived JSON Schema to stdout:

$ ./schema-guru-0.6.2 schema {{input}}

You can also specify an output file for your schema:

$ ./schema-guru-0.6.2 schema --output {{json_schema_file}} {{input}}

You can also switch Schema Guru into NDJSON mode, where it will look for newline-delimited JSONs:

$ ./schema-guru-0.6.2 schema --ndjson {{input}}

You can specify an enum cardinality tolerance for your fields: every field found to have fewer distinct values than the specified cardinality will be expressed in the JSON Schema using the enum property.

$ ./schema-guru-0.6.2 schema --enum 5 {{input}}
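
For example, with --enum 5, a string field observed to take only three distinct values might come out like this (an illustrative sketch, not verbatim tool output; the field name and values are hypothetical):

"paymentMethod" : {
  "type" : "string",
  "enum" : [ "cash", "card", "voucher" ] }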

If you know that a particular set of values can appear, but don't want to set a large enum cardinality, you can specify a predefined enum set with the --enum-sets multioption, like this:

$ ./schema-guru-0.6.2 schema --enum-sets iso_4217 --enum-sets iso_3166-1_aplha-3 /path/to/instances

Schema Guru ships with a number of built-in enum sets (written as they should appear in the CLI), such as iso_4217 for ISO 4217 currency codes and iso_3166-1_aplha-3 for ISO 3166-1 country codes; the special value all enables all of them.

If you need a very specific enum set, you can define it yourself in a JSON file containing an array, like this:

["Mozilla Firefox", "Google Chrome", "Netscape Navigator", "Internet Explorer"]

Then pass the path to this file instead of an enum set name:

$ ./schema-guru-0.6.2 schema --enum-sets all --enum-sets /path/to/browsers.json /path/to/instances

Schema Guru will derive minLength and maxLength properties for strings based on the shortest and longest strings observed. This may be a problem if you process only a small number of instances. To avoid an overly strict schema, you can use the --no-length option:

$ ./schema-guru-0.6.2 schema --no-length /path/to/few-instances

DDL derivation

As with schema derivation, the input for DDL generation may be a single file with a JSON Schema or a directory containing JSON Schemas.

Currently we support DDL only for Amazon Redshift, but future releases will let you specify other databases with the --db option.

The following command will save Redshift DDL (the default --db value) to the current directory:

$ ./schema-guru-0.6.2 ddl {{input}}
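
For a schema describing, say, a click event, the generated DDL might look roughly like the following (an illustrative sketch, not verbatim tool output; the table and column names are hypothetical):

-- sql/com.acme/click_event_1.sql (illustrative)
CREATE SCHEMA IF NOT EXISTS atomic;

CREATE TABLE IF NOT EXISTS atomic.com_acme_click_event_1 (
    "element_id"    VARCHAR(4096),
    "checkout_step" BIGINT
);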

If you specify as input a directory with several Self-describing JSON Schemas belonging to a single REVISION, Schema Guru will also generate migrations, so you can migrate any previous table definition to any subsequent one. For example, given the following Self-describing JSON Schemas as input:

  • schemas/com.acme/click_event/1-0-0
  • schemas/com.acme/click_event/1-0-1
  • schemas/com.acme/click_event/1-0-2

you will get the following migrations as output:

  • sql/com.acme/click_event/1-0-0/1-0-1 to alter table from 1-0-0 to 1-0-1
  • sql/com.acme/click_event/1-0-0/1-0-2 to alter table from 1-0-0 to 1-0-2
  • sql/com.acme/click_event/1-0-1/1-0-2 to alter table from 1-0-1 to 1-0-2

These migrations (and all table definitions) are aware of column order and will never put a new column in the middle of a table, so you can safely alter your tables as long as they belong to a single REVISION; see the sketch below.
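
As an illustration (a sketch under the naming assumptions above, not verbatim tool output; the added column is hypothetical), the 1-0-0 to 1-0-1 migration could contain something like:

-- sql/com.acme/click_event/1-0-0/1-0-1 (illustrative)
BEGIN TRANSACTION;
  -- new columns are always appended at the end of the table
  ALTER TABLE atomic.com_acme_click_event_1
    ADD COLUMN "checkout_step" BIGINT;
END TRANSACTION;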

You can also specify a directory for the output:

$ ./schema-guru-0.6.2 ddl --output {{ddl_dir}} {{input}}

If you're not a Snowplow platform user, don't use Self-describing Schemas, or just don't want anything specific to them, you can produce a raw schema:

$ ./schema-guru-0.6.2 ddl --raw {{input}}

But bear in mind that Self-describing Schemas bring many benefits. For example, raw schemas will not preserve the order of your columns (it's just impossible!) and you will not get migrations.

You may also want to get a JSONPaths file for Redshift's COPY command. This will place a jsonpaths directory alongside the sql one:

$ ./schema-guru-0.6.2 ddl --with-json-paths {{input}}
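
A JSONPaths file is Redshift's standard mapping from table columns to JSON Paths, listed in column order. A minimal sketch (field names hypothetical, matching the DDL sketch above):

{
    "jsonpaths": [
        "$.data.element_id",
        "$.data.checkout_step"
    ]
}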

The most embarrassing part of moving from the dynamically typed world to the statically typed one is product types (union types), such as ["integer", "string"] in JSON Schema. How should these be represented in SQL DDL? It's a tough question and we think there's no ideal solution, so we provide two options. By default, product types are transformed into the most general VARCHAR(4096). Alternatively, you can split a column with a product type into separate columns, one per type, with the type as a postfix: for example, a property model with type ["string", "integer"] will be transformed into two columns, model_string and model_integer. This behavior is enabled with --split-product-types; see the sketch below.
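
A sketch of the resulting column definitions under each option (illustrative, not verbatim tool output; the integer column's exact size is an assumption, since it depends on the detected range):

-- default: the product type collapses to the most general VARCHAR
"model" VARCHAR(4096)

-- with --split-product-types: one column per member type
"model_string"  VARCHAR(4096),
"model_integer" BIGINT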

Another thing everyone needs to consider is the default VARCHAR size. If there are no clues about a string's length (such as maxLength), 4096 will be used. You can also specify your own default value:

$ ./schema-guru-0.6.2 ddl --varchar-size 32 {{input}}

You can also specify the Redshift schema for your table. In non-raw mode, atomic is used as the default.

$ ./schema-guru-0.6.2 ddl --raw --schema business {{input}}

Some users do not fully rely on Schema Guru's JSON Schema derivation or DDL generation and edit their DDLs manually. By default, Schema Guru will not overwrite your files (either DDLs or migrations) if you have made any significant changes to them (comments and whitespace are not significant); instead it will just warn you that the file has been changed manually. To change this behavior, specify the --force flag:

$ ./schema-guru-0.6.2 ddl --force {{input}}

Web UI

You can access our hosted demo of the Schema Guru web UI at schemaguru.snowplowanalytics.com. To run it locally:

$ wget http://dl.bintray.com/snowplow/snowplow-generic/schema_guru_webui_0.6.2.zip
$ unzip schema_guru_webui_0.6.2.zip
$ ./schema-guru-webui-0.6.2

The above will run a Spray web server containing Schema Guru on 0.0.0.0:8000. The interface and port can be specified with the --interface and --port options respectively, as shown below.
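
For example, to serve the UI on localhost port 8080:

$ ./schema-guru-webui-0.6.2 --interface 127.0.0.1 --port 8080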

Apache Spark

Since version 0.4.0, Schema Guru ships with a Spark job for deriving JSON Schemas. To help users get started with Schema Guru on Amazon Elastic MapReduce, we provide a pyinvoke tasks.py.

The recommended way to start is to install all the requirements and assemble the fatjar as described in the Developer Quickstart.

Before running, you need:

  • An AWS CLI profile, e.g. my-profile
  • An EC2 keypair, e.g. my-ec2-keypair
  • At least one Amazon S3 bucket, e.g. my-bucket

To provision the cluster and start the job, use the run_emr task:

$ cd sparkjob
$ inv run_emr my-profile my-bucket/input/ my-bucket/output/ my-bucket/errors/ my-bucket/logs my-ec2-keypair

If you need specific options for the Spark job, you can specify them in tasks.py. The Spark job accepts the same options as the CLI application, but note that --output isn't optional and there is a new optional --errors-path. Also, instead of specifying particular predefined enum sets, you can enable them all with the bare --enum-sets flag, which behaves the same as --enum-sets all.

Developer Quickstart

Assuming git, Vagrant and VirtualBox are installed:

 host$ git clone https://github.com/snowplow/schema-guru.git
 host$ cd schema-guru
 host$ vagrant up && vagrant ssh
guest$ cd /vagrant
guest$ sbt assembly

Optionally, you can also assemble the web UI and Spark job:

guest$ sbt "project schema-guru-webui" assembly
guest$ sbt "project schema-guru-sparkjob" assembly

You can also deploy the Schema Guru web GUI onto Elastic Beanstalk:

guest$ cd beanstalk && zip beanstalk.zip *

Now just create a new Docker app in the Elastic Beanstalk Console and upload this zipfile.

User Manual

Functionality

Schema derivation

  • Takes a directory as an argument and prints out the resulting JSON Schema:
    • Processes each JSON sequentially
    • Merges all results into one master JSON Schema
  • Recognizes the following JSON Schema formats:
    • uuid
    • date-time (according to ISO 8601)
    • IPv4 and IPv6 addresses
    • HTTP, HTTPS, FTP URLs
  • Derives minLength and maxLength properties for strings
  • Recognizes base64 patterns in strings
  • Detects integer ranges according to Int16, Int32, Int64
  • Detects misspelt properties and produces warnings
  • Detects enum values with a specified cardinality
  • Detects known enum sets, built-in or specified by the user
  • Can output Self-describing JSON Schema
  • Can produce JSON Schemas with different names based on a given JSON Path
  • Supports Newline Delimited JSON

DDL derivation

  • Correctly transforms some string formats:
    • uuid becomes CHAR(36)
    • ipv4 becomes VARCHAR(14)
    • ipv6 becomes VARCHAR(39)
    • date-time becomes TIMESTAMP
  • Handles properties with only enums
  • A property with equal maxLength and minLength n becomes CHAR(n)
  • Can output a JSONPaths file
  • Can split product types
  • A number with multipleOf 0.01 becomes DECIMAL
  • Handles Self-describing JSON Schemas and can produce raw DDL
  • Recognizes integer size by minimum and maximum values
  • An object without properties, but with patternProperties, becomes VARCHAR(4096)

Assumptions

  • All JSONs in the directory are assumed to be of the same event type and will be merged together
  • All JSONs are assumed to start with either { ... } or [ ... ]
    • If they do not they are discarded
  • The schema should be as strict as possible, e.g. no additionalProperties are currently allowed

Self-describing JSON

The schema command allows you to produce a Self-describing JSON Schema. To produce one, you need to specify a vendor, a name (unless schema segmentation is being used; see below) and a version (optional; the default value is 1-0-0).

$ ./schema-guru-0.6.2 schema --vendor {{your_company}} --name {{schema_name}} --schemaver {{version}} {{input}}
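
The output wraps the derived schema body in Iglu's self-describing envelope, roughly like this (a sketch; the vendor and name are placeholders, and properties holds the derived schema body):

{ "$schema" : "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "self" : {
    "vendor" : "com.acme",
    "name" : "click_event",
    "format" : "jsonschema",
    "version" : "1-0-0" },
  "type" : "object",
  "properties" : { } }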

Schema Segmentation

If you have a set of mixed JSONs from one vendor, but with slightly different structures, like:

{ "version": 1,
  "type": "track",
  "userId": "019mr8mf4r",
  "event": "Purchased an Item",
  "properties": {
    "revenue": "39.95",
    "shippingMethod": "2-day" },
  "timestamp" : "2012-12-02T00:30:08.276Z" }

and

{ "version": 1,
  "type": "track",
  "userId": "019mr8mf4r",
  "event": "Posted a Comment",
  "properties": {
    "body": "This book is gorgeous!",
    "attachment": false },
  "timestamp" : "2012-12-02T00:28:02.273Z" }

You can run it as follows:

$ ./schema-guru-0.6.2 schema --output {{output_dir}} --schema-by $.event {{mixed_jsons_directory}}

This will put two (or possibly more) JSON Schemas into the output directory: Purchased_an_Item.json and Posted_a_Comment.json. Each is derived only from the JSONs with the corresponding event property, without any intersections, assuming the provided JSON Path contains a valid string. All instances where this JSON Path is absent or contains a non-string value will be merged into an unmatched.json schema in the same output directory. Also, when producing a Self-describing JSON Schema, the schema name is taken the same way and the --name argument can be omitted (the derived name replaces any name specified with the option).

Example

Here's an example of some subtle points which a tool working with a single JSON instance would miss.

First instance:

{ "event": {
    "just_a_string": "Any string may be here",
    "sometimes_ip": "192.168.1.101",
    "always_ipv4": "127.0.0.1",
    "id": 43,
    "very_big_int": 9223372036854775102,
    "this_should_be_number": 2.1,
    "nested_object": {
        "title": "Just an nested object",
        "date": "2015-05-29T12:00:00+07:00" }}}

Second instance:

{ "event": {
    "just_a_string": "No particular format",
    "sometimes_ip": "This time it's not an IP",
    "always_ipv4": "192.168.1.101",
    "id": 42,
    "very_big_int": 92102,
    "this_should_be_number": 201,
    "not_always_here": 32,
    "nested_object": {
        "title": "Still plain string without format",
        "date": "1961-07-03T12:00:00+07:00" }}}

The generated schema:

{ "type" : "object",
  "properties" : {
    "event" : {
      "type" : "object",
      "properties" : {
        "just_a_string" : { "type" : "string" },
        "sometimes_ip" : { "type" : "string" },
        "always_ipv4" : {
          "type" : "string",
          "format" : "ipv4" },
        "id" : {
          "type" : "integer",
          "minimum" : 0,
          "maximum" : 32767 },
        "very_big_int" : {
          "type" : "integer",
          "minimum" : 0,
          "maximum" : 9223372036854775807 },
        "this_should_be_number" : {
          "type" : "number",
          "minimum" : 0 },
        "nested_object" : {
          "type" : "object",
          "properties" : {
            "title" : { "type" : "string" },
            "date" : {
              "type" : "string",
              "format" : "date-time" } },
          "additionalProperties" : false },
        "not_always_here" : {
          "type" : "integer",
          "minimum" : 0,
          "maximum" : 32767 } },
      "additionalProperties" : false } },
  "additionalProperties" : false }

Copyright and License

Schema Guru is copyright 2014-2016 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
