  • Stars: 5,730
  • Rank: 7,113 (Top 0.2%)
  • Language: Rust
  • License: Other
  • Created almost 6 years ago
  • Updated 2 months ago

Repository Details

The Cloud Operational Data Store: use SQL to transform, deliver, and act on fast-changing data.

Materialize is a data warehouse purpose-built for operational workloads where an analytical data warehouse would be too slow, and a stream processor would be too complicated.

Using SQL and common tools in the wider data ecosystem, Materialize allows you to build real-time automation, engaging customer experiences, and interactive data products that drive value for your business while reducing the cost of data freshness.

Sign up

Ready to try out Materialize? Sign up to get started! 🚀

About

Materialize is designed to help you interactively explore your streaming data, perform analytics against live relational data, or increase data freshness while reducing the load of your dashboard and monitoring tasks. The moment you need a refreshed answer, you can get it in milliseconds.

It focuses on providing correct and consistent answers with minimal latency, and does not ask you to accept either approximate answers or eventual consistency. Whenever Materialize answers a query, that answer is the correct result on some specific (and recent) version of your data. Materialize does all of this by recasting your SQL queries as dataflows, which can react efficiently to changes in your data as they happen.

Materialize supports a large fraction of PostgreSQL's SQL, and we are actively working on supporting more built-in PostgreSQL functions. Please file an issue if something doesn't work as expected!

Get data in

Materialize can read data from Kafka (and other Kafka API-compatible systems like Redpanda), directly from a PostgreSQL replication stream, or from SaaS applications via webhooks. It also supports regular database tables to which you can insert, update, and delete rows.
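
As a rough sketch of the table path described above, the statements below create a table and then insert, update, and delete rows in it. The table name supplier_notes and its columns are hypothetical, chosen only for illustration; they are not part of the TPC-H schema or of Materialize itself.

```sql
-- Hypothetical table, used only to illustrate regular read-write tables.
CREATE TABLE supplier_notes (
    s_suppkey integer,
    note text
);

-- Rows can be added, changed, and removed with ordinary SQL statements.
INSERT INTO supplier_notes VALUES (1, 'preferred'), (2, 'on hold');
UPDATE supplier_notes SET note = 'active' WHERE s_suppkey = 2;
DELETE FROM supplier_notes WHERE s_suppkey = 1;
```

Views defined over a table like this are maintained in the same way as views over streamed sources.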

Transform, manipulate, and read your data

Once you've got the data in, define views and perform reads via the PostgreSQL protocol. Use your favorite SQL client, including the psql you probably already have on your system.

Materialize supports a comprehensive variety of SQL features, all using the PostgreSQL dialect and protocol:

  • Joins, Joins, Joins! Materialize supports multi-column join conditions, multi-way joins, self-joins, cross-joins, inner joins, outer joins, etc.
  • Delta-joins avoid intermediate state blowup compared to systems that can only plan nested binary joins (tested on joins of up to 64 relations).
  • Support for subqueries. Materialize's SQL optimizer performs subquery decorrelation out-of-the-box, avoiding the need to manually rewrite subqueries into joins.
  • Materialize supports streams that contain CDC data (currently supporting the Debezium format). Materialize can incrementally maintain views in the presence of arbitrary inserts, updates, and deletes. No asterisks.
  • All the aggregations. GROUP BY, MIN, MAX, COUNT, SUM, STDDEV, HAVING, etc.
  • ORDER BY
  • LIMIT
  • DISTINCT
  • JSON support in the PostgreSQL dialect including operators and functions like ->, ->>, @>, ?, jsonb_array_element, jsonb_each. Materialize automatically plans lateral joins for efficient jsonb_each support.
  • Nest views on views on views!
  • Multiple views that have overlapping subplans can share underlying indices for space and compute efficiency. Just declaratively define what you want, and we'll worry about how to maintain it efficiently.
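
To make a few of these features concrete, here is a minimal sketch that combines the JSON operators, jsonb_each, and a view defined on top of another view. The events table, its payload column, and both view names are assumptions made up for this example.

```sql
-- Hypothetical table holding raw JSON documents.
CREATE TABLE events (payload jsonb);

INSERT INTO events VALUES
    ('{"user": {"id": 7}, "tags": {"a": 1, "b": 2}}'::jsonb);

-- First view: pull a scalar field out of the nested JSON as text.
CREATE VIEW event_users AS
    SELECT
        payload -> 'user' ->> 'id' AS user_id,
        payload
    FROM
        events;

-- Second view, defined on the first: one row per key under "tags".
CREATE VIEW event_tags AS
    SELECT
        user_id,
        t.key AS tag,
        t.value AS tag_value
    FROM
        event_users,
        jsonb_each(payload -> 'tags') AS t;
```

Querying event_tags should return one row per tag for each event, and those rows change as rows in events are inserted or deleted.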

Just show us what it can do!

Here's an example join query that works fine in Materialize, TPC-H query 15:

-- Views define commonly reused subqueries.
CREATE VIEW revenue (supplier_no, total_revenue) AS
    SELECT
        l_suppkey,
        SUM(l_extendedprice * (1 - l_discount))
    FROM
        lineitem
    WHERE
        l_shipdate >= DATE '1996-01-01'
        AND l_shipdate < DATE '1996-01-01' + INTERVAL '3' month
    GROUP BY
        l_suppkey;

-- The MATERIALIZED keyword is the trigger to begin
-- eagerly, consistently, and incrementally maintaining
-- results that are stored directly in durable storage.
CREATE MATERIALIZED VIEW tpch_q15 AS
    SELECT
        s_suppkey,
        s_name,
        s_address,
        s_phone,
        total_revenue
    FROM
        supplier,
        revenue
    WHERE
        s_suppkey = supplier_no
        AND total_revenue = (
            SELECT
                MAX(total_revenue)
            FROM
                revenue
        )
    ORDER BY
        s_suppkey;

-- Creating an index keeps results always up to date and in memory.
-- In this example, the index will allow for fast point lookups of
-- individual supplier keys.
CREATE INDEX tpch_q15_idx ON tpch_q15 (s_suppkey);

Stream inserts, updates, and deletes on the underlying tables (lineitem and supplier), and Materialize keeps the materialized view incrementally updated. You can type SELECT * FROM tpch_q15 and expect to see the current results immediately!

Get data out

Pull based: Use any PostgreSQL-compatible driver in any language/environment to run SELECT queries against your views. Tell them they're talking to a PostgreSQL database; they never need to know otherwise.

Push based: Listen to changes directly using SUBSCRIBE or configure Materialize to stream results to a Kafka topic as soon as the views change.
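
A minimal sketch of the push-based path, using SUBSCRIBE against the tpch_q15 view defined earlier. Both forms follow patterns shown in the Materialize documentation, but the exact options and output columns are worth confirming there for your version.

```sql
-- From psql: stream every change to the view until interrupted.
COPY (SUBSCRIBE (SELECT * FROM tpch_q15)) TO STDOUT;

-- From a driver: open a cursor over the subscription inside a transaction.
BEGIN;
DECLARE c CURSOR FOR SUBSCRIBE (SELECT * FROM tpch_q15);
FETCH ALL c;  -- repeat this statement to receive further updates
COMMIT;
```

Each returned row describes a change rather than a full snapshot: alongside the view's columns it carries a timestamp and a diff indicating whether the row was added or removed.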

If you want to use an ORM, chat with us. They're surprisingly tricky.

Documentation

Check out our documentation.

License

Materialize is source-available and licensed under the BSL 1.1, converting to the open-source Apache 2.0 license after 4 years. As stated in the BSL, Materialize is free forever on a single node.

Materialize is also available as a paid cloud service with additional features such as high availability via multi-active replication.

Materialize depends upon many open source Rust crates. We maintain a list of these crates and their licenses, including links to their source repositories.

For developers

Materialize is primarily written in Rust.

Developers can find docs at doc/developer, and Rust API documentation is hosted at https://dev.materialize.com/api/rust/. The Materialize development roadmap is divided up into roughly month-long milestones, and managed in GitHub.

Contributions are welcome. Prospective code contributors might find the good first issue tag useful. We value all contributions equally, but bug reports are more equal.

Credits

Materialize is lovingly crafted by a team of developers and one bot. Join us.

More Repositories

1. datagen (TypeScript, 142 stars): Generate authentic-looking mock data based on a SQL, JSON or Avro schema and produce to Kafka in JSON or Avro format.
2. mz-hack-day-2022 (Python, 61 stars): Official repo for the Materialize + Redpanda + dbt Hack Day 2022, including a sample project to get everyone started!
3. demos (TypeScript, 50 stars): Demos of Materialize, the operational data warehouse.
4. sqlparser (Rust, 29 stars): The Materialize SQL parser
5. pulumi-docker-buildkit (Go, 25 stars): Pulumi provider for Docker using Buildkit
6. advent-of-code-2023 (20 stars): Solving the Advent of Code 2023 using nothing but Materialize, SQL and our bare hands. 🎄
7. tb (Java, 15 stars): Tail the Binlog of a database
8. rust-protobuf-native (C++, 14 stars): Rust build system integration for protobuf, Google's data interchange format.
9. mzcli (Python, 13 stars): Materialize command-line interface with autocompletion
10. homebrew-crosstools (Ruby, 13 stars): Cross-compiling toolchains for macOS.
11. ecommerce-demo (Python, 12 stars): Demonstration of using Materialize in the context of an e-commerce business to power real-time dashboards and features.
12. dbt-materialize (Python, 12 stars): Materialize plugin for dbt
13. metabase-materialize-driver (Clojure, 12 stars): Metabase driver and plugin for Materialize
14. k8s-eip-operator (Rust, 11 stars)
15. terraform-provider-materialize (Go, 11 stars): A Terraform provider for Materialize
16. rust-dec (C, 10 stars): libdecnumber bindings for the Rust programming language
17. kubernetes-stubs (Python, 7 stars): Python type stubs for the Kubernetes API client.
18. materialize-dbt-utils (Makefile, 7 stars): Utility functions for dbt projects running on Materialize
19. pulumi-fivetran (Go, 6 stars): A Pulumi provider for the Fivetran ETL platform.
20. rust-orb-billing (Rust, 6 stars): An async Rust API client for the Orb billing platform.
21. connection-examples (TypeScript, 6 stars): Materialize connection examples
22. k8s-controller (Rust, 5 stars)
23. homebrew-materialize (Ruby, 4 stars): Homebrew tap for Materialize, the streaming data warehouse
24. rust-frontegg (Rust, 4 stars): A Rust API client for the Frontegg user management service.
25. rust-krb5-src (C, 4 stars): Rust build system integration for libkrb5, MIT's Kerberos implementation
26. pgjdbc (Java, 4 stars): Materialize fork of pgjdbc
27. rust-sasl (C, 4 stars): Cyrus SASL bindings for Rust
28. terraform-aws-msk-privatelink (HCL, 4 stars): A Terraform module for configuring Kafka with AWS PrivateLink.
29. pulumi-frontegg (Go, 4 stars): Pulumi provider for the Frontegg user management platform.
30. mz-hack-day-july-2023 (3 stars)
31. vscode-extension (TypeScript, 3 stars): A VSCode extension to interact with Materialize
32. pulumi-materialize (Go, 3 stars): [WIP] Pulumi provider for Materialize.
33. chbenchmark (C++, 2 stars)
34. pulumi-kubernetes-proxy (Go, 2 stars)
35. mztrace-explorer (JavaScript, 2 stars): Explorer for Materialize query traces
36. cloud-sdks (Python, 1 star): SDKs for Materialize Cloud
37. docker-sccache (Dockerfile, 1 star): A Docker image for sccache, the distributed build cache
38. katacoda (Shell, 1 star)
39. pulumi-honeycomb (Go, 1 star): Honeycomb provider for Pulumi
40. terraform-provider-mzcloud (Go, 1 star): A Terraform provider for the Materialize Cloud platform
41. terraform-aws-kafka-privatelink (HCL, 1 star)
42. external-dns (Go, 1 star): Fork of https://github.com/kubernetes-sigs/external-dns for using https://github.com/kubernetes-sigs/external-dns/pull/2609 before it's merged.
43. uid-buildkite-plugin (1 star): A Buildkite plugin to expose agent's UID and GID
44. toolchains (Shell, 1 star): Toolchains used for building Materialize
45. terraform-aws-rds-privatelink (HCL, 1 star): A Terraform module for configuring RDS with AWS PrivateLink.