Readings in Stream Processing

A list of articles that are essential for understanding stream processing.

Books

Programming Models for Stream Processing

Table Catalog for Stream Processing

Incremental Processing in DBMS

Watermark Management for Stream Processing

Workload Optimization

  • Towards a Learning Optimizer for Shared Clouds (VLDB 2019). Learns cardinality models from previous job executions in order to optimize the overall workload. This work uses a multi-layer perceptron (MLP) neural network to learn models from query execution features (e.g., job name, input cardinality, average row length, and input dataset names).
  • CrocodileDB: Efficient Database Execution through Intelligent Deferment (CIDR 2020). This paper introduces the Intermittent Query Processing (IQP) approach, which combines knowledge about new data, query semantics, and users' expectations to reduce the overall processing cost. It uses Deep Q-Materialization (DQM) to make a tradeoff under a given resource constraint (e.g., memory, CPUs, storage) and decide how much data will be cached, pre-computed, pre-loaded, etc.
  • Peregrine: Workload Optimization for Cloud Query Engines (SoCC 2019). Analyzes the workload of historical queries and optimizes recurring queries, similar queries, and coordinating queries by extracting common subexpressions that can be materialized. To support various query engines including Spark, Microsoft has created a common intermediate representation (IR) of workloads.
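
To make the idea of extracting common subexpressions from a workload more concrete, here is a minimal sketch in Python. It is not Peregrine's algorithm or its IR: the nested-tuple plan format, the query names, and the helper functions `signature`, `subexpressions`, and `common_subexpressions` are all assumptions made for illustration. The sketch walks every query plan, gives each subtree a canonical signature, and reports the subtrees that appear in more than one query, which would be the candidates for materialization.

```python
from collections import defaultdict

# Hypothetical plan format: a nested tuple (operator, argument_or_child, ...).
# This is an illustrative stand-in, not the intermediate representation
# described in the Peregrine paper.

def signature(plan):
    """Return a canonical string for a plan subtree."""
    if isinstance(plan, tuple):
        return "(" + " ".join(signature(p) for p in plan) + ")"
    return str(plan)

def subexpressions(plan):
    """Yield every operator subtree of a plan, including the plan itself."""
    if isinstance(plan, tuple):
        yield plan
        for child in plan[1:]:
            yield from subexpressions(child)

def common_subexpressions(workload, min_queries=2):
    """Map each subtree signature to the names of the queries containing it,
    keeping only subtrees shared by at least `min_queries` queries."""
    seen = defaultdict(set)
    for name, plan in workload.items():
        for sub in subexpressions(plan):
            seen[signature(sub)].add(name)
    return {sig: names for sig, names in seen.items() if len(names) >= min_queries}

# Toy workload (assumed): q1 and q2 share the same filtered scan.
workload = {
    "q1": ("aggregate", "count", ("filter", "status = 'ok'", ("scan", "events"))),
    "q2": ("project", "user_id", ("filter", "status = 'ok'", ("scan", "events"))),
    "q3": ("aggregate", "count", ("scan", "users")),
}

shared = common_subexpressions(workload)
for sig, names in sorted(shared.items()):
    print(sorted(names), sig)
```

In this toy workload the filtered scan over `events` is shared by `q1` and `q2`, so computing it once and reusing the result would save work. A real system would also need to weigh the storage and maintenance cost of each materialized result against how often it is reused.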

Iterative Data Processing

Incremental Processing with Materialized Views

Stream Log Collection Systems

Real-Time Stream Processing

Real-time stream processing usually refers to ultra-low-latency applications that must satisfy SLAs by returning results within a few seconds.

Stream SQL

GitHub Projects

Commercial Services

Stream Ingestion

External Lists
