Optimized Analytics Package for Spark Platform (OAP) (@oap-project)

Top repositories

1

gluten

Gluten: Plugin to Double SparkSQL's Performance
Scala
920 stars
2

raydp

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
Python
311 stars
3

gazelle_plugin

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
Scala
256 stars
4

Gluten-Trino

Gluten: Plugin to Boost Trino's Performance
Java
69 stars
5

sql-ds-cache

Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.
Scala
33 stars
6

oap-mllib

Optimized Spark package to accelerate machine learning algorithms in Apache Spark MLlib.
Scala
20 stars
7

remote-shuffle

Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local disks.
Scala
20 stars
8

oap-tools

Tools for building and packaging OAP, and for OAP public cloud integrations such as AWS EMR, Google Dataproc, and K8s.
Jupyter Notebook
16 stars
9

pmem-shuffle

Spark* shuffle plugin that supports shuffling through remote persistent memory over fabrics. It leverages the RDMA network and remote persistent memory (for reads) to provide high-performance, low-latency shuffle for Spark*.
C++
14 stars
10

cloudtik

Cloud-scale platform for distributed analytics and AI.
Python
9 stars
11

pmem-spill

Spark plug-in package for accelerating Spark runtime spill functions using PMem, including an RDD cache PMem extension.
Scala
7 stars
12

arrow-data-source

Spark DataSource plugin for reading files in various formats, such as Parquet, into Arrow-compatible columnar vectors.
Scala
6 stars
13

text2sql-gluten

Python
5 stars
14

pmem-common

Common library for accessing PMem native library functions, including memkind, vmemcache, and others.
Java
3 stars
15

recdp

Python
2 stars