• Stars
    star
    245
  • Rank 164,220 (Top 4 %)
  • Language
    Java
  • License
    Apache License 2.0
  • Created about 2 years ago
  • Updated 4 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Cloud Shuffle Service(CSS) is a general purpose remote shuffle solution for compute engines, including Spark/Flink/MapReduce.

English | 简体中文

Cloud Shuffle Service

GitHub license

Cloud Shuffle Service(CSS) is a general purpose remote shuffle solution for compute engines, including Spark/Flink/MapReduce. It provides reliable, high-performance, and elastic data shuffling capabilities to these compute engines. Shuffled data is pushed to the CSS cluster, stored in disk or HDFS, and can be fetched from the CSS cluster by compute engines.

CSS Architecture

  • CSS Worker

    Stores shuffled data that is pushed from map tasks in memory and persist them to file system asynchronously, allowing reduce tasks to subsequently fetch data from CSS workers.

  • CSS Master

    CSS Master is a coordinator component for an application's shuffle process, and is integrated into the application. It reads CSS worker list from ZooKeeper and assigns them to the application to do shuffling, tracks the progress of running map tasks, and then notifies CSS workers to commit files after all the map tasks have finished.

  • CSS Client

    Map/Reduce task use CSS client to push/fetch shuffled data to the assigned CSS workers.

Building CSS

mvn build

CSS is built using Apache Maven. Building CSS using Maven requires Java 8 and either Scala 2.12 or Scala 2.11.

mvn -DskipTests clean package

Building a Runnable Distribution

To create a CSS distribution, use ./build.sh in the project root directory.

./build.sh

It generates a tgz package, you can copy it to the nodes you want to deploy CSS.

css-1.0.0-bin
├── LICENSE
├── README.md
├── client
├── conf
├── docs
├── lib  // CSS cluster lib
└── sbin

Deploy CSS Cluster

CSS provides two deployment modes, standalone and zookeeper mode. The standalone mode is currently only for testing, while the zookeeper mode is used in the production environment.

  1. Place the above built CSS tgz file on each node of the Cluster.
  2. Unpack it to a dir, which can be set to CSS_HOME environment, all default conf, metrics and workers list can be found in the $CSS_HOME/conf directory.
  3. Update $CSS_HOME/sbin/css-config.sh.
    # standalone mode
    CSS_MASTER_HOST=<HOST_IP>
    MASTER_JAVA_OPTS="-Xmx8192m"
    WORKER_JAVA_OPTS="-Xmx8192m -XX:MaxDirectMemorySize=100000m"
     
    # zookeeper mode
    WORKER_JAVA_OPTS="-Xmx8192m -XX:MaxDirectMemorySize=100000m"
    
  4. Update $CSS_HOME/conf/css-defaults.conf
    css.cluster.name = <css cluster name>
    
    # standalone(for testing) or zookeeper(for production)
    css.worker.registry.type = zookeeper
    # only for zookeeper mode
    css.zookeeper.address = <ip1>:<port1>,<ip2>:<port2>,<ip3>:<port3>
    
    # css worker common conf
    css.flush.queue.capacity = 4096
    css.flush.buffer.size = 128k
    css.network.timeout = 600s
    css.epoch.rotate.threshold = 1g
    css.push.io.numConnectionsPerPeer = 8
    css.push.io.threads = 128
    css.replicate.threads = 128
    css.fetch.io.threads = 64
    css.fetch.chunk.size = 4m
    css.shuffle.server.chunkFetchHandlerThreadsPercent = 400
    
    # hdfs storage
    css.hdfsFlusher.base.dir = hdfs://xxx
    css.hdfsFlusher.num = -1
    css.hdfsFlusher.replica = 2    
    
    # local disk storage
    css.diskFlusher.base.dirs = /data00/css,/data01/css
    css.disk.dir.num.min = 1
    
  5. Define your metrics and worker node host in the following files:
    $CSS_HOME/conf/css-metrics.properties
    $CSS_HOME/conf/workers
    
  6. Sync all the updated config files to each node of the Cluster.
  7. Start the CSS Cluster Shuffle workers. The script will ssh into each css worker node and start the Workers.
    # standalone mode
    cd $CSS_HOME;bash ./sbin/start-all.sh
    # zookeeper mode
    cd $CSS_HOME;bash ./sbin/start-workers.sh
    

Running with Spark

  1. Copy $CSS_HOME/client/spark-${version}/*.jar to $SPARK_HOME/jars/ .
  2. Run spark with CSS.
    # standalone mode
    --conf spark.css.cluster.name=<css cluster name> \
    --conf spark.css.master.address=css://<masterIp>:<masterPort>\
    --conf spark.shuffle.manager=org.apache.spark.shuffle.css.CssShuffleManager\
    
    # zookeeper mode
    --conf spark.css.cluster.name=<css cluster name> \
    --conf spark.css.zookeeper.address="<ip1>:<port1>,<ip2>:<port2>,<ip3>:<port3>" \
    --conf spark.shuffle.manager=org.apache.spark.shuffle.css.CssShuffleManager\
    

Spark Adaptive Query Execution Support

CSS supports all the features of AQE. To support skew join optimization, it is necessary to patch the file to Spark and re-build Spark.

./patch/spark-3.0-aqe-skewjoin.patch

Configuration

CSS Cluster Server

All detailed configuration can be found in the CssConf class.

Property Name Default Meaning
css.cluster.name - The cluster name for the CSS cluster.
css.worker.registry.type standalone The worker registry type (e.g. standalone, zookeeper). This will also specify if CSS will run under Standalone or zookeeper mode.
css.zookeeper.address - (For zookeeper mode) The CSS zookeeper address.
css.push.io.threads 32 The CSS Threads for netty push data io.
css.fetch.io.threads 32 The CSS Threads for netty fetch data io.
css.commit.threads 128 The CSS Threads for stage end to close partition file.
css.diskFlusher.base.dirs /tmp/css The CSS Disk Base dirs (e.g. /data00/css,/data01/css).
css.hdfsFlusher.base.dir - The CSS HDFS Base dir (e.g. hdfs://xxx).

CSS Spark Client

Property Name Default Meaning
css.max.allocate.worker 1000 The Maximum number of workers requested for shuffling.
css.worker.allocate.extraRatio 1.5 The application can allocate additional workers controlled by this extra ratio, the final number will be calculated with Min(Max(2, targetWorker), MaxAllocateWorker).
css.backpressure.enabled true The back pressure control, when enabled, it will use Gradient2Limit to control push data rate, otherwise use FixedLimit.
css.fixRateLimit.threshold 64 Fixed Rate for the back pressure control.
css.data.io.threads 8 The Maximum client side data sending for netty thread.
css.maxPartitionsPerGroup 100 The Maximum number of partitions per group, each data push will send one group at a time.
css.partitionGroup.push.buffer.size 4m The Maximum buffer size sent per each data push, in the same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") (e.g. 512m, 2g).
css.client.mapper.end.timeout 600s The Maximum timeout to wait for all data to be sent before mapTask ends.
css.stage.end.timeout 600s The Maximum timeout to wait for all partition files to close.
css.sortPush.spill.record.threshold 1000000 The Maximum records for sending data.
css.sortPush.spill.size.threshold 256m The Maximum size for sending data, in the same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") (e.g. 512m, 2g).
css.shuffle.mode DISK Choose which storage mode to use (e.g. DISK, HDFS).
css.epoch.rotate.threshold 1g The file auto rotate switch threshold size for new files, in the same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") (e.g. 512m, 2g).
css.client.failed.batch.blacklist.enabled true When MapTask encounters onFailure, the current reduceId-epochId-mapId-mapAttemptId-batchId will be recorded into the blacklist. In AE skewjoin mode, this switch must be turned on, otherwise there will be correctness problems.
css.compression.codec lz4 It is recommended to use zstd compression mode. Compared with lz4, it can improve the compression ratio by 30%, and only consume an additional 8% of performance.

Contribution

Please check Contributing for more details.

Code of Conduct

Please check Code of Conduct for more details.

Security

If you discover a potential security issue in this project, or think you may have discovered a security issue, we ask that you notify Bytedance Security via our security center or vulnerability reporting email.

Please do not create a public GitHub issue.

License

This project is licensed under the Apache-2.0 License.

More Repositories

1

IconPark

🍎Transform an SVG icon into multiple themes, and generate React icons,Vue icons,svg icons
TypeScript
8,203
star
2

xgplayer

A HTML5 video player with a parser that saves traffic
JavaScript
8,172
star
3

sonic

A blazingly fast JSON serializing & deserializing library
Assembly
6,626
star
4

monoio

Rust async runtime based on io-uring.
Rust
3,864
star
5

byteps

A high performance and generic framework for distributed DNN training
Python
3,603
star
6

lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation
C++
3,168
star
7

ByteX

ByteX is a bytecode plugin platform based on Android Gradle Transform API and ASM. 字节码插件开发平台
Java
2,865
star
8

AlphaPlayer

AlphaPlayer is a video animation engine.
Java
2,181
star
9

Elkeid

Elkeid is an open source solution that can meet the security requirements of various workloads such as hosts, containers and K8s, and serverless. It is derived from ByteDance's internal best practices.
Go
2,141
star
10

bhook

🔥 ByteHook is an Android PLT hook library which supports armeabi-v7a, arm64-v8a, x86 and x86_64.
C
2,058
star
11

flutter_ume

UME is an in-app debug kits platform for Flutter. Produced by Flutter Infra team of ByteDance
Dart
2,053
star
12

terarkdb

A RocksDB compatible KV storage engine with better performance
C++
2,038
star
13

btrace

🔥🔥 btrace(AKA RheaTrace) is a high performance Android trace tool which is based on Perfetto, it support to define custom events automatically during building apk and using bhook to provider more native events like Render/Binder/IO etc.
Kotlin
1,886
star
14

gopkg

Universal Utilities for Go
Go
1,676
star
15

android-inline-hook

🔥 ShadowHook is an Android inline hook library which supports thumb, arm32 and arm64.
C
1,637
star
16

bitsail

BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
Java
1,611
star
17

go-tagexpr

An interesting go struct tag expression syntax for field validation, etc.
Go
1,470
star
18

GiantMIDI-Piano

Python
1,431
star
19

appshark

Appshark is a static taint analysis platform to scan vulnerabilities in an Android app.
Kotlin
1,363
star
20

AabResGuard

The tool of obfuscated aab resources.(Android app bundle资源混淆工具)
Java
1,307
star
21

piano_transcription

Python
1,247
star
22

CodeLocator

Kotlin
1,163
star
23

BoostMultiDex

BoostMultiDex is a solution for quickly loading multiple dex files on low Android version devices (4.X and below, SDK <21).
Java
1,106
star
24

music_source_separation

Python
1,039
star
25

Fastbot_Android

Fastbot(2.0) is a model-based testing tool for modeling GUI transitions to discover app stability problems
C++
1,031
star
26

memory-leak-detector

C
919
star
27

fedlearner

A multi-party collaborative machine learning framework
Python
890
star
28

SALMONN

SALMONN: Speech Audio Language Music Open Neural Network
Python
886
star
29

monolith

ByteDance's Recommendation System
Python
844
star
30

sonic-cpp

A fast JSON serializing & deserializing library, accelerated by SIMD.
C++
811
star
31

godlp

sensitive information protection toolkit
Go
770
star
32

MVDream

Multi-view Diffusion for 3D Generation
Python
744
star
33

res-adapter

Official implementation of "ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models".
Python
724
star
34

bytemd

ByteMD v1 repository
TypeScript
679
star
35

tailor

C
669
star
36

ibot

iBOT 🤖: Image BERT Pre-Training with Online Tokenizer (ICLR 2022)
Jupyter Notebook
663
star
37

RealRichText

A Tricky Solution for Implementing Inline-Image-In-Text Feature in Flutter.
Dart
658
star
38

guide

A new feature guide component by react 🧭
TypeScript
651
star
39

mockey

a simple and easy-to-use golang mock library
Go
603
star
40

magic-microservices

Make Web Components easier and powerful!😘
TypeScript
570
star
41

Fastbot_iOS

About Fastbot(2.0) is a model-based testing tool for modeling GUI transitions to discover app stability problems
Objective-C
553
star
42

flow-builder

A highly customizable streaming flow builder.
TypeScript
526
star
43

MVDream-threestudio

3D generation code for MVDream
Python
473
star
44

effective_transformer

Running BERT without Padding
C++
452
star
45

ByteTransformer

optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052
C++
449
star
46

Next-ViT

Python
426
star
47

matxscript

A high-performance, extensible Python AOT compiler.
C++
404
star
48

byteir

A model compilation solution for various hardware
MLIR
355
star
49

syllepsis

Syllepsis is an out-of-the-box rich text editor.
TypeScript
348
star
50

OMGD

Online Multi-Granularity Distillation for GAN Compression (ICCV2021)
Python
323
star
51

uss

Python
309
star
52

neurst

Neural end-to-end Speech Translation Toolkit
Python
299
star
53

danmu.js

HTML5 danmu (danmaku) plugin for any DOM element
JavaScript
290
star
54

vArmor

vArmor is a cloud native container sandbox system based on AppArmor/BPF/Seccomp. It also includes multiple built-in protection rules that are ready to use out of the box.
Go
263
star
55

particle-sfm

ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild. ECCV 2022.
C++
263
star
56

lynx-llm

paper: https://arxiv.org/abs/2307.02469 page: https://lynx-llm.github.io/
Python
227
star
57

g3

Enterprise-oriented Generic Proxy Solutions
Rust
227
star
58

xgplayer-vue

Vue component for xgplayer, a HTML5 video player with a parser that saves traffic
JavaScript
219
star
59

trace-irqoff

Interrupts-off or softirqs-off latency tracer
C
195
star
60

ParaGen

ParaGen is a PyTorch deep learning framework for parallel sequence generation.
Python
186
star
61

DEADiff

[CVPR 2024] Official implementation of "DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations"
Python
184
star
62

ByteMLPerf

AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
Python
181
star
63

MoMA

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
Jupyter Notebook
177
star
64

AWERTL

An non-invasive iOS framework for quickly adapting Right-To-Left style UI
Objective-C
175
star
65

Bytedance-UnionAD

Ruby
170
star
66

keyhouse

Keyhouse is a skeleton of general-purpose Key Management System written in Rust.
Rust
164
star
67

react-model

The next generation state management library for React
TypeScript
162
star
68

LargeBatchCTR

Large batch training of CTR models based on DeepCTR with CowClip.
Python
162
star
69

flux

A fast communication-overlapping library for tensor parallelism on GPUs.
C++
160
star
70

ic_flow_platform

IFP (ic flow platform) is an integrated circuit design flow platform, mainly used for IC process specification management and data flow contral.
Python
154
star
71

primus

Java
148
star
72

diat

A CLI tool to help with diagnosing Node.js processes basing on inspector.
JavaScript
145
star
73

DanmakuRenderEngine

DanmakuRenderEngine is a lightweight and scalable Android danmaku library. 轻量级高扩展安卓弹幕渲染引擎
Kotlin
140
star
74

Hammer

An efficient toolkit for training deep models.
Python
138
star
75

coconut_cvpr2024

Jupyter Notebook
127
star
76

ns-x

An easy-to-use, flexible network simulator library in Go.
Go
116
star
77

pv3d

Python
113
star
78

fc-clip

This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Python
109
star
79

RLFN

Winner of runtime track in NTIRE 2022 challenge on Efficient Super-Resolution
Python
106
star
80

DCFrame

DCFrame is a Swift UI collection framework, which can easily create complex UI.
Swift
100
star
81

trace-noschedule

Trace noschedule thread
C
99
star
82

tar-wasm

A faster experimental wasm-based tar implementation for browsers.
Rust
95
star
83

TWIST

Official codes: Self-Supervised Learning by Estimating Twin Class Distribution
Python
95
star
84

decoupleQ

A quantization algorithm for LLM
Cuda
94
star
85

magic-portal

⚡ A blazing fast micro-component and micro-frontend solution uses web-components under the hood.
TypeScript
91
star
86

xgplayer-react

React component for xgplayer, a HTML5 video player with a parser that saves traffic
JavaScript
84
star
87

fe-foundation

UI Foundation for React Hooks and Vue Composition Api
TypeScript
80
star
88

nnproxy

Scalable NameNode RPC Proxy for HDFS Federation
Java
79
star
89

dbatman

Go
74
star
90

Elkeid-HUB

Elkeid HUB is a rule/event processing engine maintained by the Elkeid Team that supports streaming/offline (not yet supported by the community edition) data processing. The original intention is to solve complex data/event processing and external system linkage requirements through standardized rules.
Python
74
star
91

FreeSeg

Python
69
star
92

pull_to_refresh

Flutter pull_to_refresh widget
Dart
67
star
93

Jeddak-DPSQL

DPSQL (Privacy Protection SQL Query Service) - This project is a microservice Middleware located between the database engine ( Hive , Clickhouse , etc.) and the application system. It provides transparent SQL query result desensitization capabilities.
Python
62
star
94

trace-runqlat

C
61
star
95

terark-zip

A data structure and algorithm library built for TerarkDB
C++
61
star
96

X-Portrait

Source code for the SIGGRAPH 2024 paper "X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention"
Python
59
star
97

kernel

ByteDance kernel for use on cloud.
C
57
star
98

scroll_kit

Dart
56
star
99

kvm-utils

C
55
star
100

dddfirework

Go
51
star