• This repository has been archived on 28/May/2019

Galaxy is a cluster management system.

Galaxy

Copyright 2015, Baidu, Inc.
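Galaxy is a data center operating system that aims to maximize resource utilization and reduce the cost of deploying and operating applications.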

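Galaxy 3.0 Design

Background

Galaxy 3.0 is a refactoring of Galaxy 2.0 that mainly addresses the following problems:

  1. Container management and service management are tightly coupled: upgrading, starting, or stopping a service always involves destroying and rescheduling containers.
  2. There is no disk management; only the home disk can be managed.
  3. User quotas and accounting are not supported.
  4. Machine management features are missing.
  5. The Naming feature has low availability.
  6. The Trace feature is incomplete.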

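System Architecture

    Galaxy 3.0 is organized into two layers, a resource management layer and a service management layer. Each layer uses a master/slave architecture.
    1. The resource management layer consists of ResMan (Resource Manager) and Agents.
    2. The service management layer consists of AppMaster and AppWorkers.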


   +-------------------+-----------------------------+
   |                   |               |             |
   |                   |   MapReduce   |   Spark     |
   |                   |               |             |
   |                   +-----------------------------+
   |                                                 |
   |               Service Management                |  ---> {AppMaster + AppWorkers}
   |                                                 |
   +-------------------------------------------------+
   |                                                 |
   |               Resource Management               |  ---> {ResMan + Agents}
   |                                                 |
   +-------------------------------------------------+

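1. Resource Management Layer

Components: ResMan + Agents
A Galaxy cluster has only one active ResMan at a time. It schedules containers, finding for each container a machine that satisfies its resource requirements.
ResMan creates and destroys containers by communicating with the Agent deployed on each machine.
Container: a resource-isolated environment built on Linux cgroups and namespaces.
By default an AppWorker process is started in each container; it is the first process in the container, i.e. the root process.
ResMan exposes no interface to ordinary users; it is used only by internal components and cluster administrators.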

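2. Service Management Layer

Components: AppMaster + AppWorkers
AppMaster is the only entry point through which external users operate Galaxy.
A Galaxy cluster normally has a single AppMaster. It handles service deployment, updates, start/stop, and state management; it dispatches service instances to containers on the machines, starts them there, and tracks their state.
AppMaster creates containers by calling ResMan's RPC interface; the AppWorker process is launched automatically inside each container.
The AppWorker process in a container communicates with AppMaster to obtain the commands to run inside the container, including deploy, start/stop, update, and so on.
AppWorker reports service status to AppMaster, for example whether the hosted service is running normally and the process exit code.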

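Scheduling Logic

A Job submitted by a user consists of two main parts: resource requirements + program description
Resource requirements: number of CPU cores, memory size, disk capacity, machine labels, port range, mount paths
Program description: deploy command, start command, stop command, update command, version number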
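
The two parts of a Job can be pictured as a pair of plain records. The sketch below is illustrative only: the class and field names, as well as the `name` and `replicas` fields and the shell script names, are assumptions made for this example and are not taken from Galaxy's actual interface definitions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ResourceRequirement:
    """Resources one container needs (field names are hypothetical)."""
    cpu_cores: float
    memory_bytes: int
    disk_bytes: int
    labels: List[str] = field(default_factory=list)      # machine labels
    port_range: Tuple[int, int] = (0, 0)                  # inclusive (low, high)
    mount_paths: List[str] = field(default_factory=list)


@dataclass
class ProgramDescription:
    """Commands AppWorker would run in the container (hypothetical names)."""
    deploy_cmd: str
    start_cmd: str
    stop_cmd: str
    update_cmd: str
    version: str


@dataclass
class Job:
    name: str        # assumption: not listed in the text above
    replicas: int    # assumption: number of containers requested
    resources: ResourceRequirement
    program: ProgramDescription


GIB = 1 << 30
job = Job(
    name="demo-service",
    replicas=3,
    resources=ResourceRequirement(
        cpu_cores=2, memory_bytes=4 * GIB, disk_bytes=10 * GIB,
        labels=["ssd"], port_range=(8000, 8010), mount_paths=["/home/disk1"],
    ),
    program=ProgramDescription(
        deploy_cmd="sh deploy.sh", start_cmd="sh start.sh",
        stop_cmd="sh stop.sh", update_cmd="sh update.sh", version="1.0.0",
    ),
)
```

Keeping the two parts separate mirrors the split between the layers: the resource half is what ResMan would need to place a container, and the program half is what AppMaster would hand to AppWorker.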

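1. ResMan Scheduling Logic

ResMan periodically queries the Agents to learn the allocatable resources on each Agent.
ResMan continuously checks for containers in the Pending state and looks for an Agent with enough resources to create them.
A container whose creation fails returns to the Pending state and waits to be rescheduled.
When a container does not match expectations, ResMan orders the Agent to destroy it, and the container returns to the Pending state.
ResMan ensures that the number of containers always matches what the user requested.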
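
The loop described above can be sketched as a single scheduling round over the Pending containers. This is a minimal illustration, not ResMan's code: the text does not say how an Agent is chosen, so the first-fit policy here is an assumption, as are the `Agent` and `Container` types, the `Allocated` state name, and the restriction to CPU and memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    name: str
    free_cpu: float
    free_mem: int


@dataclass
class Container:
    cid: str
    cpu: float
    mem: int
    state: str = "Pending"
    agent: Optional[str] = None


def pick_agent(agents: List[Agent], c: Container) -> Optional[Agent]:
    """First fit (an assumed policy): first agent with enough free resources."""
    for a in agents:
        if a.free_cpu >= c.cpu and a.free_mem >= c.mem:
            return a
    return None


def schedule_round(agents: List[Agent], containers: List[Container]) -> None:
    """Try to place every Pending container; unplaced ones stay Pending."""
    for c in containers:
        if c.state != "Pending":
            continue
        a = pick_agent(agents, c)
        if a is None:
            continue  # no capacity now, retried in the next round
        a.free_cpu -= c.cpu
        a.free_mem -= c.mem
        c.state, c.agent = "Allocated", a.name


def on_create_failed(agent: Agent, c: Container) -> None:
    """Give the reserved resources back and return the container to Pending."""
    agent.free_cpu += c.cpu
    agent.free_mem += c.mem
    c.state, c.agent = "Pending", None
```

Because failed and destroyed containers simply go back to Pending, calling `schedule_round` repeatedly is enough to converge on the requested container count whenever capacity exists.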

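2. AppMaster Scheduling Logic

AppMaster waits for periodic reports from the AppWorkers.
If the service status reported by an AppWorker does not match what AppMaster expects, AppMaster returns commands for the AppWorker to execute:

a) Deploy: the AppWorker reports that no service is currently running; AppMaster returns a deploy command.
b) Start: the AppWorker reports that deployment succeeded; AppMaster returns a start command.
c) Update: the AppWorker reports the current service version; AppMaster detects a mismatch and returns an update command.
d) Failure handling: the AppWorker reports a deploy, start, or update failure; AppMaster records the failure and decides, according to policy, whether the AppWorker should keep retrying.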
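
Cases a) to d) amount to a small decision function that maps a worker report to the next command. The sketch below is one possible reading: the status names, the `Report` type, the command strings, and the retry policy (retry until `max_retries` failures) are all assumptions, since the text only says that the decision follows a policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    EMPTY = "empty"        # nothing deployed yet
    DEPLOYED = "deployed"  # deploy succeeded, not started
    RUNNING = "running"
    FAILED = "failed"      # deploy, start, or update failed


@dataclass
class Report:
    status: Status
    version: Optional[str] = None
    failures: int = 0      # consecutive failures seen so far


def next_command(report: Report, target_version: str,
                 max_retries: int = 3) -> Optional[str]:
    """Return the command for the worker, or None if nothing is needed."""
    if report.status is Status.EMPTY:
        return "deploy"                      # case a)
    if report.status is Status.DEPLOYED:
        return "start"                       # case b)
    if report.status is Status.FAILED:       # case d), assumed retry policy
        return "retry" if report.failures < max_retries else None
    if report.status is Status.RUNNING and report.version != target_version:
        return "update"                      # case c)
    return None
```

The function is stateless on purpose: every report carries enough information for the master to decide, which fits a design where workers report periodically and the master only answers.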

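Fault Tolerance

  1. Both ResMan and AppMaster have backups; standby is implemented by competing for a lock in Nexus.
  2. Each Agent tracks the state of its containers and reports it to ResMan. When there are too few containers, or containers do not meet ResMan's requirements, scheduling is triggered to create or delete containers.
  3. AppWorker tracks the state of the user program. When the program core dumps, exits abnormally, or is killed by cgroup, AppWorker reports the state to AppMaster, which decides according to the configured policy whether to order AppWorker to restart the user's service.
  4. Machine defects or network partitions can lead to a situation where ResMan believes there are enough containers but AppMaster finds too few service instances:

For example, a broken disk or an occupied port can prevent the user's service from ever starting.
In this case, AppMaster can call ResMan's interface to increase the number of containers (up to an upper limit).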
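
One way to picture the bounded increase in the last case is to compute a target container count from the number of containers whose service cannot start. The formula and the function name below are assumptions for illustration; the text only states that AppMaster may raise the container count and that an upper limit applies.

```python
def target_container_count(desired_instances: int,
                           stuck_containers: int,
                           max_containers: int) -> int:
    """Containers to request so that healthy instances can reach the target.

    Each stuck container (for example on a machine with a broken disk) is
    compensated by one extra container, but never beyond max_containers.
    """
    extra = max(0, stuck_containers)
    return min(desired_instances + extra, max_containers)
```

With `desired_instances=3`, one stuck container, and `max_containers=5`, the function returns 4. Computing an absolute target rather than an increment keeps repeated calls from inflating the count while the stuck container is still present.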

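Service Discovery

  1. The SDK discovers the AppMaster address through Nexus.
  2. The SDK queries AppMaster for the address and current service status of each Job instance.
  3. AppMaster periodically synchronizes service addresses and status to third-party naming systems (such as BNS, Nexus, or ZK).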

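Service Update

  1. The SDK discovers the AppMaster address for the specified Job through Nexus.
  2. The SDK sends a request to AppMaster; AppMaster propagates the update command to the AppWorkers, and each AppWorker reports its update status back to AppMaster.
  3. AppWorker communicates with AppMaster in pull mode, so AppMaster can pause the rollout and control its step size based on current conditions.
  4. Service updates happen entirely inside the container and do not involve destroying or creating containers.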
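
Because workers pull their commands, pause and step-size control can live entirely on the master side: it simply declines to answer "update" when the rollout is paused or enough workers are already updating. The sketch below assumes a hypothetical `UpdateController` class and treats the step size as the maximum number of workers updating at once; neither detail comes from the Galaxy source.

```python
from typing import Optional, Set


class UpdateController:
    """Master-side rollout state for one Job (illustrative only)."""

    def __init__(self, target_version: str, step: int) -> None:
        self.target_version = target_version
        self.step = step                  # max workers updating concurrently
        self.paused = False
        self.in_flight: Set[str] = set()  # workers already told to update

    def on_report(self, worker_id: str, version: str) -> Optional[str]:
        """Handle one worker poll; return "update" or None."""
        if version == self.target_version:
            self.in_flight.discard(worker_id)  # this worker is done
            return None
        if worker_id in self.in_flight:
            return None                        # update already in progress
        if self.paused or len(self.in_flight) >= self.step:
            return None                        # hold this worker back for now
        self.in_flight.add(worker_id)
        return "update"
```

For example, with `step=1` the second worker to poll receives no command until the first one reports the target version, and setting `paused = True` stops new updates without touching workers that are already in flight.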

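Permission and Quota Management Model

  1. Cluster: the hosts/agents and services that share the same ResMan.
  2. Pool (machine pool): a host/agent can belong to only one pool, and a pool usually contains many hosts/agents. A cluster may contain multiple pools. Pools provide hard isolation of resources and environments and are also the unit of permission assignment.
  3. User: a Galaxy user.
  4. Authority: a specific operation permission that a user holds on a pool, such as creating, deleting, modifying, or querying Jobs. A user can hold multiple permissions on multiple pools at the same time.
  5. Quota: describes the amount of resources a user may hold in the cluster, including CPU quota, memory quota, disk space quota, and a limit on the number of submitted tasks. A user's quota is independent of any particular pool.
  6. Label: a label generally identifies a group of machines that share some characteristic; labels and machines have a many-to-many relationship. Authorized users can label machines in a pool and can specify labels when submitting tasks.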
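
The interplay of authority (per pool) and quota (per user, independent of pool) can be sketched as an admission check at submission time. All names below (`Amount`, `User`, `can_submit`, the `"create"` action string) are hypothetical and chosen for this example; the real checks in Galaxy may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Amount:
    """A bundle of resource amounts, used for both quota and usage."""
    cpu: float = 0.0
    memory: int = 0
    disk: int = 0
    jobs: int = 0


@dataclass
class User:
    name: str
    quota: Amount                                  # cluster-wide, not per pool
    used: Amount = field(default_factory=Amount)
    # pool name -> actions the user may perform on that pool
    authority: Dict[str, Set[str]] = field(default_factory=dict)


def can_submit(user: User, pool: str, need: Amount) -> bool:
    """Allow a submission only with pool authority and remaining quota."""
    if "create" not in user.authority.get(pool, set()):
        return False
    q, u = user.quota, user.used
    return (u.cpu + need.cpu <= q.cpu
            and u.memory + need.memory <= q.memory
            and u.disk + need.disk <= q.disk
            and u.jobs + 1 <= q.jobs)


alice = User(
    name="alice",
    quota=Amount(cpu=8, memory=16, disk=100, jobs=2),
    authority={"pool-a": {"create", "query"}, "pool-b": {"query"}},
)
```

Here `alice` could submit a small Job to `pool-a`, would be refused on `pool-b` for lack of create authority, and would be refused on any pool once the request exceeds her cluster-wide quota.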

System Dependencies

  1. Nexus for addressing and metadata storage
  2. MDT as the Trace system for user logs
  3. Sofa-PbRPC as the communication library
