  • Language
    Java
  • License
    Apache License 2.0
Repository Details

Network service that provides globally strictly monotonically increasing timestamps

Chronos

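Introduction

Chronos, which means "time" in Ancient Greek, is a high-availability, high-performance service that provides globally unique, strictly monotonically increasing timestamps.

Chronos uses a master-backup architecture: when the master server goes down, a backup server detects the failure quickly and takes over, which keeps the system highly available. The server is built on the Thrift framework and handled about 600,000 RPC requests per second in testing; a single-threaded client can issue about 60,000 requests per second against a local server, giving high throughput and low latency. Only one ChronosServer serves requests at any time, the timestamps it allocates are guaranteed to be strictly monotonically increasing, and the allocated values are persisted to ZooKeeper, so the service stays correct even across a failover.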

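How It Works

(Figure: Chronos architecture)

Chronos relies on ZooKeeper to implement a leader election mechanism similar to the one in HBase. When a ChronosServer starts, it writes its own information to the ephemeral Master znode in ZooKeeper; if a master server already exists, it registers under the BackupServers znode instead. Once the ephemeral Master znode disappears (the master server has failed), every backup server is notified by ZooKeeper and takes part in a new round of election, which guarantees that exactly one new master takes over the service.

While running, ChronosServer starts a Thrift server that exposes the getTimestamp() and getTimestamps(int) interfaces and guarantees that every returned timestamp is strictly monotonically increasing. The returned timestamp roughly corresponds to real time: it is the current Unix time multiplied by 2^18 (enough for 1,115 years of use). Because of performance optimizations, this correspondence is not reliable once a failover has occurred.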
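The timestamp layout described above can be sketched in a few lines of Java. This is an illustration only, not code from Chronos: the class and method names are hypothetical, and it assumes that "Unix time" means milliseconds, which is the reading consistent with the 1,115-year figure for a signed 64-bit value.

```java
import java.time.Instant;

public class TimestampLayout {
    // Multiplying by 2^18 is the same as shifting left by 18 bits.
    static final int SHIFT = 18;

    /** Smallest timestamp that corresponds to the given wall-clock time (assumed to be in milliseconds). */
    static long fromMillis(long unixMillis) {
        return unixMillis << SHIFT;
    }

    /** Approximate wall-clock milliseconds encoded in a timestamp. */
    static long toMillis(long timestamp) {
        return timestamp >>> SHIFT;
    }

    /** Whole 365-day years that fit before a signed 64-bit timestamp overflows. */
    static long yearsOfHeadroom() {
        long maxMillis = Long.MAX_VALUE >>> SHIFT;
        long millisPerYear = 365L * 24 * 60 * 60 * 1000;
        return maxMillis / millisPerYear;
    }

    public static void main(String[] args) {
        long ts = fromMillis(System.currentTimeMillis());
        System.out.println("timestamp      = " + ts);
        System.out.println("decoded time   = " + Instant.ofEpochMilli(toMillis(ts)));
        System.out.println("years headroom = " + yearsOfHeadroom());
    }
}
```

Under this layout there are 2^18 = 262,144 distinct timestamp values per millisecond, and decoding a value back to wall-clock time is a single shift.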

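When ChronosClient starts, it reads the address of the current master ChronosServer from ZooKeeper, connects to that server, and can then send Thrift RPC requests. When the master fails and a client request fails as a result, the client automatically fetches the new master's address from ZooKeeper and reconnects.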

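Usage

Chronos server

  1. Enter the chronos-server directory and build the source with mvn clean package -DskipTests.
  2. Enter the conf directory under target, edit chronos.conf, and fill in the configuration of the ZooKeeper you depend on.
  3. Enter the bin directory under target and execute sh ./chronos.sh to run ChronosServer.

Chronos client

  1. Enter the chronos-client directory and build the source with mvn clean package -DskipTests.

  2. Add the chronos-client dependency to the client's pom.xml (use the version that matches your Thrift version).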

    <dependency>
      <groupId>com.xiaomi.infra</groupId>
      <artifactId>chronos-client</artifactId>
      <version>1.2.0-thrift0.5.0</version>
    </dependency>
    
  3. Create a ChronosClient object, for example new ChronosClient("127.0.0.1:2181", "default-cluster").

  4. Send RPC requests, for example chronosClient.getTimestamp() or chronosClient.getTimestamps(10).

Quick Start

  1. Following the ZooKeeper documentation, build ZooKeeper and run it on 127.0.0.1:2181.
  2. Get the chronos source code and build it with mvn clean package -DskipTests (Thrift must be installed).
  3. Enter the bin directory of chronos-server and execute sh ./chronos.sh to run ChronosServer.
  4. Enter the chronos-client directory and execute mvn exec:java -Dexec.mainClass="com.xiaomi.infra.chronos.client.ChronosClient" -Dexec.args="127.0.0.1:2181 default-cluster"

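Scenarios

  • Providing globally strictly monotonically increasing timestamps for implementing global transactions such as Percolator.
  • Providing globally unique values; unlike snowflake, Chronos does not depend on NTP and includes a failover mechanism.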

Tools

  • list_servers.rb monitors the status of all currently running ChronosServers.
  • translate_timestamp.rb converts a timestamp into human-readable world time.
  • process_benchmark_log.rb processes the logs produced by the benchmark program.

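Testing

  • Performance test

| Client threads | Average QPS | Average latency (ms) | Average failover time (s) | Total server QPS |
|---|---|---|---|---|
| 1 | 10792.757 | 0.093 | 3.056 | 32378.271 |
| 10 | 7919.679 | 0.127 | 3.053 | 237590.370 |
| 20 | 6676.801 | 0.164 | 3.952 | 400788.060 |
| 50 | 3954.026 | 0.255 | 4.044 | 593103.900 |
| 100 | 1791.251 | 0.605 | 5.470 | 537375.300 |
| 50 (optimum) | 3993.749 | 0.251 | 0.000 | 599062.350 |

  • Failover test

Continuously killing the ChronosServer and ZooKeeper processes revealed no correctness issues, and the failover time was in line with expectations.

Note: failover time can be tuned through ZooKeeper's tickTime and Chronos's sessionTimeout. In production deployments, use Supervisor or God to monitor and restart the service.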


Chronos

Introduction

Chronos, which means "time" in Ancient Greek, is a high-availability, high-performance service that provides globally strictly monotonically increasing timestamps.

Chronos uses a master-standby architecture, so when the master server goes down, a backup server notices and takes over the service. We use Thrift for the RPC server, which can handle about 600,000 QPS on the server side and about 60,000 QPS from a single client thread (against a local server). All timestamps are allocated by a single master server, so they are guaranteed to be unique and strictly monotonically increasing. Chronos relies on ZooKeeper to store persistent data, so even if the master server fails, a backup server can carry on allocating increasing timestamps.
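One common way to keep timestamps strictly increasing across a master change without writing to the store on every request is to persist an upper bound ahead of time and hand out values below it from memory. The sketch below illustrates that idea only; whether Chronos works exactly this way is an assumption, and BoundStore, InMemoryStore, step, and the injected clock are hypothetical names standing in for ZooKeeper and the system clock.

```java
import java.util.function.LongSupplier;

public class TimestampAllocator {
    /** Hypothetical stand-in for the persistent store (ZooKeeper in Chronos). */
    interface BoundStore {
        long read();
        void write(long bound);
    }

    /** Trivial store used only to make the sketch runnable. */
    static final class InMemoryStore implements BoundStore {
        private long bound;
        public long read() { return bound; }
        public void write(long bound) { this.bound = bound; }
    }

    private final BoundStore store;
    private final LongSupplier clockMillis;
    private final long step;        // how far past the counter the bound is persisted
    private long next;              // next value to hand out
    private long persistedBound;    // every value returned so far is below this

    TimestampAllocator(BoundStore store, LongSupplier clockMillis, long step) {
        this.store = store;
        this.clockMillis = clockMillis;
        this.step = step;
        // A new master resumes from the persisted bound, so it cannot
        // reissue a value handed out by the previous master.
        this.persistedBound = store.read();
        this.next = persistedBound;
    }

    synchronized long getTimestamp() {
        return getTimestamps(1);
    }

    /** Reserves count consecutive values and returns the first one. */
    synchronized long getTimestamps(int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        long clock = clockMillis.getAsLong() << 18;  // follow the wall clock when it is ahead
        if (clock > next) {
            next = clock;
        }
        long first = next;
        long end = first + count;
        if (end > persistedBound) {
            persistedBound = end + step;
            store.write(persistedBound);             // persist before returning any value
        }
        next = end;
        return first;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        TimestampAllocator master = new TimestampAllocator(store, System::currentTimeMillis, 1L << 20);
        long a = master.getTimestamp();
        long b = master.getTimestamps(10);
        // Simulated failover: a backup starts from the same store.
        TimestampAllocator backup = new TimestampAllocator(store, System::currentTimeMillis, 1L << 20);
        long c = backup.getTimestamp();
        System.out.println(a < b && b + 10 <= c);
    }
}
```

In this sketch a new master may start from a bound that is ahead of the wall clock, which is one plausible reason the correspondence between timestamps and real time can loosen after a failover.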

How It Works

(Figure: Chronos architecture)

Chronos implements leader election with ZooKeeper, in a way similar to HBase. When a ChronosServer starts up, it tries to register itself in the ephemeral master znode. If a master server already exists, it registers in the backup servers znode instead. Once the master znode disappears (the master server has failed), ZooKeeper notifies every backup server, they take part in a new election, and exactly one of them becomes the new master.
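The election flow above can be illustrated with a small single-threaded simulation. It does not use the ZooKeeper client library: Registry is a hypothetical in-memory stand-in for the master znode and the backup servers znode, and the server names are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class ElectionSketch {
    /** In-memory stand-in for the master znode and the backup servers znode (hypothetical). */
    static final class Registry {
        private Server master;                                 // null while no master is registered
        private final List<Server> backups = new ArrayList<>();

        /** Create-if-absent, mimicking the creation of an ephemeral znode. */
        boolean tryBecomeMaster(Server candidate) {
            if (master != null) {
                return false;
            }
            master = candidate;
            backups.remove(candidate);
            return true;
        }

        void registerBackup(Server server) {
            backups.add(server);
        }

        /** Simulates the master's session ending: its entry vanishes and backups are notified. */
        void masterFailed() {
            master = null;
            for (Server backup : new ArrayList<>(backups)) {
                backup.onMasterGone();
            }
        }

        String currentMaster() {
            return master == null ? null : master.name;
        }
    }

    static final class Server {
        final String name;
        private final Registry registry;

        Server(String name, Registry registry) {
            this.name = name;
            this.registry = registry;
        }

        /** On startup: become master if the slot is free, otherwise register as a backup. */
        void start() {
            if (!registry.tryBecomeMaster(this)) {
                registry.registerBackup(this);
            }
        }

        /** On notification: compete for the master slot; only one candidate succeeds. */
        void onMasterGone() {
            registry.tryBecomeMaster(this);
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        Server a = new Server("server-a", registry);
        Server b = new Server("server-b", registry);
        Server c = new Server("server-c", registry);
        a.start();
        b.start();
        c.start();
        System.out.println("master: " + registry.currentMaster());
        registry.masterFailed();
        System.out.println("master after failover: " + registry.currentMaster());
    }
}
```

In the simulation the first notified backup always wins; with a real ZooKeeper ensemble, whichever backup creates the ephemeral znode first becomes the master.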

ChronosServer starts a Thrift server when it is running. It exposes two interfaces, getTimestamp() and getTimestamps(int), and guarantees that all returned timestamps are strictly monotonically increasing. The timestamp is derived from world time, but we do not guarantee that the two always match if the master fails frequently.

ChronosClient reads the address of the master ChronosServer from ZooKeeper and can then send RPC requests to get timestamps. When the master server fails, in-flight requests fail as well; ChronosClient then looks up the new master server through ZooKeeper and recovers.
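The client-side recovery described above amounts to "drop the connection on failure, look up the master again, retry". A minimal sketch under stated assumptions: Connection and MasterLocator are hypothetical interfaces standing in for the Thrift connection and the ZooKeeper lookup, and the retry limit is illustrative rather than taken from ChronosClient.

```java
public class FailoverClientSketch {
    /** Hypothetical connection to one ChronosServer. */
    interface Connection {
        long getTimestamp();
    }

    /** Hypothetical lookup: finds the current master and connects to it. */
    interface MasterLocator {
        Connection connectToCurrentMaster();
    }

    private final MasterLocator locator;
    private final int maxAttempts;
    private Connection connection;

    FailoverClientSketch(MasterLocator locator, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.locator = locator;
        this.maxAttempts = maxAttempts;
    }

    long getTimestamp() {
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                if (connection == null) {
                    connection = locator.connectToCurrentMaster();
                }
                return connection.getTimestamp();
            } catch (RuntimeException e) {
                lastFailure = e;
                connection = null;  // force a fresh master lookup on the next attempt
            }
        }
        throw new IllegalStateException("no master reachable after " + maxAttempts + " attempts", lastFailure);
    }

    public static void main(String[] args) {
        int[] lookups = {0};
        MasterLocator locator = () -> {
            lookups[0]++;
            if (lookups[0] == 1) {
                // The first lookup returns a master that has just died.
                return () -> { throw new IllegalStateException("old master is down"); };
            }
            return () -> 42L;
        };
        FailoverClientSketch client = new FailoverClientSketch(locator, 3);
        System.out.println(client.getTimestamp() + " after " + lookups[0] + " lookups");
    }
}
```

A production client would typically also wait between attempts, since a new master is only available once the election has finished.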

Usage

Chronos-server

  1. Enter the chronos-server directory and build with mvn clean package -DskipTests.
  2. Enter the conf directory under target, edit chronos.conf, and fill in the ZooKeeper you are using.
  3. Enter the bin directory under target and execute sh ./chronos.sh to run ChronosServer.

Chronos-client

  1. Enter the chronos-client directory and build with mvn clean package -DskipTests.

  2. Add the chronos-client dependency to pom.xml (use the version that matches your Thrift version).

    <dependency>
      <groupId>com.xiaomi.infra</groupId>
      <artifactId>chronos-client</artifactId>
      <version>1.2.0-thrift0.5.0</version>
    </dependency>
    
  3. Construct the chronos client object, like new ChronosClient("127.0.0.1:2181", "default-cluster").

  4. Send RPC request through client, like chronosClient.getTimestamp() or chronosClient.getTimestamps(10).

Quick Start

  1. Refer to the ZooKeeper documentation, then build ZooKeeper and run it on 127.0.0.1:2181.
  2. Get the chronos source code and build it with mvn clean package -DskipTests (Thrift required).
  3. Enter the bin directory of chronos-server and execute sh ./chronos.sh to run ChronosServer.
  4. Enter the chronos-client directory and run mvn exec:java -Dexec.mainClass="com.xiaomi.infra.chronos.client.ChronosClient" -Dexec.args="127.0.0.1:2181 default-cluster".

Scenario

  • You need globally strictly monotonically increasing timestamps to implement global transactions, as in Percolator.
  • You need globally unique values. Unlike snowflake, which relies on NTP, Chronos offers stricter guarantees and handles failover by design.

Tools

  • list_servers.rb displays the status of all running ChronosServers.
  • translate_timestamp.rb translates a timestamp into world time.
  • process_benchmark_log.rb processes the log of the benchmark program.

Testing

  • Performance Test

| Client threads | Average QPS | Average latency (ms) | Average failover time (s) | Server QPS |
|---|---|---|---|---|
| 1 | 10792.757 | 0.093 | 3.056 | 32378.271 |
| 10 | 7919.679 | 0.127 | 3.053 | 237590.370 |
| 20 | 6676.801 | 0.164 | 3.952 | 400788.060 |
| 50 | 3954.026 | 0.255 | 4.044 | 593103.900 |
| 100 | 1791.251 | 0.605 | 5.470 | 537375.300 |
| 50 (optimum) | 3993.749 | 0.251 | 0.000 | 599062.350 |

  • Failover Test

We continuously killed ChronosServer and ZooKeeper processes for one week. No correctness issues were found, and the failover time was in line with expectations.

Note: failover time can be configured through ZooKeeper's tickTime and Chronos's sessionTimeout. A production deployment should use Supervisor or God to monitor the ChronosServers and restart them after a failover.
