Kronos
Kronos is a distributed service / library which can be used to provide synchronized time in a cluster.
It provides an API to query "kronos time", which would be nearly same on all kronos nodes in the cluster. It does not update system time.
What is kronos
Kronos is a fault tolerant service / library which can be used to have synchronized time in a cluster. The time provided by this service may be different from system time, but it will be nearly the same on all nodes of the cluster.
In a cluster each node has a local system clock which it uses for time. There can be clock skews between nodes. Individual nodes can also have clock jumps. Some services require a "global" timestamp for ordering of events, and having a global view of time is difficult in a cluster.
Kronos solves above problems and has the following properties:
- It always serves monotonic time.
- It is immune to large jumps in system time (Go v1.9 onwards).
- It has the same rate of flow as real time.
- It returns nearly the same time on all the nodes (within a few ms).
- It is fault tolerant and works in the presence of node failures
- It requires only a quorum of nodes to be up.
- It supports addition / removal of nodes from the cluster.
This library also provides an "uptime" which is a distributed monotonic timer starting from 0. This timer will not have backward jumps and not have forward jumps in most cases. In rare cases it can have small forward jumps (in the order of a few seconds). This "uptime" can be useful in measuring time durations when the kronos service is live.
Comparison with NTPD
The ntpd program is an operating system daemon which sets and maintains the system time of day in synchronism with Internet standard time server (http://doc.ntp.org/4.1.0/ntpd.htm)
Kronos and NTPD solve different problems. The objective of kronos is to provide access to monotonic time which is guaranteed to be nearly the same on all nodes in the cluster at all times (irrespective of changes to system time). The key differences are
- Kronos provides an API to access time. It does not modify system time like NTPD.
- Kronos time is monotonic. NTPD is not resilient to backward system time jumps.
- Kronos time will not have large forward jumps (Go v1.9 onwards) since it
uses CLOCK_REALTIME only for a reference point, and continues to use
CLOCK_MONOTONIC to measure progress of time.
NTPD does not prevent large system time jumps (forward / backward), but corrects those jumps after they happen. This correction can take a while. - If a kronos node finds itself partitioned away from the cluster, it will stop
serving time requests. Hence if kronos time is served by a node, it can be
assumed to be accurate.
The same cannot be said for system time. It is (almost) always accessible and there is no guarantee that it would be nearly the same as other nodes of the cluster. - The time provided by kronos may not be same as system time / UTC time of
any time server. It provides a guarantee of monotonicity and that it would
always be nearly the same on all nodes of the cluster. Hence it should be
used when the order of events is important, and not when the events
actually happened.
Hence this should be used in conjunction with NTPD which keeps system time in sync with an Internet standard time server (but can be prone to time jumps). - Kronos does not attempt to measure RTT variance. It is not optimized for
cross data-center deployments yet.
NTPD can synchronize time cross data-center.
Kronos is not a replacement for NTPD. It's recommended to use NTPD in conjunction with Kronos so that applications can choose which timestamp they want to use (system time vs kronos time).
Design
-
Time is always monotonic
This is easy to ensure for a running service by storing the last served timestamp in memory and flat-lining if a backward jump is detected. To ensure monotonic time across restart, an upper bound to time is periodically persisted (like current time +10s). This time cap is used to ensure monotonicity across restarts. -
It is immune to large jumps in system time
It has the same rate of flow as real time
Kronos uses CLOCK_MONOTONIC along with CLOCK_REALTIME. CLOCK_REALTIME represents the machine's best-guess as to the current wall-clock, time-of-day time.
CLOCK_MONOTONIC represents the absolute elapsed wall-clock time since some arbitrary, fixed point in the past. It isn't affected by changes in the system time-of-day clock.
CLOCK_MONOTONIC is used to compute progress of time from a fixed reference picked from CLOCK_REALTIME.
Time used in kronos is given by
local_time = (start_time + time.Since(start_time))
start_time
is a fixed point of reference taken from CLOCK_REALTIME.
time.Since(start_time)
is the monotonic time sincestart_time
. Time used this way is immune to large system jumps and has the same rate of flow as real time.
This is supported Go 1.9 onwards https://github.com/golang/proposal/blob/master/design/12914-monotonic.md -
Returns nearly the same time on all the nodes (within a few ms)
It is fault tolerant and works in the presence of node failures
It supports addition / removal of nodes from the cluster
Kronos elects a single "oracle" from within the cluster. This oracle is a node with which all nodes in the cluster periodically sync time. Periodic time synchronization suffices since each node is immune to large system time jumps. RPCs are used to synchronize time with the oracle. Kronos time on each node is given aslocal_time + delta
, where delta is the time difference between kronos time this node and kronos time of the oracle. The delta is updated during every synchronization with the oracle. The oracle is elected using raft (distributed consensus protocol). Raft provides a mechanism to manage a distributed state machine. Kronos stores the identity of the current oracle and an upper bound to time (for safety reasons) in this state machine.
Raft is failure tolerant and can continue to work as long as a quorum of nodes are healthy. It also supports addition and removal of nodes to the cluster.
Server operation
When a kronos server starts up, it first tries to initialize itself to find out
the current kronos time.
It does the following till it is initialized
- If there is no current oracle
- Propose self as oracle
- If self is the current oracle
- Wait to see if another initialized node becomes the oracle. Initialized
nodes have the correct kronos time and should get preference to become
the oracle
- If no nodes becomes the oracle
- If a backward jump is detected (since this node has no delta
information)
- Continue time from time cap
- Propose self as oracle
- If a backward jump is detected (since this node has no delta
information)
- If no nodes becomes the oracle
- Wait to see if another initialized node becomes the oracle. Initialized
nodes have the correct kronos time and should get preference to become
the oracle
- If self is not the oracle, sync with the current oracle
- If the the current oracle is down
- Propose self as oracle
- If the the current oracle is down
When proposing self as oracle, the server proposes itself as the new oracle and
a new time cap to the state machine managed by raft. Raft guarantees
serializability so CAS (compare and set) operations are used to manage this
state machine.
Once the server is initialized, it operates as follows
- If self is the current oracle
- Continue proposing self as oracle
- If self is not the oracle, sync with the current oracle
- If the the current oracle is down
- Propose self as oracle
- If the the current oracle is down
Syncing time with the oracle
Time synchronization with the oracle happens over RPCs. Kronos has a limit on
the RTT allowed for this RPC for it to be considered valid. The goal is to keep
the kronos time of the follower in sync with kronos time of the oracle.
Let's say a follower requests the oracle for its kronos time at ts
and receives a response at te (where ts is the kronos time
of the follower), and the oracle responds with to.
Time needs to be adjusted taking RTT into account. The true oracle time will lie
between to and to + RTT
If tf does not lie within this interval, delta is adjusted so that it falls into this interval (kronos time on follower is given as local_time + delta with oracle).
Cluster operations
Addition and removal of nodes entail addition and removal of nodes from the raft
cluster. A kronos node maintains metadata about itself and its view of the
cluster locally in a file.
Each kronos server is passed a list of kronos seed hosts which it can use to
initialize itself.
When a new kronos node is started, it assigns itself a NodeID and requests the
seed hosts to add this new node to the raft cluster. Once this request succeeds,
it joins the raft cluster and continues its usual operations.
Seed hosts are not necessary for subsequent restarts of a node since each node
has a local view of the cluster, but can be used in order to ramp up a node
which has been down for a long time.
Usage
Kronos can be used as a standalone service or embedded in another service as a library.
Kronos requires Go version 1.9+ for immunity against large clock jumps.
Install
- Install Go version >= 1.15
- Clone the repository
git clone https://github.com/rubrikinc/kronos.git && cd kronos
Build/Test
- Build kronos:
make build
creates a kronos binary. - Run unit tests:
make test
- Run acceptance tests:
make acceptance
kronos help
can be used to get information on subcommands.
Starting a kronos server
The below are example commands to create a kronos cluster
- Log and data directory have to be created
- The IPs used are 127.0.0.1. For multi node clusters, the IP of different nodes can be used.
Server 1
./kronos \
--log-dir ./log0\
start\
--advertise-host 127.0.0.1\
--raft-port 5766\
--grpc-port 5767\
--pprof-addr :5768\
--data-dir ./data0\
--seed-hosts 127.0.0.1:5766,127.0.0.1:6766
Server 2
./kronos \
--log-dir ./log1\
start\
--advertise-host 127.0.0.1\
--raft-port 6766\
--grpc-port 6767\
--pprof-addr :6768\
--data-dir ./data1\
--seed-hosts 127.0.0.1:5766,127.0.0.1:6766
Server 3
./kronos \
--log-dir ./log2\
start\
--advertise-host 127.0.0.1\
--raft-port 7766\
--grpc-port 7767\
--pprof-addr :7768\
--data-dir ./data2\
--seed-hosts 127.0.0.1:5766,127.0.0.1:6766
Kronos servers discover each other based on seed hosts (which is the raft address of some nodes of the cluster)
Time (In Unix Nanos) can be queried using kronos time
and status can be
queried using kronos status
. If the default ports (5766, 5767) are not being
used, they need to be passed as arguments to time / status commands. For example
./kronos time --grpc-addr=127.0.0.1:5767
1536315057173673605
./kronos status --raft-addr 127.0.0.1:5766 --all kronos โ โฑ โผ
Raft ID Raft Address GRPC Address Server Status Oracle Address Oracle Id Time Cap Delta Time
a7041e2396dec1e3 127.0.0.1:5766 127.0.0.1:5767 INITIALIZED 127.0.0.1:5767 599 1536315306176477839 0s 1536315246559974475
b2300f5e4ece5750 127.0.0.1:7766 127.0.0.1:7767 INITIALIZED 127.0.0.1:5767 599 1536315306176477839 0s 1536315246560100423
c56aec267a57d604 127.0.0.1:6766 127.0.0.1:6767 INITIALIZED 127.0.0.1:5767 599 1536315306176477839 0s 1536315246560047515
Removing a node from the cluster
A node can be removed using
./kronos cluster remove <Raft ID>
where Raft ID is the id in kronos status
Embedding kronos
A kronos server can be started in a go application by using
import(
"github.com/rubrikinc/kronos"
"github.com/rubrikinc/kronos/oracle"
"github.com/rubrikinc/kronos/pb"
kronosserver "github.com/rubrikinc/kronos/server"
)
if err := kronos.Initialize(ctx, kronosserver.Config{
Clock: tm.NewMonotonicClock(),
OracleTimeCapDelta: kronosserver.DefaultOracleTimeCapDelta,
ManageOracleTickInterval: 3 * time.Second,
RaftConfig: &oracle.RaftConfig{
CertsDir: certsDir,
DataDir: kronosDir,
GRPCHostPort: &kronospb.NodeAddr{
Host: advertiseHost,
Port: serverCfg.KronosGRPCPort,
},
RaftHostPort: &kronospb.NodeAddr{
Host: advertiseHost,
Port: serverCfg.KronosRaftPort,
},
SeedHosts: seedHosts,
},
}); err != nil {
return err
}
// call kronos.Stop() in shutdown
// call kronos.Now() for kronos time
Client example
An example of a go
client usage
import (
"context"
"fmt"
"github.com/rubrikinc/kronos/kronosutil/log"
"github.com/rubrikinc/kronos/pb"
"github.com/rubrikinc/kronos/server"
)
func main() {
ctx := context.Background()
c := server.NewGRPCClient("" /* certs dir */)
// Similar to below, KronosUptime endpoint can be used to get uptime.
timeResponse, err := c.KronosTime(ctx, &kronospb.NodeAddr{
Host: "127.0.0.1", /* IP of kronos server */
Port: "5767", /* GRPC port of kronos server */
})
if err != nil {
log.Fatal(ctx, err)
}
fmt.Println(timeResponse.Time)
}
Clients can also be created for other languages by generating GRPC clients
for TimeService in pb/kronos.proto
TLS Support
Kronos supports TLS. The certificates directory can be passed using --certs-dir
or set using the environment variable KRONOS_CERTS_DIR
.
The CA and node certs can be created using something similar to
https://www.cockroachlabs.com/docs/stable/create-security-certificates.html