• Stars
    star
    4,086
  • Rank 10,624 (Top 0.3 %)
  • Language
    Go
  • License
    MIT License
  • Created over 9 years ago
  • Updated about 1 month ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A Crypto-Secure Reliable-UDP Library for golang with FEC

kcp-go

GoDoc Powered MIT licensed Build Status Go Report Card Coverage Statusd Sourcegraph

Introduction

kcp-go is a Production-Grade Reliable-UDP library for golang.

This library intents to provide a smooth, resilient, ordered, error-checked and anonymous delivery of streams over UDP packets, it has been battle-tested with opensource project kcptun. Millions of devices(from low-end MIPS routers to high-end servers) have deployed kcp-go powered program in a variety of forms like online games, live broadcasting, file synchronization and network acceleration.

Lastest Release

Features

  1. Designed for Latency-sensitive scenarios.
  2. Cache friendly and Memory optimized design, offers extremely High Performance core.
  3. Handles >5K concurrent connections on a single commodity server.
  4. Compatible with net.Conn and net.Listener, a drop-in replacement for net.TCPConn.
  5. FEC(Forward Error Correction) Support with Reed-Solomon Codes
  6. Packet level encryption support with AES, TEA, 3DES, Blowfish, Cast5, Salsa20, etc. in CFB mode, which generates completely anonymous packet.
  7. Only A fixed number of goroutines will be created for the entire server application, costs in context switch between goroutines have been taken into consideration.
  8. Compatible with skywind3000's C version with various improvements.
  9. Platform-dependent optimizations: sendmmsg and recvmmsg were expoloited for linux.

Documentation

For complete documentation, see the associated Godoc.

Specification

Frame Format

NONCE:
  16bytes cryptographically secure random number, nonce changes for every packet.
  
CRC32:
  CRC-32 checksum of data using the IEEE polynomial
 
FEC TYPE:
  typeData = 0xF1
  typeParity = 0xF2
  
FEC SEQID:
  monotonically increasing in range: [0, (0xffffffff/shardSize) * shardSize - 1]
  
SIZE:
  The size of KCP frame plus 2
+-----------------+
| SESSION         |
+-----------------+
| KCP(ARQ)        |
+-----------------+
| FEC(OPTIONAL)   |
+-----------------+
| CRYPTO(OPTIONAL)|
+-----------------+
| UDP(PACKET)     |
+-----------------+
| IP              |
+-----------------+
| LINK            |
+-----------------+
| PHY             |
+-----------------+
(LAYER MODEL OF KCP-GO)

Examples

  1. simple examples
  2. kcptun client
  3. kcptun server

Benchmark

===
Model Name:	MacBook Pro
Model Identifier:	MacBookPro14,1
Processor Name:	Intel Core i5
Processor Speed:	3.1 GHz
Number of Processors:	1
Total Number of Cores:	2
L2 Cache (per Core):	256 KB
L3 Cache:	4 MB
Memory:	8 GB
===

$ go test -v -run=^$ -bench .
beginning tests, encryption:salsa20, fec:10/3
goos: darwin
goarch: amd64
pkg: github.com/xtaci/kcp-go
BenchmarkSM4-4                 	   50000	     32180 ns/op	  93.23 MB/s	       0 B/op	       0 allocs/op
BenchmarkAES128-4              	  500000	      3285 ns/op	 913.21 MB/s	       0 B/op	       0 allocs/op
BenchmarkAES192-4              	  300000	      3623 ns/op	 827.85 MB/s	       0 B/op	       0 allocs/op
BenchmarkAES256-4              	  300000	      3874 ns/op	 774.20 MB/s	       0 B/op	       0 allocs/op
BenchmarkTEA-4                 	  100000	     15384 ns/op	 195.00 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR-4                 	20000000	        89.9 ns/op	33372.00 MB/s	       0 B/op	       0 allocs/op
BenchmarkBlowfish-4            	   50000	     26927 ns/op	 111.41 MB/s	       0 B/op	       0 allocs/op
BenchmarkNone-4                	30000000	        45.7 ns/op	65597.94 MB/s	       0 B/op	       0 allocs/op
BenchmarkCast5-4               	   50000	     34258 ns/op	  87.57 MB/s	       0 B/op	       0 allocs/op
Benchmark3DES-4                	   10000	    117149 ns/op	  25.61 MB/s	       0 B/op	       0 allocs/op
BenchmarkTwofish-4             	   50000	     33538 ns/op	  89.45 MB/s	       0 B/op	       0 allocs/op
BenchmarkXTEA-4                	   30000	     45666 ns/op	  65.69 MB/s	       0 B/op	       0 allocs/op
BenchmarkSalsa20-4             	  500000	      3308 ns/op	 906.76 MB/s	       0 B/op	       0 allocs/op
BenchmarkCRC32-4               	20000000	        65.2 ns/op	15712.43 MB/s
BenchmarkCsprngSystem-4        	 1000000	      1150 ns/op	  13.91 MB/s
BenchmarkCsprngMD5-4           	10000000	       145 ns/op	 110.26 MB/s
BenchmarkCsprngSHA1-4          	10000000	       158 ns/op	 126.54 MB/s
BenchmarkCsprngNonceMD5-4      	10000000	       153 ns/op	 104.22 MB/s
BenchmarkCsprngNonceAES128-4   	100000000	        19.1 ns/op	 837.81 MB/s
BenchmarkFECDecode-4           	 1000000	      1119 ns/op	1339.61 MB/s	    1606 B/op	       2 allocs/op
BenchmarkFECEncode-4           	 2000000	       832 ns/op	1801.83 MB/s	      17 B/op	       0 allocs/op
BenchmarkFlush-4               	 5000000	       272 ns/op	       0 B/op	       0 allocs/op
BenchmarkEchoSpeed4K-4         	    5000	    259617 ns/op	  15.78 MB/s	    5451 B/op	     149 allocs/op
BenchmarkEchoSpeed64K-4        	    1000	   1706084 ns/op	  38.41 MB/s	   56002 B/op	    1604 allocs/op
BenchmarkEchoSpeed512K-4       	     100	  14345505 ns/op	  36.55 MB/s	  482597 B/op	   13045 allocs/op
BenchmarkEchoSpeed1M-4         	      30	  34859104 ns/op	  30.08 MB/s	 1143773 B/op	   27186 allocs/op
BenchmarkSinkSpeed4K-4         	   50000	     31369 ns/op	 130.57 MB/s	    1566 B/op	      30 allocs/op
BenchmarkSinkSpeed64K-4        	    5000	    329065 ns/op	 199.16 MB/s	   21529 B/op	     453 allocs/op
BenchmarkSinkSpeed256K-4       	     500	   2373354 ns/op	 220.91 MB/s	  166332 B/op	    3554 allocs/op
BenchmarkSinkSpeed1M-4         	     300	   5117927 ns/op	 204.88 MB/s	  310378 B/op	    6988 allocs/op
PASS
ok  	github.com/xtaci/kcp-go	50.349s
=== Raspberry Pi 4 ===

➜  kcp-go git:(master) cat /proc/cpuinfo
processor	: 0
model name	: ARMv7 Processor rev 3 (v7l)
BogoMIPS	: 108.00
Features	: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
CPU implementer	: 0x41
CPU architecture: 7
CPU variant	: 0x0
CPU part	: 0xd08
CPU revision	: 3

➜  kcp-go git:(master)  go test -run=^$ -bench .
2020/01/05 19:25:13 beginning tests, encryption:salsa20, fec:10/3
goos: linux
goarch: arm
pkg: github.com/xtaci/kcp-go/v5
BenchmarkSM4-4                     20000             86475 ns/op          34.69 MB/s           0 B/op          0 allocs/op
BenchmarkAES128-4                  20000             62254 ns/op          48.19 MB/s           0 B/op          0 allocs/op
BenchmarkAES192-4                  20000             71802 ns/op          41.78 MB/s           0 B/op          0 allocs/op
BenchmarkAES256-4                  20000             80570 ns/op          37.23 MB/s           0 B/op          0 allocs/op
BenchmarkTEA-4                     50000             37343 ns/op          80.34 MB/s           0 B/op          0 allocs/op
BenchmarkXOR-4                    100000             22266 ns/op         134.73 MB/s           0 B/op          0 allocs/op
BenchmarkBlowfish-4                20000             66123 ns/op          45.37 MB/s           0 B/op          0 allocs/op
BenchmarkNone-4                  3000000               518 ns/op        5786.77 MB/s           0 B/op          0 allocs/op
BenchmarkCast5-4                   20000             76705 ns/op          39.11 MB/s           0 B/op          0 allocs/op
Benchmark3DES-4                     5000            418868 ns/op           7.16 MB/s           0 B/op          0 allocs/op
BenchmarkTwofish-4                  5000            326896 ns/op           9.18 MB/s           0 B/op          0 allocs/op
BenchmarkXTEA-4                    10000            114418 ns/op          26.22 MB/s           0 B/op          0 allocs/op
BenchmarkSalsa20-4                 50000             36736 ns/op          81.66 MB/s           0 B/op          0 allocs/op
BenchmarkCRC32-4                 1000000              1735 ns/op         589.98 MB/s
BenchmarkCsprngSystem-4          1000000              2179 ns/op           7.34 MB/s
BenchmarkCsprngMD5-4             2000000               811 ns/op          19.71 MB/s
BenchmarkCsprngSHA1-4            2000000               862 ns/op          23.19 MB/s
BenchmarkCsprngNonceMD5-4        2000000               878 ns/op          18.22 MB/s
BenchmarkCsprngNonceAES128-4     5000000               326 ns/op          48.97 MB/s
BenchmarkFECDecode-4              200000              9081 ns/op         165.16 MB/s         140 B/op          1 allocs/op
BenchmarkFECEncode-4              100000             12039 ns/op         124.59 MB/s          11 B/op          0 allocs/op
BenchmarkFlush-4                  100000             21704 ns/op               0 B/op          0 allocs/op
BenchmarkEchoSpeed4K-4              2000            981182 ns/op           4.17 MB/s       12384 B/op        424 allocs/op
BenchmarkEchoSpeed64K-4              100          10503324 ns/op           6.24 MB/s      123616 B/op       3779 allocs/op
BenchmarkEchoSpeed512K-4              20         138633802 ns/op           3.78 MB/s     1606584 B/op      29233 allocs/op
BenchmarkEchoSpeed1M-4                 5         372903568 ns/op           2.81 MB/s     4080504 B/op      63600 allocs/op
BenchmarkSinkSpeed4K-4             10000            121239 ns/op          33.78 MB/s        4647 B/op        104 allocs/op
BenchmarkSinkSpeed64K-4             1000           1587906 ns/op          41.27 MB/s       50914 B/op       1115 allocs/op
BenchmarkSinkSpeed256K-4             100          16277830 ns/op          32.21 MB/s      453027 B/op       9296 allocs/op
BenchmarkSinkSpeed1M-4               100          31040703 ns/op          33.78 MB/s      898097 B/op      18932 allocs/op
PASS
ok      github.com/xtaci/kcp-go/v5      64.151s

Typical Flame Graph

Flame Graph in kcptun

Key Design Considerations

  1. slice vs. container/list

kcp.flush() loops through the send queue for retransmission checking for every 20ms(interval).

I've wrote a benchmark for comparing sequential loop through slice and container/list here:

https://github.com/xtaci/notes/blob/master/golang/benchmark2/cachemiss_test.go

BenchmarkLoopSlice-4   	2000000000	         0.39 ns/op
BenchmarkLoopList-4    	100000000	        54.6 ns/op

List structure introduces heavy cache misses compared to slice which owns better locality, 5000 connections with 32 window size and 20ms interval will cost 6us/0.03%(cpu) using slice, and 8.7ms/43.5%(cpu) for list for each kcp.flush().

  1. Timing accuracy vs. syscall clock_gettime

Timing is critical to RTT estimator, inaccurate timing leads to false retransmissions in KCP, but calling time.Now() costs 42 cycles(10.5ns on 4GHz CPU, 15.6ns on my MacBook Pro 2.7GHz).

The benchmark for time.Now() lies here:

https://github.com/xtaci/notes/blob/master/golang/benchmark2/syscall_test.go

BenchmarkNow-4         	100000000	        15.6 ns/op

In kcp-go, after each kcp.output() function call, current clock time will be updated upon return, and for a single kcp.flush() operation, current time will be queried from system once. For most of the time, 5000 connections costs 5000 * 15.6ns = 78us(a fixed cost while no packet needs to be sent), as for 10MB/s data transfering with 1400 MTU, kcp.output() will be called around 7500 times and costs 117us for time.Now() in every second.

  1. Memory management

Primary memory allocation are done from a global buffer pool xmit.Buf, in kcp-go, when we need to allocate some bytes, we can get from that pool, and a fixed-capacity 1500 bytes(mtuLimit) will be returned, the rx queue, tx queue and fec queue all receive bytes from there, and they will return the bytes to the pool after using to prevent unnecessary zer0ing of bytes. The pool mechanism maintained a high watermark for slice objects, these in-flight objects from the pool will survive from the perodical garbage collection, meanwhile the pool kept the ability to return the memory to runtime if in idle.

  1. Information security

kcp-go is shipped with builtin packet encryption powered by various block encryption algorithms and works in Cipher Feedback Mode, for each packet to be sent, the encryption process will start from encrypting a nonce from the system entropy, so encryption to same plaintexts never leads to a same ciphertexts thereafter.

The contents of the packets are completely anonymous with encryption, including the headers(FEC,KCP), checksums and contents. Note that, no matter which encryption method you choose on you upper layer, if you disable encryption, the transmit will be insecure somehow, since the header is PLAINTEXT to everyone it would be susceptible to header tampering, such as jamming the sliding window size, round-trip time, FEC property and checksums. AES-128 is suggested for minimal encryption since modern CPUs are shipped with AES-NI instructions and performs even better than salsa20(check the table above).

Other possible attacks to kcp-go includes: a) traffic analysis, dataflow on specific websites may have pattern while interchanging data, but this type of eavesdropping has been mitigated by adapting smux to mix data streams so as to introduce noises, perfect solution to this has not appeared yet, theroretically by shuffling/mixing messages on larger scale network may mitigate this problem. b) replay attack, since the asymmetrical encryption has not been introduced into kcp-go for some reason, capturing the packets and replay them on a different machine is possible, (notice: hijacking the session and decrypting the contents is still impossible), so upper layers should contain a asymmetrical encryption system to guarantee the authenticity of each message(to process message exactly once), such as HTTPS/OpenSSL/LibreSSL, only by signing the requests with private keys can eliminate this type of attack.

Connection Termination

Control messages like SYN/FIN/RST in TCP are not defined in KCP, you need some keepalive/heartbeat mechanism in the application-level. A real world example is to use some multiplexing protocol over session, such as smux(with embedded keepalive mechanism), see kcptun for example.

FAQ

Q: I'm handling >5K connections on my server, the CPU utilization is so high.

A: A standalone agent or gate server for running kcp-go is suggested, not only for CPU utilization, but also important to the precision of RTT measurements(timing) which indirectly affects retransmission. By increasing update interval with SetNoDelay like conn.SetNoDelay(1, 40, 1, 1) will dramatically reduce system load, but lower the performance.

Q: When should I enable FEC?

A: Forward error correction is critical to long-distance transmission, because a packet loss will lead to a huge penalty in time. And for the complicated packet routing network in modern world, round-trip time based loss check will not always be efficient, the big deviation of RTT samples in the long way usually leads to a larger RTO value in typical rtt estimator, which in other words, slows down the transmission.

Q: Should I enable encryption?

A: Yes, for the safety of protocol, even if the upper layer has encrypted.

Who is using this?

  1. https://github.com/xtaci/kcptun -- A Secure Tunnel Based On KCP over UDP.
  2. https://github.com/getlantern/lantern -- Lantern delivers fast access to the open Internet.
  3. https://github.com/smallnest/rpcx -- A RPC service framework based on net/rpc like alibaba Dubbo and weibo Motan.
  4. https://github.com/gonet2/agent -- A gateway for games with stream multiplexing.
  5. https://github.com/syncthing/syncthing -- Open Source Continuous File Synchronization.

Links

  1. https://github.com/xtaci/smux/ -- A Stream Multiplexing Library for golang with least memory
  2. https://github.com/xtaci/libkcp -- FEC enhanced KCP session library for iOS/Android in C++
  3. https://github.com/skywind3000/kcp -- A Fast and Reliable ARQ Protocol
  4. https://github.com/klauspost/reedsolomon -- Reed-Solomon Erasure Coding in Go

More Repositories

1

kcptun

A Quantum-Safe Secure Tunnel based on QPP, KCP, FEC, and N:M multiplexing.
Go
13,904
star
2

algorithms

Algorithms & Data structures in C++.
C++
5,261
star
3

smux

A Stream Multiplexing Library for golang with least memory usage(TDMA)
Go
1,320
star
4

gonet

A Game Server Skeleton in golang.
Go
1,264
star
5

gaio

High performance minimalism async-io(proactor) networking for Golang.
Go
808
star
6

libkcp

FEC enhanced KCP session library for iOS/Android in C++
C
308
star
7

tcpraw

Sending packets through TCP
Go
137
star
8

buddha

佛教衄料汇集
63
star
9

safebox

One key to derive all
Go
56
star
10

navmesh

navigation mesh in golang
Go
41
star
11

sp

Stream Processors on Kafka in Golang
Go
29
star
12

chat

pub/sub based chat server
Go
27
star
13

lossyconn

lossy connection simulator
Go
24
star
14

rank

ranking server
Go
23
star
15

sstable

bigdata processing in golang
Go
23
star
16

rewind

Text-Based UI for Kafka
Go
20
star
17

qpp

Quantum Permutation Pad (QPP)
Go
18
star
18

goeval

eval golang code on the fly
Go
15
star
19

log_analysis

Practical Log Analysis
15
star
20

fibernet

Message Queue/C++/Lua based game server
C
15
star
21

wsl-best-practice

best practice for development environment in WSL
14
star
22

notes

personal notes
Go
12
star
23

hppk

Homomorphic Polynomial Public Key
Go
11
star
24

gogw

Go
9
star
25

auth

auth service
Go
9
star
26

bgsave

background save process of redis
Go
7
star
27

goperf

golang performance benchmarks
Go
7
star
28

reorg

A simulated LFN network to mitigate network jitter, reorg trade latency in exchange for smoothness, so as to behave like a long fat but stable network.
Go
6
star
29

serialpacket

net.PacketConn over RS232/LoRa
Go
6
star
30

easenet

Automatically exported from code.google.com/p/easenet
C
4
star
31

xtaci

4
star
32

json2hive

generate hive schema from a json document
Go
4
star
33

chacha20

an exposed version of https://godoc.org/golang.org/x/crypto/internal/chacha20
Go
4
star
34

tmach

turing machine game
Go
4
star
35

dppk

A Deterministic Polynomial Public Key Algorithm over a Prime Galois Field GF(p)
Go
4
star
36

kidsmath

a simple program to generate math quizs for my kid.
Go
3
star
37

archiver

redolog archive and replay
Go
3
star
38

serial2tun

Serial To Tun Device
3
star
39

zturn

(zturn:ζŠ˜θ…ΎοΌ‰ a free game brings you back to 1980s
3
star
40

goscm

simple scheme interpreter
Go
1
star
41

poly2tri.as3

Automatically exported from code.google.com/p/poly2tri.as3
ActionScript
1
star
42

logrushooks

hooks for logrus
Go
1
star
43

ssh-kvr

lex/yacc learning
C
1
star
44

poly2tri

Automatically exported from code.google.com/p/poly2tri
C++
1
star
45

godeep

machine learning algorithms
1
star
46

deadlocks

deadlock code snippets in C
C
1
star
47

algebra

notes on algebra learning
1
star
48

log4go

Automatically exported from code.google.com/p/log4go
Go
1
star
49

debris

Shamir's Secret Sharing
1
star
50

ethereum_indexer

A project for indexing and querying ethereum accounts
Go
1
star