• Stars
    star
    179
  • Rank 214,039 (Top 5 %)
  • Language
    C++
  • License
    MIT License
  • Created almost 9 years ago
  • Updated 5 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

An implementation of Seqlock in C++11

Seqlock.h

Build Status License

Implementation of seqlock in C++11.

A seqlock can be used as an alternative to a readers-writer lock. It will never block the writer and doesn't require any memory bus locks.

Example

struct Data {
  std::size_t a, b, c;
};
Seqlock<Data> sl;
sl.store({0, 0, 0});
auto t = std::thread([&] {
  for (;;) {
    auto d = sl.load();
    if (d.a + 100 == d.b && d.c == d.a + d.b) {
      return;
    }
  }
});
sl.store({100, 200, 300});
t.join();

Usage

Create a seqlock:

struct Data {
  std::size_t a, b, c;
};
Seqlock<Data> sl;

Store a value (can only be called from a single thread):

sl.store({1, 2, 3});

Load a value (can be called from multiple threads):

auto v = sl.load();

Implementation

Implementing the seqlock in portable C++11 is quite tricky. The basic seqlock implementation unconditionally loads the sequence number, then unconditionally loads the protected data and finally unconditionally loads the sequence number again. Since loading the protected data is done unconditionally on the sequence number the compiler is free to move these loads before or after the loads from the sequence number.

T load() const noexcept {
  T copy;
  size_t seq0, seq1;
  do {
    seq0 = seq_.load(std::memory_order_acquire);
    copy = value_;
    seq1 = seq_.load(std::memory_order_acquire);
  } while (seq0 != seq1 || seq0 & 1);
  return copy;
}

Compiling this code specialized for int with g++-5.2 -std=c++11 -O3 yields the following assembly:

0000000000401ad0 <_ZNK7rigtorp7SeqlockIiLm64EE4loadEv>:
  // copy = value_;
  401ad0:	8b 47 08             	mov    0x8(%rdi),%eax
  401ad3:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  // do {
  //   seq0 = seq_.load(std::memory_order_acquire);
  401ad8:	48 8b 0f             	mov    (%rdi),%rcx
  //   seq1 = seq_.load(std::memory_order_acquire);
  401adb:	48 8b 17             	mov    (%rdi),%rdx
  // } while (seq0 != seq1 || seq0 & 1);
  401ade:	48 39 ca             	cmp    %rcx,%rdx
  401ae1:	75 f5                	jne    401ad8 <_ZNK7rigtorp7SeqlockIiLm64EE4loadEv+0x8>
  401ae3:	83 e2 01             	and    $0x1,%edx
  401ae6:	75 f0                	jne    401ad8 <_ZNK7rigtorp7SeqlockIiLm64EE4loadEv+0x8>
  // return copy;
  401ae8:	f3 c3                	repz retq 
  401aea:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)

We see that the compiler did indeed reorder the load of the protected data outside the critical section and the data is no longer protected from torn reads. Interestingly compiling using clang++-3.7 -std=c++11 -O3 produces the correct assembly:

0000000000401520 <_ZNK7rigtorp7SeqlockIiLm64EE4loadEv>:
  // do {
  //   seq0 = seq_.load(std::memory_order_acquire);  
  401520:	48 8b 0f             	mov    (%rdi),%rcx
  //   copy = value_;
  401523:	8b 47 08             	mov    0x8(%rdi),%eax
  //   seq1 = seq_.load(std::memory_order_acquire);
  401526:	48 8b 17             	mov    (%rdi),%rdx
  // } while (seq0 != seq1 || seq0 & 1);
  401529:	f6 c1 01             	test   $0x1,%cl
  40152c:	75 f2                	jne    401520 <_ZNK7rigtorp7SeqlockIiLm64EE4loadEv>
  40152e:	48 39 d1             	cmp    %rdx,%rcx
  401531:	75 ed                	jne    401520 <_ZNK7rigtorp7SeqlockIiLm64EE4loadEv>
  // return copy;
  401533:	c3                   	retq   
  401534:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
  40153b:	00 00 00 
  40153e:	66 90                	xchg   %ax,%ax

There are two ways to fix this:

  • Make the location of the protected data dependent on the sequence number by storing multiple instances of the data and selecting which one to read from based on the sequence number. This solution should be portable to all CPU architectures, but requires extra space.
  • For x86 it's enough to insert a compiler barrier using std::atomic_signal_fence(std::memory_order_acq_rel). This will only work on the x86 memory model. On ARM memory model you need to inserts a dmb memory barrier instruction, which is not possible in C++11.

Since my target architecture is x86 I've implemented the second option:

T load() const noexcept {
  T copy;
  size_t seq0, seq1;
  do {
    seq0 = seq_.load(std::memory_order_acquire);
    std::atomic_signal_fence(std::memory_order_acq_rel);
    copy = value_;
    std::atomic_signal_fence(std::memory_order_acq_rel);
    seq1 = seq_.load(std::memory_order_acquire);
  } while (seq0 != seq1 || seq0 & 1);
  return copy;
}

Compiled with g++-5.2 -std=c++11 -O3 it produces the following correct assembly:

00000000004014e0 <_ZNK7rigtorp7SeqlockIiE4loadEv>:
  4014e0:	48 8d 4f 08          	lea    0x8(%rdi),%rcx
  4014e4:	0f 1f 40 00          	nopl   0x0(%rax)
  // do {
  //   seq0 = seq_.load(std::memory_order_acquire);
  //   std::atomic_signal_fence(std::memory_order_acq_rel);
  4014e8:	48 8b 31             	mov    (%rcx),%rsi
  //   copy = value_;
  4014eb:	8b 07                	mov    (%rdi),%eax
  //   std::atomic_signal_fence(std::memory_order_acq_rel);
  //   seq1 = seq_.load(std::memory_order_acquire);
  4014ed:	48 8b 11             	mov    (%rcx),%rdx
  // } while (seq0 != seq1 || seq0 & 1);
  4014f0:	48 39 f2             	cmp    %rsi,%rdx
  4014f3:	75 f3                	jne    4014e8 <_ZNK7rigtorp7SeqlockIiE4loadEv+0x8>
  4014f5:	83 e2 01             	and    $0x1,%edx
  4014f8:	75 ee                	jne    4014e8 <_ZNK7rigtorp7SeqlockIiE4loadEv+0x8>
  // return copy;
  4014fa:	f3 c3                	repz retq 
  4014fc:	0f 1f 40 00          	nopl   0x0(%rax)

The store operation is implemented in a similar manner to the load operation. Additionally the data and sequence counter is aligned and padded to prevent false sharing with adjacent data.

References:

Testing

Testing lock-free algorithms is hard. I'm using two approaches to test the implementation:

  • A test that load() and store() publish results in the correct order on a single thread.
  • A multithreaded fuzz test that load() never see a partial store() (torn read).

Potential improvements

Allow partial updates and reads using a visitor pattern.

Support for multiple writers can be achieved by using a CAS loop to acquire the odd sequence number in the store operation. This would have the same effect as wrapping the seqlock in a spinlock.

Trade-off space for reduced readers-writer contention by striping writes across multiple seqlocks.

About

This project was created by Erik Rigtorp <[email protected]>.

More Repositories

1

awesome-modern-cpp

A collection of resources on modern C++
HTML
11,654
star
2

awesome-lockfree

A collection of resources on wait-free and lock-free programming
1,752
star
3

MPMCQueue

A bounded multi-producer multi-consumer concurrent queue written in C++11
C++
1,160
star
4

SPSCQueue

A bounded single-producer single-consumer wait-free and lock-free queue written in C++11
C++
871
star
5

ipc-bench

Latency benchmarks of Unix IPC mechanisms
C
547
star
6

spartan

A collection of High-Frequency trading components
C++
270
star
7

udpreplay

Replay UDP packets from a pcap file
C++
256
star
8

HashMap

An open addressing linear probing hash table, tuned for delete heavy workloads
C++
197
star
9

nanomq

Ultra low latency messaging kernel
C++
152
star
10

c2clat

A tool to measure CPU core to core latency
C++
118
star
11

hiccups

Measures the system induced jitter ("hiccups") a CPU bound thread experiences
C++
104
star
12

efvicap

erfvicap is a packet capture tool for network adapters from Solarflare
C++
59
star
13

statkit

Statistics toolkit for JavaScript
JavaScript
51
star
14

TokenBucket

Lock-free implementation of the token bucket algorithm in C++
C++
50
star
15

Function

Heap allocation free version of C++11 std::function
C++
50
star
16

isatomic

Test if AVX vector loads and stores are atomic
C++
21
star
17

CharConv

Fast integer to string and string to integer conversion functions
C++
18
star
18

BinarySemaphore

Binary semaphore using futexes.
C++
9
star
19

RingMap

Hybrid data structure that acts like a ring buffer and map
C++
7
star
20

dotemacs

My emacs config
Emacs Lisp
6
star
21

go-pikchr

Pikchr wrapped for Go using WebAssembly. No cgo required.
Go
4
star
22

unats

A simple single threaded NATS client for C++.
C++
4
star
23

nordpool

Data preprocessing / munging scripts for Nordpool power market data
HTML
3
star
24

openonload

git import of https://www.openonload.org/
Shell
3
star
25

sfpl

Simple functional programming language
Python
2
star
26

go-graphviz

Graphviz wrapped for Go using WebAssembly. No cgo required.
CMake
2
star
27

spark-dht11

Spark app to read a DHT11 sensor and publish humidity and temperature as Spark variables
C++
2
star
28

gmbackup

Simple tool to backup a Gmail account
Go
2
star
29

t-amp

Tripath Class-T audio amplifier
1
star
30

anything-git

Anything sources for Git
Emacs Lisp
1
star
31

chipamp

A hand crafted power amplifier of the Gainclone type
1
star
32

sysjitter2

1
star
33

gccsense

Fork of http://cx4a.org/repo/gccsense.git
Emacs Lisp
1
star