  • Stars: 808
  • Rank: 56,429 (Top 2%)
  • Language: C++
  • License: MIT License
  • Created: over 3 years ago
  • Updated: 11 months ago


asyncio

Asyncio is a C++20 coroutine library for writing concurrent code with the await syntax, modeled after Python's asyncio library.

Build & Run

$ git clone --recursive https://github.com/netcan/asyncio.git
$ cd asyncio
$ mkdir build
$ cd build
$ cmake ..
$ make -j

Hello world

Task<> hello_world() {
    fmt::print("hello\n");
    co_await asyncio::sleep(1s);
    fmt::print("world\n");
}

int main() {
    asyncio::run(hello_world());
}

output:

hello
world

Dump callstack

Task<int> factorial(int n) {
    if (n <= 1) {
        co_await dump_callstack();
        co_return 1;
    }
    co_return (co_await factorial(n - 1)) * n;
}

int main() {
    fmt::print("run result: {}\n", asyncio::run(factorial(10)));
    return 0;
}

output:

[0] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:17
[1] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[2] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[3] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[4] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[5] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[6] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[7] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[8] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20
[9] void factorial(factorial(int)::_Z9factoriali.Frame*) at asyncio/test/st/hello_world.cpp:20

run result: 3628800

TCP Echo

Client

Task<> tcp_echo_client(std::string_view message) {
    auto stream = co_await asyncio::open_connection("127.0.0.1", 8888);

    fmt::print("Send: '{}'\n", message);
    co_await stream.write(Stream::Buffer(message.begin(), message.end()));

    auto data = co_await stream.read(100);
    fmt::print("Received: '{}'\n", data.data());

    fmt::print("Close the connection\n");
    stream.close(); // not required here; just mirroring the Python example
}

int main(int argc, char** argv) {
    asyncio::run(tcp_echo_client("hello world!"));
    return 0;
}

output:

Send: 'hello world!'
Received: 'hello world!'
Close the connection

Server

Task<> handle_echo(Stream stream) {
    auto& sockinfo = stream.get_sock_info();
    auto sa = reinterpret_cast<const sockaddr*>(&sockinfo);
    char addr[INET6_ADDRSTRLEN] {};

    auto data = co_await stream.read(100);
    fmt::print("Received: '{}' from '{}:{}'\n", data.data(),
               inet_ntop(sockinfo.ss_family, get_in_addr(sa), addr, sizeof addr),
               get_in_port(sa));

    fmt::print("Send: '{}'\n", data.data());
    co_await stream.write(data);

    fmt::print("Close the connection\n");
    stream.close(); // optional, close connection early
}

Task<> echo_server() {
    auto server = co_await asyncio::start_server(
            handle_echo, "127.0.0.1", 8888);

    fmt::print("Serving on 127.0.0.1:8888\n");

    co_await server.serve_forever();
}

int main() {
    asyncio::run(echo_server());
    return 0;
}

output:

Serving on 127.0.0.1:8888
Received: 'Hello World!' from '127.0.0.1:49588'
Send: 'Hello World!'
Close the connection

Benchmark

Using the Apache Benchmarking tool (ab): 10,000,000 requests of 106 bytes each, with 1000 concurrent connections and keep-alive enabled, the QPS/RPS results are below:

framework                    RPS [#/sec] (mean)   Language              Pattern
python asyncio               47393.59             Python                coroutine
python asyncio with uvloop   100426.97            Python                coroutine
this project                 164457.63            C++20                 coroutine
asio                         159322.66            C++20                 coroutine
tokio-rs                     156852.70            Rust 1.59.0-nightly   coroutine
epoll                        153147.79            C                     eventloop
libevent                     136996.46            C                     callback
libuv                        159937.73            C                     callback

The result may seem incredible, but it is plausible: the magnitude of IO is milliseconds (1e-3 s), while the magnitude of a coroutine switch is nanoseconds (1e-9 s).

See benchmark.md for more detail.

Gather

auto factorial(std::string_view name, int number) -> Task<int> {
    int r = 1;
    for (int i = 2; i <= number; ++i) {
        fmt::print("Task {}: Compute factorial({}), currently i={}...\n", name, number, i);
        co_await asyncio::sleep(500ms);
        r *= i;
    }
    fmt::print("Task {}: factorial({}) = {}\n", name, number, r);
    co_return r;
};

auto test_void_func() -> Task<> {
    fmt::print("this is a void value\n");
    co_return;
};

int main() {
    asyncio::run([&]() -> Task<> {
        auto&& [a, b, c, _void] = co_await asyncio::gather(
            factorial("A", 2),
            factorial("B", 3),
            factorial("C", 4),
            test_void_func());
        assert(a == 2);
        assert(b == 6);
        assert(c == 24);
    }());
}

output:

Task A: Compute factorial(2), currently i=2...
Task B: Compute factorial(3), currently i=2...
Task C: Compute factorial(4), currently i=2...
this is a void value
Task C: Compute factorial(4), currently i=3...
Task A: factorial(2) = 2
Task B: Compute factorial(3), currently i=3...
Task B: factorial(3) = 6
Task C: Compute factorial(4), currently i=4...
Task C: factorial(4) = 24

WaitFor

asyncio::run([&]() -> Task<> {
    REQUIRE_NOTHROW(co_await wait_for(gather(sleep(10ms), sleep(20ms), sleep(30ms)), 50ms));
    REQUIRE_THROWS_AS(co_await wait_for(gather(sleep(10ms), sleep(80ms), sleep(30ms)), 50ms),
                        TimeoutError);
}());

ScheduledTask & Cancel

auto say_after = [&](auto delay, std::string_view what) -> Task<> {
    co_await asyncio::sleep(delay);
    fmt::print("{}\n", what);
};

GIVEN("schedule sleep and cancel") {
    auto async_main = [&]() -> Task<> {
        auto task1 = schedule_task(say_after(100ms, "hello"));
        auto task2 = schedule_task(say_after(200ms, "world"));

        co_await task1;
        task2.cancel();
    };
    auto before_wait = get_event_loop().time();
    asyncio::run(async_main());
    auto after_wait = get_event_loop().time();
    auto diff = after_wait - before_wait;
    REQUIRE(diff >= 100ms);
    REQUIRE(diff < 200ms);
}

output:

hello

Tested Compiler

  • Debian Linux, gcc-11/12 (gcc-11 crashes in Release mode)

TODO

  • implement a result type for code reuse: variant<monostate, value, exception>
  • implement coroutine backtrace (dump the continuation chain)
  • implement some IO coroutines (socket/read/write/close)
  • use libuv as the backend

FAQ


How to handle the cancelled coroutine?

Q: Technically, you can add a handle that doesn't exist in the event_loop queue. Would the cancelled event become a dangling pointer in such a scenario?

void cancel_handle(Handle& handle) {
    cancelled_.insert(&handle);
}

A: It may leak memory in some scenarios, but it is safe: the cancelled set stores handles that have been destroyed, and when such a handle becomes ready, the event loop notices it, skips it, and removes it from the cancelled set to prevent leaks.

A: You are right. I found a bug in Release mode: when a handle is destroyed and inserted into the cancelled set, and then another coroutine is created with the same address as the destroyed coroutine handle, the loop removes the newly created ready coroutine. Fixed by this patch: https://github.com/netcan/asyncio/commit/23e6a38f5d00b55037f9560845c4e44948e41709

The coroutine performance and comparisons with other methods

Q: First off, great work! Do you have any suggestions for understanding when to use coroutines and when to not use them? They're too new to see what kind of performance they bring to the table, and I don't see much in terms of comparisons with other methods yet.

A: Good question. In my view, a coroutine is just syntactic sugar for a callback; in other words, any scenario that requires a callback interface can instead use a coroutine. A typical asynchronous program involves many callbacks, so coroutine code is much more readable than callback-style code.

For performance, a coroutine is just a resumable function: it supports suspend and resume. I measured coroutine call/suspend/resume, and it costs only a few tens of nanoseconds; compared to callback-style programming it can even have negative overhead. For more detail, see this talk by the author of C++ coroutines: https://www.youtube.com/watch?v=_fu0gx-xseY.

Q: Thanks for that insight. I am an embedded developer, so most of my callbacks are hardware/interrupt driven. But I suppose I could create pseudo-interrupts with this if I wanted to run simulations on non-realtime hardware? I think I might just need to give them a shot to better understand.

A: You can think of a pseudo-interrupt handler as simply calling resume() on a coroutine handle after preparing the interrupt's data; control flow is then given to the point that awaits the relevant interrupt data and continues processing. For another scenario, I found a good post to help with understanding: consider an embedded device that monitors data values, such as temperature, and writes these values to a serial port (RS232) complete with a timestamp, which could be time from device boot or a network-synchronised clock time: https://blog.feabhas.com/2021/09/c20-coroutines/

Q: The experience so far suggests that Gor was a bit too optimistic with those statements: in practice it really seems to depend on whether or not the compiler manages to optimize away the coroutine.

A: As far as I have explored, current compilers don't do HALO (Heap Allocation eLision Optimization, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html), so compared to a normal function call a coroutine costs a heap allocation, though the compiler may coalesce the frame into the caller's stack if it can prove the coroutine's lifetime nests within the caller's. On the other hand, callback style requires the programmer to manually manage object lifetimes between the caller and the callback, usually via shared_ptr (another memory allocation); with coroutines those objects live in the coroutine frame with no manual allocation, so the C++ RAII idiom works well.

Why are primitives like async_mutex/async_condition_variable needed even in single-threaded mode?

Q: I'm curious, could you share what these primitives (async_mutex, sync_wait) would do? (I understand the point of when_all.)

A: To make the whole application asynchronous, we shouldn't block any thread from our thread pool. Traditional synchronization primitives that use the OS scheduler, like std::mutex and std::condition_variable, are useless in such scenarios; we need these primitives to cooperate with the application's internal scheduler.

Q: Ah, ok. Then nevermind. I was curious for my coroutine library, but mine is pure single-thread.

A: You still need them even if you have just one thread (you just don't need to be concerned about concurrency problems). For example, how can you wait for a condition and notify it from somewhere else in your library? How can you join multiple pieces of asynchronous work?

A: These primitives are needed. Consider a game scenario: the server must wait to collect all of the clients' commands, then continue with the game logic; this needs a condition_variable.

Why is the epoll version slower?

Q: Why is the epoll version slower? Shouldn't that have less overhead?

A: Maybe the C version isn't well optimized; also, the test results fluctuate by around ±5000 RPS.

Is io_uring better than epoll?

Q: > The result may be incredible, but it is possible, the magnitude of IO is milliseconds

Depends, really. With io_uring, or user-space network stacks, you get IO in the microseconds/nanoseconds range. The best ping-pong I have seen so far with software, measured first byte in the server, first byte out, was 1.2 microseconds. This includes: the network card receiving the bytes, the PCI bus transferring them to the CPU, the CPU reading the query and writing the response, the PCI bus transferring them to the network card, the network card sending the bytes.

The problem with I/O today is mostly syscall overhead, if you get syscalls out of the way -- io_uring being the most affordable alternative -- you can get an order of magnitude speed-up... or identify the next bottleneck in your code ;)

A: If I remember correctly, a system call is about 100 ns (benchmarking an empty epoll_wait), but io_uring may well be better than epoll; I have seen others compare the two.

Why is python asyncio so performant?

Q: Why is python asyncio so performant?

A: An async program is IO bound.

A: A lot of very expensive people have optimised the snot out of python asyncio over years. It'll always be slower in the fixed overhead sense than C++, but in terms of scalability and corner case handling it ought to be close to optimal.

How to print the coroutine callstack?

Q: In one example you print the call stack. Am I correct in understanding that this is the "async call stack" as opposed to the traditional call stack? And if so, how did you capture this info?

I'm curious because this is something I've been thinking of implementing to aid debugging. Thanks.

A: Yes, it's the async call stack. The key is to make use of await_transform() on the coroutine promise_type to save a coroutine source_location; in other words, whenever the user writes co_await, the await location is saved:

decltype(auto) await_transform(A&& awaiter, // saves source_location info
        std::source_location loc = std::source_location::current());

Dumping the backtrace is then simple: just recursively dump each coroutine's source_location and its continuation.

Will the buffer size of the benchmark code impact on performance?

Q: I'm just impressed by how readable the benchmark code looks compared to most other versions. And it seems like performance actually doesn't suffer that much from it. I wish the networking in the stdlib could look somewhat like this in practice. But it probably won't be generic enough for the committee...

EDIT: It looks like you are using different buffer sizes, is there a reason behind that?

A: The Python version and this project's code were written by me; the others were gathered from the Internet.

Q: It would probably make sense to benchmark equal buffer sizes, since that might have an impact on the requests per second?

A: I tested equal buffer sizes, with no impact on RPS.



Thanks to JetBrains for donating product licenses to help develop asyncio
