  • Stars: 384
  • Rank: 111,726 (Top 3 %)
  • Language: C#
  • License: Other
  • Created over 5 years ago
  • Updated over 4 years ago

Repository Details

Low allocation async/await for C#/.NET

Documentation

What is this?

You know how async methods that await something incomplete end up creating a few objects, right? There's the boxed state machine, an Action that moves it forward, a Task[<T>], etc - right?

Well... what about if there just wasn't?

And what if all you had to do was change your async ValueTask<int> method to async PooledValueTask<int>?

And I hear you; you're saying "but I can't change the public API!". But what if a PooledValueTask<int> really was a ValueTask<int>? So you can just cheat:

public ValueTask<int> DoTheThing() // the outer method is not async
{
	return ReallyDoTheThing(this);
	static async PooledValueTask<int> ReallyDoTheThing(SomeType obj)
	{
		... await ...
		// (use obj.* instead of this.*)
		... return ...
	}
}

(using a static local function here avoids the <>c__DisplayClass capture-context object that the compiler can generate when a local function captures state)

And maybe, just maybe, in the future (if the language feature this depends on happens) it could be just:

[SomeKindOfAttribute] // <=== this is the only change
public async ValueTask<int> DoTheThing()
{
	// no changes here at all
}

(although note that in some cases it can work better with the static trick, as above)

Would that be awesome? Because that's what this is!

How does that work?

The PooledValueTask[<T>] etc exist mostly to define a custom builder. The builder in this library uses aggressive pooling of classes that replace the boxed approach used by default; we recycle them when the state machine completes.
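To make the "custom builder" idea concrete, here is a minimal sketch of how a task-like return type gets wired to its own builder. Everything named WrappedValueTask or WrappedValueTaskBuilder is hypothetical and is not this library's code. The builder below does no pooling at all: it simply forwards to the built-in AsyncValueTaskMethodBuilder<T>, so it only shows the plumbing the compiler expects and the implicit conversion that lets the result be handed out as a plain ValueTask<T>.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical task-like type: the attribute tells the compiler which builder
// to use for "async WrappedValueTask<T>" methods.
[AsyncMethodBuilder(typeof(WrappedValueTaskBuilder<>))]
public readonly struct WrappedValueTask<T>
{
    private readonly ValueTask<T> _inner;
    public WrappedValueTask(ValueTask<T> inner) => _inner = inner;

    public ValueTaskAwaiter<T> GetAwaiter() => _inner.GetAwaiter();

    // Lets a non-async outer method return this as a plain ValueTask<T>.
    public static implicit operator ValueTask<T>(WrappedValueTask<T> value) => value._inner;
}

// Hypothetical builder: forwards every call to the built-in builder.
// A pooling builder would replace the forwarding with its own machinery.
public struct WrappedValueTaskBuilder<T>
{
    private AsyncValueTaskMethodBuilder<T> _builder; // mutable struct, so not readonly

    public static WrappedValueTaskBuilder<T> Create() =>
        new WrappedValueTaskBuilder<T> { _builder = AsyncValueTaskMethodBuilder<T>.Create() };

    public WrappedValueTask<T> Task => new WrappedValueTask<T>(_builder.Task);

    public void Start<TStateMachine>(ref TStateMachine stateMachine)
        where TStateMachine : IAsyncStateMachine
        => _builder.Start(ref stateMachine);

    public void SetStateMachine(IAsyncStateMachine stateMachine) => _builder.SetStateMachine(stateMachine);
    public void SetResult(T result) => _builder.SetResult(result);
    public void SetException(Exception exception) => _builder.SetException(exception);

    public void AwaitOnCompleted<TAwaiter, TStateMachine>(ref TAwaiter awaiter, ref TStateMachine stateMachine)
        where TAwaiter : INotifyCompletion
        where TStateMachine : IAsyncStateMachine
        => _builder.AwaitOnCompleted(ref awaiter, ref stateMachine);

    public void AwaitUnsafeOnCompleted<TAwaiter, TStateMachine>(ref TAwaiter awaiter, ref TStateMachine stateMachine)
        where TAwaiter : ICriticalNotifyCompletion
        where TStateMachine : IAsyncStateMachine
        => _builder.AwaitUnsafeOnCompleted(ref awaiter, ref stateMachine);
}

public static class BuilderDemo
{
    // Compiled against WrappedValueTaskBuilder<int> because of the return type.
    public static async WrappedValueTask<int> ComputeAsync()
    {
        await Task.Yield();
        return 42;
    }

    // The public surface stays ValueTask<int>, via the implicit conversion.
    public static ValueTask<int> Compute() => ComputeAsync();
}
```

A caller just writes `int value = await BuilderDemo.Compute();` and never sees the wrapper type.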

It also makes use of the IValueTaskSource[<T>] API to allow incomplete operations to be represented without a Task[<T>], but with a custom backer. And we pool that too, recycling it when the task is awaited. The only downside: you can't await the same result twice now, because once you've awaited it the first time, it has gone. A cycling token is used to make sure you can't accidentally read the incorrect values after the result has been awaited.
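The await-once rule and the cycling token can be seen with nothing but the BCL. The sketch below is an assumption-laden illustration, not this library's implementation: ReusableSource<T> and TokenDemo are hypothetical names, and the token handling is delegated to the built-in ManualResetValueTaskSourceCore<T>. Reading the result resets the source (which is what would allow it to go back into a pool), and that reset changes the token, so a second read through the old ValueTask<T> fails instead of seeing someone else's value.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

// Hypothetical reusable backer for ValueTask<T>; illustration only.
public sealed class ReusableSource<T> : IValueTaskSource<T>
{
    // BCL helper struct that implements the token/version bookkeeping.
    private ManualResetValueTaskSourceCore<T> _core;

    // The ValueTask<T> captures the token that is current right now.
    public ValueTask<T> AsValueTask() => new ValueTask<T>(this, _core.Version);

    public void SetResult(T value) => _core.SetResult(value);

    public ValueTaskSourceStatus GetStatus(short token) => _core.GetStatus(token);

    public void OnCompleted(Action<object?> continuation, object? state, short token,
        ValueTaskSourceOnCompletedFlags flags)
        => _core.OnCompleted(continuation, state, token, flags);

    public T GetResult(short token)
    {
        // Throws InvalidOperationException if the token is stale.
        T result = _core.GetResult(token);
        // Recycle after a successful read; this advances the token.
        // (A fuller version would also recycle after a faulted result.)
        _core.Reset();
        return result;
    }
}

public static class TokenDemo
{
    // Returns true when the first read yields the value and the second read is rejected.
    public static bool FirstReadWorksSecondThrows()
    {
        var source = new ReusableSource<int>();
        ValueTask<int> pending = source.AsValueTask();
        source.SetResult(42);

        int first = pending.GetAwaiter().GetResult(); // consumes the result, source resets

        try
        {
            _ = pending.GetAwaiter().GetResult(); // same ValueTask, now a stale token
            return false;
        }
        catch (InvalidOperationException)
        {
            return first == 42;
        }
    }
}
```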

We can even do this for Task[<T>], except here we can only avoid the boxed state machine; hence PooledTask[<T>] exists too. No custom backing in this case, though, since a Task[<T>] will need to be allocated (except for Task.CompletedTask, which we special-case).

Test results

Based on an operation that uses Task.Yield() to ensure that the operations are incomplete; ".NET" means the built-in, out-of-the-box implementation; "Pooled" means the implementation from this library.

In particular, notice:

  • zero allocations for PooledValueTask[<T>] vs ValueTask[<T>] (on .NET Core; significantly reduced on .NET Framework)
  • reduced allocations for PooledTask[<T>] vs Task[<T>]
  • no performance degradation; just lower allocations

| Method |  Job | Runtime |   Categories |     Mean |     Error |    StdDev |  Gen 0 |  Gen 1 |  Gen 2 | Allocated |
|------- |----- |-------- |------------- |---------:|----------:|----------:|-------:|-------:|-------:|----------:|
|   .NET |  Clr |     Clr |      Task<T> | 2.159 us | 0.0427 us | 0.0474 us | 0.0508 | 0.0039 |      - |     344 B |
| Pooled |  Clr |     Clr |      Task<T> | 2.037 us | 0.0246 us | 0.0230 us | 0.0273 | 0.0039 |      - |     182 B |
|   .NET | Core |    Core |      Task<T> | 1.397 us | 0.0024 us | 0.0022 us | 0.0176 |      - |      - |     120 B |
| Pooled | Core |    Core |      Task<T> | 1.349 us | 0.0058 us | 0.0054 us | 0.0098 |      - |      - |      72 B |
|        |      |         |              |          |           |           |        |        |        |           |
|   .NET |  Clr |     Clr |         Task | 2.065 us | 0.0200 us | 0.0167 us | 0.0508 | 0.0039 |      - |     336 B |
| Pooled |  Clr |     Clr |         Task | 1.979 us | 0.0179 us | 0.0167 us | 0.0273 | 0.0039 |      - |     182 B |
|   .NET | Core |    Core |         Task | 1.390 us | 0.0159 us | 0.0149 us | 0.0176 |      - |      - |     112 B |
| Pooled | Core |    Core |         Task | 1.361 us | 0.0055 us | 0.0051 us | 0.0098 |      - |      - |      72 B |
|        |      |         |              |          |           |           |        |        |        |           |
|   .NET |  Clr |     Clr | ValueTask<T> | 2.087 us | 0.0403 us | 0.0431 us | 0.0547 | 0.0078 | 0.0039 |     352 B |
| Pooled |  Clr |     Clr | ValueTask<T> | 1.924 us | 0.0248 us | 0.0220 us | 0.0137 | 0.0020 |      - |     100 B |
|   .NET | Core |    Core | ValueTask<T> | 1.405 us | 0.0078 us | 0.0073 us | 0.0195 |      - |      - |     128 B |
| Pooled | Core |    Core | ValueTask<T> | 1.374 us | 0.0116 us | 0.0109 us |      - |      - |      - |         - |
|        |      |         |              |          |           |           |        |        |        |           |
|   .NET |  Clr |     Clr |    ValueTask | 2.056 us | 0.0206 us | 0.0183 us | 0.0508 | 0.0039 |      - |     344 B |
| Pooled |  Clr |     Clr |    ValueTask | 1.948 us | 0.0388 us | 0.0416 us | 0.0137 | 0.0020 |      - |     100 B |
|   .NET | Core |    Core |    ValueTask | 1.408 us | 0.0140 us | 0.0117 us | 0.0176 |      - |      - |     120 B |
| Pooled | Core |    Core |    ValueTask | 1.366 us | 0.0039 us | 0.0034 us |      - |      - |      - |         - |

Note that most of the remaining allocations are actually the work-queue internals of Task.Yield() (i.e. how ThreadPool.QueueUserWorkItem works) - we've removed virtually all of the unnecessary overheads that came from the async machinery. Most real-world scenarios aren't using Task.Yield() - they are waiting on external data, etc - so they won't see these. Plus they are effectively zero on .NET Core 3.

The tests do the exact same thing; the only thing that changes is the return type, i.e. whether it is async Task<int>, async ValueTask<int>, async PooledTask<int> or async PooledValueTask<int>. All of them have the same threading/execution-context/sync-context semantics; there's no cheating going on.
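For a feel of what "the exact same thing" looks like, here is a rough sketch of the shape of such an operation using only the built-in return types. The class and method names are hypothetical and the actual benchmark source may differ; per the description above, the pooled variants would have identical bodies with only the return type swapped.

```csharp
using System.Threading.Tasks;

// Hypothetical names; illustrates the shape of the measured operation only.
public static class YieldOperations
{
    // Task.Yield() never reports itself as complete, so the await always
    // takes the asynchronous path and exercises the async machinery.
    public static async Task<int> ViaTask()
    {
        await Task.Yield();
        return 42;
    }

    // Same body; only the return type differs.
    public static async ValueTask<int> ViaValueTask()
    {
        await Task.Yield();
        return 42;
    }
}
```

Calling code is the same either way, for example `int value = await YieldOperations.ViaValueTask();`.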
