QuickReader

An ultra-high performance stream reader for browser and Node.js, easy-to-use, zero dependency.

Install

npm i quickreader

Demo

import {QuickReader, A} from 'quickreader'

const res = await fetch('https://unpkg.com/quickreader-demo/demo.bin')
const stream = res.body   // ReadableStream
const reader = new QuickReader(stream)

do {
  const id   = reader.u32() ?? await A
  const name = reader.txt() ?? await A
  const age  = reader.u8()  ?? await A

  console.log(id, name, age)

} while (!reader.eof)

https://jsbin.com/loyuxad/edit?html,console

With a stream reader, you can read data as typed values while it is still downloading, which improves the user experience. You don't have to slice and buffer chunks yourself; the reader does all of that.

Without it, you would have to wait for all the data to be downloaded before you could read it (e.g., via DataView). And since JS has no struct types, you have to pass an offset to every read call, which is inconvenient.
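For comparison, here is a minimal sketch of the manual approach, using the record layout implied by the demo code (a u32 id, a NUL-terminated name, a u8 age). The little-endian byte order and the names DemoRecord and parseAll are assumptions made for illustration; they are not part of QuickReader.

```typescript
interface DemoRecord {
  id: number
  name: string
  age: number
}

// Parses a fully downloaded buffer. Every read needs an explicit offset,
// and the offset has to be advanced by hand after each field.
function parseAll(bytes: Uint8Array): DemoRecord[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  const decoder = new TextDecoder()
  const records: DemoRecord[] = []
  let offset = 0

  while (offset < bytes.length) {
    // Assumption: integers are little-endian.
    const id = view.getUint32(offset, true)
    offset += 4

    // The name runs up to (not including) the next 0 byte.
    const end = bytes.indexOf(0, offset)
    if (end === -1) {
      throw new Error('unterminated name')
    }
    const name = decoder.decode(bytes.subarray(offset, end))
    offset = end + 1

    const age = view.getUint8(offset)
    offset += 1

    records.push({ id, name, age })
  }
  return records
}
```

Nothing can be parsed until the whole buffer has arrived, which is the limitation a stream reader removes.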

Why Quick

We used two tricks to improve performance:

  • selective await

  • synchronized EOF

selective await

The overhead of await is considerable. Here is a test:

let s1 = 0, s2 = 0

console.time('no-await')
for (let i = 0; i < 1e7; i++) {
  s1 += i
}
console.timeEnd('no-await')   // ~15ms

console.time('await')
for (let i = 0; i < 1e7; i++) {
  s2 += await i
}
console.timeEnd('await')      // Chrome: ~800ms, Safari: ~3000ms

https://jsbin.com/gehazin/edit?html,output

The two loops above do the same work, but the await version is 50x to 200x slower than the no-await version. On Chrome it's even ~2000x slower if the console is open (only ~500 await/ms).

This test may seem meaningless, but in practice we sometimes call await heavily in logic that is almost always synchronous, such as an async query function that usually hits an in-memory cache and returns immediately.

async function query(key) {
  if (cacheMap.has(key)) {
    return ...  // 99.9%
  }
  await ...
}

Reading data from a stream has the same issue. A single integer or a tiny piece of text takes only a few bytes, so in most cases it can be read directly from the buffer without any I/O call, and await is unnecessary. await is only needed when the buffer does not hold enough data.

If await is called only when needed, the overhead can be reduced many times over.

console.time('selective-await')
for (let i = 0; i < 1e7; i++) {
  const value = (i % 1000)  // buffer enough?
    ? i         // 99.9%
    : await i   //  0.1%
}
console.timeEnd('selective-await')  // ~40ms 🚀

For QuickReader, when the buffer holds enough data, it returns the result immediately; otherwise, it returns nothing (undefined), and the result can be obtained with await A.

function readBytes(len) {
  if (len < availableLen) {
    return buf.subarray(offset, offset + len)   // likely
  }
  A = readBytesAsync(len)
}

async function readBytesAsync(len) {
  await stream.read()
  ...
}

The calling logic can be simplified into one line using the nullish coalescing operator:

result = readBytes(10) ?? await A

This is both high-performance and easy-to-use.

Note: In the real code, A is not a global variable. It is an imported object that implements the thenable interface, and you can rename it on import.
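To make the pattern concrete, here is a minimal, self-contained sketch of a thenable A paired with a one-method reader. It is not QuickReader's actual implementation: TinyReader, pending, demoSource and readThree are hypothetical names, and the sketch leaves out the reserved-byte rule and EOF tracking.

```typescript
// The promise created by the most recent slow-path call.
let pending: Promise<any> = Promise.resolve(undefined)

// A thenable: `await A` calls A.then(), which forwards to `pending`.
const A = {
  then(resolve: (value: any) => void, reject: (reason: unknown) => void): void {
    pending.then(resolve, reject)
  },
}

class TinyReader {
  private buf: Uint8Array = new Uint8Array(0)
  private pos = 0
  private source: AsyncIterator<Uint8Array>

  constructor(source: AsyncIterator<Uint8Array>) {
    this.source = source
  }

  // Fast path: return the byte synchronously when the buffer has one.
  // Slow path: start an async refill, publish it via `pending`,
  // and return undefined so the caller falls through to `await A`.
  u8(): number | undefined {
    if (this.pos < this.buf.length) {
      return this.buf[this.pos++]
    }
    pending = this.u8Async()
    return undefined
  }

  private async u8Async(): Promise<number> {
    while (this.pos >= this.buf.length) {
      const { value, done } = await this.source.next()
      if (done) {
        throw new Error('unexpected end of stream')
      }
      this.buf = value
      this.pos = 0
    }
    return this.buf[this.pos++]
  }
}

// Hypothetical in-memory source standing in for a real stream.
async function* demoSource(): AsyncGenerator<Uint8Array> {
  yield new Uint8Array([1, 2])
  yield new Uint8Array([3])
}

async function readThree(): Promise<number[]> {
  const reader = new TinyReader(demoSource())
  const a = reader.u8() ?? await A   // buffer empty: slow path
  const b = reader.u8() ?? await A   // served from the buffer, no await
  const c = reader.u8() ?? await A   // buffer drained: slow path
  return [a, b, c]
}
```

Because the fast path returns a plain value, a result of 0 is still returned as-is: ?? only falls through on undefined or null.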

synchronized EOF

QuickReader always keeps at least 1 byte in its buffer. This means that when 4 bytes are available and you call reader.u32(), the result is not returned immediately but requires await. The buffer is not fully consumed until the stream is closed.

In this way, the EOF state can be detected synchronously, with nearly zero overhead.
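The reserved-byte rule can be sketched without any async code. The class below is a hypothetical illustration (ReserveBuffer, push, close and take are made-up names, not QuickReader internals): while the stream is open, a read succeeds only if at least one byte would be left over, so an empty buffer can only mean the stream has ended.

```typescript
class ReserveBuffer {
  private buf: Uint8Array = new Uint8Array(0)
  private pos = 0
  private closed = false

  // Append a chunk that arrived from the stream.
  push(chunk: Uint8Array): void {
    const rest = this.buf.subarray(this.pos)
    const merged = new Uint8Array(rest.length + chunk.length)
    merged.set(rest)
    merged.set(chunk, rest.length)
    this.buf = merged
    this.pos = 0
  }

  // Mark the stream as ended; the reserved byte may now be consumed.
  close(): void {
    this.closed = true
  }

  get available(): number {
    return this.buf.length - this.pos
  }

  // Returns undefined when the caller would have to wait for more data.
  // (A real reader would report an error if the stream ended too early.)
  take(len: number): Uint8Array | undefined {
    const enough = this.closed ? len <= this.available : len < this.available
    if (!enough) {
      return undefined
    }
    const out = this.buf.subarray(this.pos, this.pos + len)
    this.pos += len
    return out
  }

  // Synchronous check: the buffer can only be empty after the stream closed.
  get eof(): boolean {
    return this.closed && this.available === 0
  }
}
```

With 4 bytes buffered and the stream still open, take(4) returns undefined; once a fifth byte arrives, the same call succeeds and eof stays false.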

Note: If the data must be fully read before the program can continue (e.g. reading a handshake command from a socket stream and replying), do not use QuickReader, otherwise it will wait forever. QuickReader assumes that more data can always be read until the stream ends, e.g. a file stream.

Node.js

Node.js streams are also supported:

import fs from 'node:fs'

const stream = fs.createReadStream('/path/to')
const reader = new QuickReader(stream)
// ...

If the stream provides data as Buffer, buffer-related methods like bytes, bytesTo, etc. return Buffer as well, otherwise they return Uint8Array.

Buffer is a subclass of Uint8Array.

API

Class

  • QuickReader<T extends Uint8Array = Uint8Array>

Constructor

  • new(stream: AsyncIterable<T> | ReadableStream<T>)

By length

  • bytes(len: number) : T | undefined

  • skip(len: number) : number | undefined

  • txtNum(len: number) : string | undefined

By delimiter

  • bytesTo(delim: number) : T | undefined

  • skipTo(delim: number) : number | undefined

  • txtTo(delim: number) : string | undefined

Helper

  • txt() : string | undefined - Equivalent to txtTo(0).

  • txtLn() : string | undefined - Equivalent to txtTo(10).

Number

  • {u, i}{8, 16, 32, 16be, 32be}() : number | undefined

  • {u, i}{64, 64be}() : bigint | undefined

  • f{32, 64, 32be, 64be}() : number | undefined

Chunk

  • chunk(): Promise<T>

  • chunks(len: number) : AsyncGenerator<T>

  • chunksToEnd(len: number) : AsyncGenerator<T>

Property

  • eof: boolean

  • eofAsDelim: boolean (default false)

More

See index.d.ts

Note: The program does not validate the parameter len; it converts it to u32, which turns negative values into large numbers. The caller must therefore ensure that len >= 0. Similarly, the parameter delim is not validated but converted to u8.
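The effect of these conversions can be illustrated with plain JavaScript operators. This sketch assumes the conversions behave like >>> 0 and & 0xff (how the library performs them is not stated here), and checkedLen is a hypothetical guard, not part of the API.

```typescript
// ToUint32 conversion, as performed by the unsigned right shift operator.
const asU32 = (n: number): number => n >>> 0

// Keep only the low 8 bits.
const asU8 = (n: number): number => n & 0xff

console.log(asU32(-1))   // 4294967295
console.log(asU32(10))   // 10
console.log(asU8(256))   // 0
console.log(asU8(10))    // 10

// Hypothetical guard for lengths that come from untrusted input.
function checkedLen(len: number): number {
  if (!Number.isInteger(len) || len < 0 || len > 0xffffffff) {
    throw new RangeError(`invalid length: ${len}`)
  }
  return len
}
```

A call such as reader.bytes(checkedLen(n)) would then fail fast instead of asking for roughly 4 GiB of data.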

EOF State

Since the eof property is updated synchronously, it's meaningless until the first read.

If the stream is expected to be non-empty, you can read before detecting:

const reader = new QuickReader(stream)
do {
  const line = reader.txtLn() ?? await A
} while (!reader.eof)

However, an error will be thrown if the stream is empty.

If an empty stream needs to be handled, you can call the pull method first; after that, eof is meaningful:

const reader = new QuickReader(stream)
await reader.pull()

while (!reader.eof) {
  const line = reader.txtLn() ?? await A
}

In this way, no error will be thrown if the stream is empty.

Buffer Type

The generic type T can be used as a type hint for the buffer:

class QuickReader<T extends Uint8Array = Uint8Array> {

  constructor(stream: AsyncIterable<T> | ReadableStream<T>)

  public bytes(len: number) : T | undefined
  public bytesTo(delim: number) : T | undefined

  public chunk() : Promise<T>
  public chunks(len: number) : AsyncGenerator<T>
  public chunksToEnd(len: number) : AsyncGenerator<T>
}

T is inferred from the type of the stream passed to the constructor. If it cannot be inferred, you need to specify T manually:

{
  const reader = new QuickReader<Buffer>(fs.createReadStream('/path/to'))
  const buffer = reader.bytes(10) ?? await A
  buffer  // Type Hint: Buffer
}
{
  const reader = new QuickReader<Uint8Array>(getStreamSomeHow())
  const buffer = reader.bytes(10) ?? await A
  buffer  // Type Hint: Uint8Array
}

When T is explicit, buffer-related methods can get the expected return type hint.

Chunk Reading

chunk

When processing a large file, after reading the header, sometimes we want to read the remaining data chunk by chunk instead of all at once. In this case, we can use the chunk method:

const header = reader.bytes(10) ?? await A
do {
  const chunk = await reader.chunk()
  // ...
} while (!reader.eof)

Once this method is called, the amount of remaining data becomes unpredictable, so from then on you can only call this method repeatedly.

chunks

Using the chunks method, you can specify the read length:

const ver = reader.u32() ?? await A
const len = reader.u32() ?? await A

for await (const chunk of reader.chunks(len)) {
  // ...
}

const trailer = reader.bytes(8) ?? await A

In this way, after reading some chunks, you can continue to read data with types.

During iteration, calling other methods to read data is not allowed.

chunksToEnd

This method reads all data until only len bytes remain, so that the trailer can be excluded.

const header = reader.bytes(10) ?? await A

for await (const chunk of reader.chunksToEnd(8)) {
  // ...
}

const trailer = reader.bytes(8) ?? await A

After the iteration completes, the stream is closed and the buffer holds len bytes.

If len is 0, it is similar to calling the chunk method repeatedly.
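The hold-back idea behind this method can be sketched as a standalone async generator. This is an illustration under assumptions, not the library's code: chunksHoldingBack and the tail out-parameter are hypothetical, and a stream shorter than len simply ends with a short tail here, whereas a real reader would likely report an error.

```typescript
// Yields everything except the final `len` bytes of `source`.
// The withheld bytes are stored in `tail.bytes` when the source ends.
async function* chunksHoldingBack(
  source: AsyncIterable<Uint8Array>,
  len: number,
  tail: { bytes: Uint8Array },
): AsyncGenerator<Uint8Array> {
  let held: Uint8Array = new Uint8Array(0)

  for await (const chunk of source) {
    // Join the withheld bytes with the new chunk.
    const all = new Uint8Array(held.length + chunk.length)
    all.set(held)
    all.set(chunk, held.length)

    // Only bytes that cannot belong to the trailer are safe to emit.
    const safe = all.length - len
    if (safe > 0) {
      yield all.subarray(0, safe)
      held = all.slice(safe)
    } else {
      held = all
    }
  }
  tail.bytes = held
}
```

With len set to 0 nothing is withheld, so every incoming chunk is passed through unchanged.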

Type Check

It is better to use TypeScript. If you forget to add ?? await A, the result type will include undefined, which makes the mistake easier to spot.

const id = reader.u32()   // number | undefined
id.toString()             // ❌

Concurrency

A single reader must not be used by multiple coroutines in parallel, as this would break the waiting order. Therefore, the following logic should not be used:

const reader = new QuickReader(stream)

async function routine() {
  do {
    const id = reader.u32() ?? await A
    const name = reader.txt() ?? await A
    // ...
  } while (!reader.eof)
}

// ❌
for (let i = 0; i < 10; i++) {
  routine()
}

Read Line

QuickReader is also a high-performance line reader. It reduces overhead by ~60% compared to Node.js's native readline module, because its parsing logic is simpler, e.g. it treats only \n as a delimiter (\r gets no special handling).

const stream = fs.createReadStream('log.txt')
const reader = new QuickReader(stream)
await reader.pull()

// no error if the file does not end with '\n'
reader.eofAsDelim = true

while (!reader.eof) {
  const line = reader.txtLn() ?? await A
  // ...
}
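To illustrate how simple \n-only parsing can be, and what an eofAsDelim-style switch changes, here is a small synchronous sketch over in-memory chunks. linesOf is a hypothetical helper written for this example; it is not how QuickReader is implemented, and the error on a missing final delimiter is an assumption based on the comment in the code above.

```typescript
// Splits chunks into lines on '\n' only. A trailing line without '\n'
// is an error unless eofAsDelim is true.
function* linesOf(chunks: Iterable<Uint8Array>, eofAsDelim = false): Generator<string> {
  const decoder = new TextDecoder()
  let text = ''

  for (const chunk of chunks) {
    // stream: true keeps multi-byte characters intact across chunk borders.
    text += decoder.decode(chunk, { stream: true })

    let nl = text.indexOf('\n')
    while (nl !== -1) {
      yield text.slice(0, nl)
      text = text.slice(nl + 1)
      nl = text.indexOf('\n')
    }
  }

  // Flush any bytes the decoder was still holding.
  text += decoder.decode()
  if (text.length > 0) {
    if (!eofAsDelim) {
      throw new Error('missing trailing delimiter')
    }
    yield text
  }
}
```

Note that a line ending in \r\n keeps its \r in this sketch, since only \n is treated as a delimiter.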

Of course, as mentioned above, concurrency is not supported. If multiple coroutines read from the same file, it is better to use the native readline module:

import fs from 'node:fs'
import readline from 'node:readline'

const stream = fs.createReadStream('urls.txt')
const rl = readline.createInterface({input: stream})
const iter = rl[Symbol.asyncIterator]()

async function routine() {
  for (;;) {
    const {value: url} = await iter.next()
    if (!url) { 
      break
    }
    const res = await fetch(url)
    // ...
  }
}

for (let i = 0; i < 100; i++) {
  routine()
}

About

The idea of this project was born when the await keyword was introduced. The earliest solution was:

const result = reader.read() || await A

Because the || operator also short-circuits on 0 and '', this was not perfect until ?? was introduced in ES2020.
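The difference is easy to see with a value of 0. In this small sketch, withOr and withNullish are hypothetical helpers in which the first argument plays the role of the synchronous result and the second the awaited one:

```typescript
function withOr(fast: number | undefined, slow: number): number {
  return fast || slow   // falls through on any falsy value, including 0
}

function withNullish(fast: number | undefined, slow: number): number {
  return fast ?? slow   // falls through only on undefined or null
}

console.log(withOr(0, 99))                // 99 (the valid result 0 is lost)
console.log(withNullish(0, 99))           // 0
console.log(withNullish(undefined, 99))   // 99
```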

However, the performance of await has improved greatly since then, so the idea is not as significant as it once was. Still, I am sharing it, even though it is 2022 now; after all, performance optimization is never-ending.

Due to limited time and English skills, the documentation and some code comments (e.g. index.d.ts) were translated via Google; hopefully someone will improve them.

License

MIT
