• This repository has been archived on 06/Nov/2022
  • Stars
    star
    6,223
  • Rank 6,436 (Top 0.2 %)
  • Language
    C
  • License
    MIT License
  • Created over 15 years ago
  • Updated over 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

http request/response parser for c

HTTP Parser

http-parser is not actively maintained. New projects and projects looking to migrate should consider llhttp.

Build Status

This is a parser for HTTP messages written in C. It parses both requests and responses. The parser is designed to be used in performance HTTP applications. It does not make any syscalls nor allocations, it does not buffer data, it can be interrupted at anytime. Depending on your architecture, it only requires about 40 bytes of data per message stream (in a web server that is per connection).

Features:

  • No dependencies
  • Handles persistent streams (keep-alive).
  • Decodes chunked encoding.
  • Upgrade support
  • Defends against buffer overflow attacks.

The parser extracts the following information from HTTP messages:

  • Header fields and values
  • Content-Length
  • Request method
  • Response status code
  • Transfer-Encoding
  • HTTP version
  • Request URL
  • Message body

Usage

One http_parser object is used per TCP connection. Initialize the struct using http_parser_init() and set the callbacks. That might look something like this for a request parser:

http_parser_settings settings;
settings.on_url = my_url_callback;
settings.on_header_field = my_header_field_callback;
/* ... */

http_parser *parser = malloc(sizeof(http_parser));
http_parser_init(parser, HTTP_REQUEST);
parser->data = my_socket;

When data is received on the socket execute the parser and check for errors.

size_t len = 80*1024, nparsed;
char buf[len];
ssize_t recved;

recved = recv(fd, buf, len, 0);

if (recved < 0) {
  /* Handle error. */
}

/* Start up / continue the parser.
 * Note we pass recved==0 to signal that EOF has been received.
 */
nparsed = http_parser_execute(parser, &settings, buf, recved);

if (parser->upgrade) {
  /* handle new protocol */
} else if (nparsed != recved) {
  /* Handle error. Usually just close the connection. */
}

http_parser needs to know where the end of the stream is. For example, sometimes servers send responses without Content-Length and expect the client to consume input (for the body) until EOF. To tell http_parser about EOF, give 0 as the fourth parameter to http_parser_execute(). Callbacks and errors can still be encountered during an EOF, so one must still be prepared to receive them.

Scalar valued message information such as status_code, method, and the HTTP version are stored in the parser structure. This data is only temporally stored in http_parser and gets reset on each new message. If this information is needed later, copy it out of the structure during the headers_complete callback.

The parser decodes the transfer-encoding for both requests and responses transparently. That is, a chunked encoding is decoded before being sent to the on_body callback.

The Special Problem of Upgrade

http_parser supports upgrading the connection to a different protocol. An increasingly common example of this is the WebSocket protocol which sends a request like

    GET /demo HTTP/1.1
    Upgrade: WebSocket
    Connection: Upgrade
    Host: example.com
    Origin: http://example.com
    WebSocket-Protocol: sample

followed by non-HTTP data.

(See RFC6455 for more information the WebSocket protocol.)

To support this, the parser will treat this as a normal HTTP message without a body, issuing both on_headers_complete and on_message_complete callbacks. However http_parser_execute() will stop parsing at the end of the headers and return.

The user is expected to check if parser->upgrade has been set to 1 after http_parser_execute() returns. Non-HTTP data begins at the buffer supplied offset by the return value of http_parser_execute().

Callbacks

During the http_parser_execute() call, the callbacks set in http_parser_settings will be executed. The parser maintains state and never looks behind, so buffering the data is not necessary. If you need to save certain data for later usage, you can do that from the callbacks.

There are two types of callbacks:

  • notification typedef int (*http_cb) (http_parser*); Callbacks: on_message_begin, on_headers_complete, on_message_complete.
  • data typedef int (*http_data_cb) (http_parser*, const char *at, size_t length); Callbacks: (requests only) on_url, (common) on_header_field, on_header_value, on_body;

Callbacks must return 0 on success. Returning a non-zero value indicates error to the parser, making it exit immediately.

For cases where it is necessary to pass local information to/from a callback, the http_parser object's data field can be used. An example of such a case is when using threads to handle a socket connection, parse a request, and then give a response over that socket. By instantiation of a thread-local struct containing relevant data (e.g. accepted socket, allocated memory for callbacks to write into, etc), a parser's callbacks are able to communicate data between the scope of the thread and the scope of the callback in a threadsafe manner. This allows http_parser to be used in multi-threaded contexts.

Example:

 typedef struct {
  socket_t sock;
  void* buffer;
  int buf_len;
 } custom_data_t;


int my_url_callback(http_parser* parser, const char *at, size_t length) {
  /* access to thread local custom_data_t struct.
  Use this access save parsed data for later use into thread local
  buffer, or communicate over socket
  */
  parser->data;
  ...
  return 0;
}

...

void http_parser_thread(socket_t sock) {
 int nparsed = 0;
 /* allocate memory for user data */
 custom_data_t *my_data = malloc(sizeof(custom_data_t));

 /* some information for use by callbacks.
 * achieves thread -> callback information flow */
 my_data->sock = sock;

 /* instantiate a thread-local parser */
 http_parser *parser = malloc(sizeof(http_parser));
 http_parser_init(parser, HTTP_REQUEST); /* initialise parser */
 /* this custom data reference is accessible through the reference to the
 parser supplied to callback functions */
 parser->data = my_data;

 http_parser_settings settings; /* set up callbacks */
 settings.on_url = my_url_callback;

 /* execute parser */
 nparsed = http_parser_execute(parser, &settings, buf, recved);

 ...
 /* parsed information copied from callback.
 can now perform action on data copied into thread-local memory from callbacks.
 achieves callback -> thread information flow */
 my_data->buffer;
 ...
}

In case you parse HTTP message in chunks (i.e. read() request line from socket, parse, read half headers, parse, etc) your data callbacks may be called more than once. http_parser guarantees that data pointer is only valid for the lifetime of callback. You can also read() into a heap allocated buffer to avoid copying memory around if this fits your application.

Reading headers may be a tricky task if you read/parse headers partially. Basically, you need to remember whether last header callback was field or value and apply the following logic:

(on_header_field and on_header_value shortened to on_h_*)
 ------------------------ ------------ --------------------------------------------
| State (prev. callback) | Callback   | Description/action                         |
 ------------------------ ------------ --------------------------------------------
| nothing (first call)   | on_h_field | Allocate new buffer and copy callback data |
|                        |            | into it                                    |
 ------------------------ ------------ --------------------------------------------
| value                  | on_h_field | New header started.                        |
|                        |            | Copy current name,value buffers to headers |
|                        |            | list and allocate new buffer for new name  |
 ------------------------ ------------ --------------------------------------------
| field                  | on_h_field | Previous name continues. Reallocate name   |
|                        |            | buffer and append callback data to it      |
 ------------------------ ------------ --------------------------------------------
| field                  | on_h_value | Value for current header started. Allocate |
|                        |            | new buffer and copy callback data to it    |
 ------------------------ ------------ --------------------------------------------
| value                  | on_h_value | Value continues. Reallocate value buffer   |
|                        |            | and append callback data to it             |
 ------------------------ ------------ --------------------------------------------

Parsing URLs

A simplistic zero-copy URL parser is provided as http_parser_parse_url(). Users of this library may wish to use it to parse URLs constructed from consecutive on_url callbacks.

See examples of reading in headers:

More Repositories

1

node

Node.js JavaScript runtime ✨🐢🚀✨
JavaScript
97,973
star
2

node-v0.x-archive

Moved to https://github.com/nodejs/node
34,533
star
3

node-gyp

Node.js native addon build tool
Python
9,275
star
4

docker-node

Official Docker Image for Node.js 🐳 🐢 🚀
Dockerfile
7,872
star
5

undici

An HTTP/1.1 client, written from scratch for Node.js
JavaScript
6,182
star
6

nodejs.org

The Node.js® Website
TypeScript
6,020
star
7

Release

Node.js Release Working Group
3,803
star
8

nan

Native Abstractions for Node.js
C++
3,277
star
9

corepack

Zero-runtime-dependency package acting as bridge between Node projects and their package managers
TypeScript
2,542
star
10

node-addon-examples

Node.js C++ addon examples from http://nodejs.org/docs/latest/api/addons.html
C++
2,332
star
11

nodejs.dev

A redesign of Nodejs.org built using Gatsby.js with React.js, TypeScript, and Remark.
TypeScript
2,293
star
12

node-addon-api

Module for using Node-API from C++
C++
2,162
star
13

node-chakracore

Node.js on ChakraCore ✨🐢🚀✨
JavaScript
1,921
star
14

node-convergence-archive

Archive for node/io.js convergence work pre-3.0.0
JavaScript
1,837
star
15

llhttp

Port of http_parser to llparse
TypeScript
1,665
star
16

help

✨ Need help with Node.js? File an Issue here. 🚀
1,473
star
17

llnode

An lldb plugin for Node.js and V8, which enables inspection of JavaScript states for insights into Node.js processes and their core dumps.
C++
1,151
star
18

readable-stream

Node-core streams for userland
JavaScript
1,003
star
19

examples

A repository of runnable Node.js examples that go beyond "hello, world!"
JavaScript
652
star
20

TSC

The Node.js Technical Steering Committee
JavaScript
592
star
21

llparse

Generating parsers in LLVM IR
TypeScript
586
star
22

mentorship

Node.js Mentorship Program Initiative
585
star
23

citgm

Canary in the Gold Mine
JavaScript
567
star
24

http2

Working on an HTTP/2 implementation for Node.js Core
JavaScript
520
star
25

diagnostics

Node.js Diagnostics Working Group
513
star
26

security-wg

Node.js Ecosystem Security Working Group
JavaScript
495
star
27

next-10

Repository for discussion on strategic directions for next 10 years of Node.js
480
star
28

build

Better build and test infra for Node.
Shell
469
star
29

node-eps

Node.js Enhancement Proposals for discussion on future API additions/changes to Node core
442
star
30

education

A place to discover and contribute to education initiatives in Node.js
417
star
31

node-v8

Experimental Node.js mirror on V8 lkgr ✨🐢🚀✨
Shell
416
star
32

modules

Node.js Modules Team
413
star
33

package-maintenance

Repository for work for discussion of helping with maintenance of key packages in the ecosystem.
407
star
34

nodejs-zh-CN

node.js 中文化 & 中文社区
SCSS
395
star
35

performance

Node.js team focusing on performance
Shell
376
star
36

node-inspect

Code that's now part of node, previously `node debug` for `node --inspect`
JavaScript
340
star
37

node-report

Delivers a human-readable diagnostic summary, written to file.
C++
326
star
38

single-executable

This team aims to advance the state of the art in packaging Node.js applications as single standalone executables (SEAs) on all supported operating systems.
306
star
39

quic

This repository is no longer active.
JavaScript
301
star
40

github-bot

@nodejs-github-bot's heart and soul
JavaScript
267
star
41

community-committee

The Node.js Community Committee (aka CommComm)
263
star
42

nodejs-ko

node.js 한국 커뮤니티
Stylus
263
star
43

amaro

Node.js TypeScript wrapper
JavaScript
261
star
44

node-core-utils

CLI tools for Node.js Core collaborators
JavaScript
253
star
45

unofficial-builds

Unofficial binaries for Node.js
Shell
252
star
46

evangelism

Letting the world know how awesome Node.js is and how to get involved!
242
star
47

abi-stable-node

Repository used by the Node-API team to manage work related to Node-API and node-addon-api
JavaScript
241
star
48

abi-stable-node-addon-examples

Node Add-on Examples with PoC ABI stable API for native modules
C++
237
star
49

changelog-maker

A git log to CHANGELOG.md tool
JavaScript
230
star
50

uvwasi

WASI syscall API built atop libuv
C
228
star
51

cjs-module-lexer

Fast lexer to extract named exports via analysis from CommonJS modules
JavaScript
226
star
52

iojs.org

JavaScript
219
star
53

installer

Electron based installer for Node.js.
JavaScript
194
star
54

getting-started

Getting started in Node.js!
193
star
55

postject

Easily inject arbitrary read-only resources into executable formats (Mach-O, PE, ELF) and use it at runtime.
JavaScript
186
star
56

web-server-frameworks

A place for Node.js Web-Server Framework authors and users to collaborate
182
star
57

repl

REPL rewrite for Node.js ✨🐢🚀✨
JavaScript
178
star
58

tooling

Advancing Node.js as a framework for writing great tools
170
star
59

snap

Node.js snap source and updater
Shell
168
star
60

code-and-learn

A series of workshop sprints for Node.js.
Dockerfile
164
star
61

benchmarking

Node.js Benchmarking Working Group
Shell
161
star
62

admin

Administrative space for policies of the TSC
JavaScript
157
star
63

docker-iojs

Official Docker images from the io.js project
Shell
156
star
64

full-icu-npm

convenience loader for 'small-icu' node builds
JavaScript
152
star
65

i18n

The Node.js Internationalization Working Group – A Community Committee initiative.
150
star
66

roadmap

This repository and working group has been retired.
135
star
67

gyp-next

A fork of the GYP build system for use in the Node.js projects
Python
131
star
68

loaders

ECMAScript Modules Loaders
128
star
69

nodejs-pt

Internacionalização & tradução para português referente ao site nodejs.org
108
star
70

dev-policy

node-foundation dev policy **draft**
108
star
71

promises

Promises Working Group Repository
107
star
72

nodejs-zh-TW

Node.js zh-TW
CSS
107
star
73

NG

Next Generation JavaScript IO Platform
103
star
74

nodejs-ja

Node.js 日本語ローカリゼーション
101
star
75

nodejs.org-archive

[DEPRECATED] Website repository for the Node.js project
Nginx
101
star
76

website-redesign

Facilitating the redesign of the nodejs.org website
99
star
77

node-core-test

Node 18's node:test, as an npm package
JavaScript
95
star
78

worker

Figuring out native (Web?)Worker support for Node
JavaScript
87
star
79

post-mortem

This WG is in the process of being folded into the Diagnostics WG.
85
star
80

typescript

TypeScript support in Node.js core
83
star
81

inclusivity

Improving inclusivity in the node community
80
star
82

CTC

Node.js Core Technical Committee & Collaborators
80
star
83

nodejs-ru

Перевод io.js на русский язык
JavaScript
79
star
84

ecmascript-modules

A fork of Node.js to hash out ideas related to ESModules
JavaScript
73
star
85

docs

A place for documentation. (this repository is inactive)
71
star
86

webcrypto

This repository has been archived. The WebCrypto API has been implemented in recent versions of Node.js and does not require additional packages.
JavaScript
69
star
87

import-in-the-middle

Like `require-in-the-middle`, but for ESM import
JavaScript
67
star
88

automation

Better automation for the Node.js project
66
star
89

api

API WG
61
star
90

email

MX server management for iojs.org (and eventually nodejs.org)
JavaScript
60
star
91

user-feedback

Node.js User Feedback Initiative
56
star
92

loaders-test

Examples demonstrating the Node.js ECMAScript Modules Loaders API
JavaScript
54
star
93

core-validate-commit

Validate commit messages for Node.js core
JavaScript
52
star
94

board

The Node Foundation Board of Directors
JavaScript
52
star
95

branch-diff

A tool to list print the commits on one git branch that are not on another using loose comparison
JavaScript
52
star
96

logos

Logo ideas
51
star
97

promise-use-cases

Short lived repository in order to discuss Node.js promise use cases in Collaborator Summit Berlin 2018
JavaScript
50
star
98

open-standards

Node.js Open Standards Team
43
star
99

version-management

Discussion Group for Version Management
42
star
100

hardware

Hardware Working Group
42
star