• This repository has been archived on 06/Nov/2022
  • Stars: 6,223
  • Rank 6,109 (Top 0.2 %)
  • Language: C
  • License: MIT License
  • Created about 15 years ago
  • Updated almost 2 years ago

Repository Details

HTTP request/response parser for C

HTTP Parser

http-parser is not actively maintained. New projects and projects looking to migrate should consider llhttp.

This is a parser for HTTP messages written in C. It parses both requests and responses. The parser is designed to be used in high-performance HTTP applications. It makes no syscalls or allocations, does not buffer data, and can be interrupted at any time. Depending on your architecture, it only requires about 40 bytes of data per message stream (in a web server, that is per connection).

Features:

  • No dependencies
  • Handles persistent streams (keep-alive).
  • Decodes chunked encoding.
  • Upgrade support
  • Defends against buffer overflow attacks.

The parser extracts the following information from HTTP messages:

  • Header fields and values
  • Content-Length
  • Request method
  • Response status code
  • Transfer-Encoding
  • HTTP version
  • Request URL
  • Message body

Usage

One http_parser object is used per TCP connection. Initialize the struct using http_parser_init() and set the callbacks. That might look something like this for a request parser:

http_parser_settings settings;
http_parser_settings_init(&settings); /* clear all callbacks before setting any */
settings.on_url = my_url_callback;
settings.on_header_field = my_header_field_callback;
/* ... */

http_parser *parser = malloc(sizeof(http_parser));
http_parser_init(parser, HTTP_REQUEST);
parser->data = my_socket;

When data is received on the socket execute the parser and check for errors.

size_t len = 80*1024, nparsed;
char buf[len];
ssize_t recved;

recved = recv(fd, buf, len, 0);

if (recved < 0) {
  /* Handle error and stop here; do not pass a negative length to the parser. */
}

/* Start up / continue the parser.
 * Note we pass recved==0 to signal that EOF has been received.
 */
nparsed = http_parser_execute(parser, &settings, buf, recved);

if (parser->upgrade) {
  /* handle new protocol */
} else if (nparsed != (size_t)recved) {
  /* Handle error. Usually just close the connection. */
}

http_parser needs to know where the end of the stream is. For example, sometimes servers send responses without Content-Length and expect the client to consume input (for the body) until EOF. To tell http_parser about EOF, give 0 as the fourth parameter to http_parser_execute(). Callbacks and errors can still be encountered during an EOF, so one must still be prepared to receive them.

Scalar-valued message information such as status_code, method, and the HTTP version is stored in the parser structure. This data is only temporarily stored in http_parser and gets reset on each new message. If this information is needed later, copy it out of the structure during the on_headers_complete callback.

The parser decodes the transfer-encoding for both requests and responses transparently. That is, a chunked encoding is decoded before being sent to the on_body callback.

The Special Problem of Upgrade

http_parser supports upgrading the connection to a different protocol. An increasingly common example of this is the WebSocket protocol which sends a request like

    GET /demo HTTP/1.1
    Upgrade: WebSocket
    Connection: Upgrade
    Host: example.com
    Origin: http://example.com
    WebSocket-Protocol: sample

followed by non-HTTP data.

(See RFC6455 for more information on the WebSocket protocol.)

To support this, the parser will treat this as a normal HTTP message without a body, issuing both on_headers_complete and on_message_complete callbacks. However, http_parser_execute() will stop parsing at the end of the headers and return.

The user is expected to check whether parser->upgrade has been set to 1 after http_parser_execute() returns. Non-HTTP data begins in the supplied buffer at the offset given by the return value of http_parser_execute().
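
The offset arithmetic described above can be sketched as follows. This is a minimal illustration, assuming buf and recved are the buffer and byte count that were passed to http_parser_execute() and nparsed is its return value. The byte_span type and the upgrade_payload() helper are hypothetical names introduced here; they are not part of http_parser.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: a view into the caller's receive buffer. */
typedef struct {
  const char *ptr;
  size_t len;
} byte_span;

/* Returns the bytes that follow the HTTP headers after an upgrade.
 * buf, recved: what was passed to http_parser_execute().
 * nparsed:     what http_parser_execute() returned. */
static byte_span upgrade_payload(const char *buf, size_t recved, size_t nparsed) {
  byte_span rest;
  assert(nparsed <= recved); /* the parser never consumes more than it was given */
  rest.ptr = buf + nparsed;
  rest.len = recved - nparsed;
  return rest;
}
```

When parser->upgrade is set, the returned span (possibly empty) is what would be handed to the new protocol's handler, followed by any bytes read from the socket afterwards.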

Callbacks

During the http_parser_execute() call, the callbacks set in http_parser_settings will be executed. The parser maintains state and never looks behind, so buffering the data is not necessary. If you need to save certain data for later usage, you can do that from the callbacks.

There are two types of callbacks:

  • notification: typedef int (*http_cb) (http_parser*); Callbacks: on_message_begin, on_headers_complete, on_message_complete.
  • data: typedef int (*http_data_cb) (http_parser*, const char *at, size_t length); Callbacks: (requests only) on_url, (common) on_header_field, on_header_value, on_body.

Callbacks must return 0 on success. Returning a non-zero value indicates error to the parser, making it exit immediately.

For cases where it is necessary to pass local information to/from a callback, the http_parser object's data field can be used. An example of such a case is when using threads to handle a socket connection, parse a request, and then give a response over that socket. By instantiating a thread-local struct containing relevant data (e.g. the accepted socket, allocated memory for callbacks to write into, etc.), a parser's callbacks are able to communicate data between the scope of the thread and the scope of the callback in a thread-safe manner. This allows http_parser to be used in multi-threaded contexts.

Example:

typedef struct {
  socket_t sock;
  void* buffer;
  int buf_len;
} custom_data_t;


int my_url_callback(http_parser* parser, const char *at, size_t length) {
  /* Access the thread-local custom_data_t struct.
   * Use it to save parsed data into the thread-local buffer for later
   * use, or to communicate over the socket.
   */
  custom_data_t *my_data = parser->data;
  ...
  return 0;
}

...

void http_parser_thread(socket_t sock) {
  size_t nparsed = 0;
  /* allocate memory for user data */
  custom_data_t *my_data = malloc(sizeof(custom_data_t));

  /* some information for use by callbacks.
   * achieves thread -> callback information flow */
  my_data->sock = sock;

  /* instantiate a thread-local parser */
  http_parser *parser = malloc(sizeof(http_parser));
  http_parser_init(parser, HTTP_REQUEST); /* initialise parser */
  /* this custom data reference is accessible through the reference to the
   * parser supplied to callback functions */
  parser->data = my_data;

  http_parser_settings settings; /* set up callbacks */
  http_parser_settings_init(&settings); /* clear callbacks that are not set below */
  settings.on_url = my_url_callback;

  /* execute parser; buf and recved come from recv(), as shown under Usage */
  nparsed = http_parser_execute(parser, &settings, buf, recved);

  ...
  /* parsed information copied from callback.
   * can now perform action on data copied into thread-local memory from callbacks.
   * achieves callback -> thread information flow */
  my_data->buffer;
  ...
}

If you parse an HTTP message in chunks (e.g. read() the request line from the socket, parse, read half of the headers, parse, etc.), your data callbacks may be called more than once. http_parser guarantees that the data pointer is only valid for the lifetime of the callback. You can also read() into a heap-allocated buffer to avoid copying memory around if this fits your application.
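
Because a data callback may deliver a value in several pieces, and the pointer is only valid during the call, one common approach is to append each piece to a buffer you own. The following is a minimal sketch of such a buffer. The strbuf type and its functions are hypothetical names introduced for illustration and are not part of http_parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string used to collect callback fragments. */
typedef struct {
  char *data;  /* NUL-terminated once anything has been appended */
  size_t len;  /* bytes stored, excluding the terminator */
} strbuf;

/* Appends `length` bytes starting at `at`. Returns 0 on success and -1 on
 * allocation failure, so the result can be returned directly from a
 * callback (non-zero signals an error to the parser). */
static int strbuf_append(strbuf *b, const char *at, size_t length) {
  char *grown;
  assert(b != NULL);
  grown = realloc(b->data, b->len + length + 1);
  if (grown == NULL) return -1;  /* the old buffer is still valid and owned by b */
  memcpy(grown + b->len, at, length);
  b->len += length;
  grown[b->len] = '\0';
  b->data = grown;
  return 0;
}

/* Releases the buffer and resets it to the empty state. */
static void strbuf_free(strbuf *b) {
  free(b->data);
  b->data = NULL;
  b->len = 0;
}
```

An on_url callback could then consist of a single call to strbuf_append() on a zero-initialized strbuf reached through parser->data, forwarding its at and length arguments unchanged.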

Reading headers can be tricky if you read and parse them partially. You need to remember whether the last header callback was a field or a value and apply the following logic:

(on_header_field and on_header_value shortened to on_h_*)
 ------------------------ ------------ --------------------------------------------
| State (prev. callback) | Callback   | Description/action                         |
 ------------------------ ------------ --------------------------------------------
| nothing (first call)   | on_h_field | Allocate new buffer and copy callback data |
|                        |            | into it                                    |
 ------------------------ ------------ --------------------------------------------
| value                  | on_h_field | New header started.                        |
|                        |            | Copy current name,value buffers to headers |
|                        |            | list and allocate new buffer for new name  |
 ------------------------ ------------ --------------------------------------------
| field                  | on_h_field | Previous name continues. Reallocate name   |
|                        |            | buffer and append callback data to it      |
 ------------------------ ------------ --------------------------------------------
| field                  | on_h_value | Value for current header started. Allocate |
|                        |            | new buffer and copy callback data to it    |
 ------------------------ ------------ --------------------------------------------
| value                  | on_h_value | Value continues. Reallocate value buffer   |
|                        |            | and append callback data to it             |
 ------------------------ ------------ --------------------------------------------
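
The table above can be sketched in C as follows. This is one possible arrangement under stated assumptions, not the library's own code: header_state, headers_on_field(), headers_on_value() and the MAX_HEADERS cap are hypothetical names introduced here. The two functions take the same at/length arguments as the data callbacks, so real on_header_field and on_header_value callbacks could forward to them with a header_state reached through parser->data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HEADERS 32 /* arbitrary cap chosen for this sketch */

enum last_cb { LAST_NONE = 0, LAST_FIELD, LAST_VALUE };

typedef struct {
  char *name;
  char *value;
} header;

typedef struct {
  header list[MAX_HEADERS];
  size_t count;      /* headers started so far */
  enum last_cb last; /* which callback ran previously */
} header_state;

/* Reallocates *dst and appends the fragment, keeping it NUL-terminated.
 * A NULL *dst is treated as an empty string (the "allocate new buffer" rows). */
static int append(char **dst, const char *at, size_t length) {
  size_t old;
  char *grown;
  assert(dst != NULL);
  old = *dst ? strlen(*dst) : 0;
  grown = realloc(*dst, old + length + 1);
  if (grown == NULL) return -1;
  memcpy(grown + old, at, length);
  grown[old + length] = '\0';
  *dst = grown;
  return 0;
}

/* Logic for on_header_field. */
static int headers_on_field(header_state *s, const char *at, size_t length) {
  if (s->last != LAST_FIELD) {
    /* First call, or previous callback was a value: a new header starts. */
    if (s->count == MAX_HEADERS) return -1;
    s->list[s->count].name = NULL;
    s->list[s->count].value = NULL;
    s->count++;
  }
  /* Otherwise the previous name continues and the fragment is appended. */
  s->last = LAST_FIELD;
  return append(&s->list[s->count - 1].name, at, length);
}

/* Logic for on_header_value. */
static int headers_on_value(header_state *s, const char *at, size_t length) {
  if (s->count == 0) return -1; /* a value with no preceding field */
  s->last = LAST_VALUE;
  /* Starts the value after a field, or continues it after a value. */
  return append(&s->list[s->count - 1].value, at, length);
}

/* Frees every stored name and value. */
static void headers_free(header_state *s) {
  size_t i;
  for (i = 0; i < s->count; i++) {
    free(s->list[i].name);
    free(s->list[i].value);
  }
  s->count = 0;
  s->last = LAST_NONE;
}
```

Storing each header directly in the list, rather than in a separate pair of scratch buffers, collapses the "copy current name,value buffers to headers list" step into the moment a new entry is started. The state must be zero-initialized before the first callback.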

Parsing URLs

A simplistic zero-copy URL parser is provided as http_parser_parse_url(). Users of this library may wish to use it to parse URLs constructed from consecutive on_url callbacks.

See examples of reading in headers:
