Name

ngx_replace_filter - Streaming regular expression replacement in response bodies.

This module is not distributed with the Nginx source. See the installation instructions.

Status

This module is already quite usable, though it is still in an early phase of development and is considered experimental.

Synopsis

    location /t {
        default_type text/html;
        echo abc;
        replace_filter 'ab|abc' X;
    }

    location / {
        # proxy_pass/fastcgi_pass/...

        # caseless global substitution:
        replace_filter '\d+' 'blah blah' 'ig';
        replace_filter_types text/plain text/css;
    }

    location /a {
        # proxy_pass/fastcgi_pass/root/...

        # remove line-leading spaces and line-trailing spaces,
        # as well as blank lines:
        replace_filter '^\s+|\s+$' '' g;
    }

    location /b {
        # proxy_pass/fastcgi_pass/root/...

        # only remove line-leading spaces and line-trailing spaces:
        replace_filter '^[ \f\t]+|[ \f\t]+$' '' g;
    }

    location ~ '\.cpp$' {
        # proxy_pass/fastcgi_pass/root/...

        replace_filter_types text/plain;

        # skip C/C++ string literals:
        replace_filter "'(?:\\\\[^\n]|[^'\n])*'" $& g;
        replace_filter '"(?:\\\\[^\n]|[^"\n])*"' $& g;

        # remove all those ugly C/C++ comments:
        replace_filter '/\*.*?\*/|//[^\n]*' '' g;
    }

Description

This Nginx output filter module tries to do regular expression substitutions in a non-buffered manner wherever possible.

Rather than a traditional backtracking regular expression engine like PCRE, this module uses the new sregex library implemented by the author himself, which was designed with streaming processing in mind from the very beginning.

A good common subset of Perl 5 regular expressions is supported by sregex. For the complete feature list, check out sregex's documentation:

https://github.com/agentzh/sregex#syntax-supported

Response body data is buffered only when absolutely necessary, for example, when an incomplete capture belonging to a possible match straddles a data chunk boundary.
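To illustrate why buffering is only needed near chunk boundaries, here is a minimal sketch in C that streams a replacement of one fixed literal string. It is a simplification (no regex, and not how sregex works internally), and the names `stream_ctx`, `stream_feed` and `stream_finish` are hypothetical. Bytes that can no longer start a match are emitted immediately; only a chunk suffix that is still a prefix of the pattern is held back until the next chunk arrives.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical streaming replacer for one fixed literal pattern (not a regex).
 * Only a chunk suffix that could still grow into a match is held back. */
typedef struct {
    const char *pat;          /* literal pattern, 1..64 bytes */
    size_t      pat_len;
    const char *repl;         /* replacement text */
    char        pending[64];  /* held-back bytes: a proper prefix of pat */
    size_t      pending_len;
    char        out[1024];    /* collected output, NUL-terminated */
    size_t      out_len;
} stream_ctx;

static void stream_emit(stream_ctx *c, const char *p, size_t n)
{
    assert(c->out_len + n < sizeof c->out);
    memcpy(c->out + c->out_len, p, n);
    c->out_len += n;
    c->out[c->out_len] = '\0';
}

/* Feed one chunk of the body (at most 256 bytes in this sketch). */
static void stream_feed(stream_ctx *c, const char *data, size_t n)
{
    char work[64 + 256];
    size_t len, i = 0;

    assert(c->pat_len > 0 && c->pat_len <= sizeof c->pending && n <= 256);
    memcpy(work, c->pending, c->pending_len);
    memcpy(work + c->pending_len, data, n);
    len = c->pending_len + n;

    while (i < len) {
        size_t avail = len - i;
        if (avail >= c->pat_len && memcmp(work + i, c->pat, c->pat_len) == 0) {
            stream_emit(c, c->repl, strlen(c->repl));  /* full match */
            i += c->pat_len;
        } else if (avail < c->pat_len && memcmp(work + i, c->pat, avail) == 0) {
            break;  /* possible match cut off by the chunk boundary: hold back */
        } else {
            stream_emit(c, work + i, 1);  /* this byte cannot start a match */
            i++;
        }
    }
    c->pending_len = len - i;
    memcpy(c->pending, work + i, c->pending_len);
}

/* End of body: a pending partial match can no longer complete. */
static void stream_finish(stream_ctx *c)
{
    stream_emit(c, c->pending, c->pending_len);
    c->pending_len = 0;
}
```

Feeding "xxab" and then "cyy" with the pattern "abc" emits "xx" right away, holds back "ab" across the boundary, and yields "xxXyy" once the next chunk completes the match.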

Directives

replace_filter

syntax: replace_filter <regex> <replace>

syntax: replace_filter <regex> <replace> <options>

default: no

context: http, server, location, location if

phase: output body filter

Specifies the regex pattern and the text to replace it with, plus optional regex flags.

By default, the filter stops matching after the first match is found. This behavior can be changed by specifying the g regex option.

The following regex options are supported:

  • g

    for global search and substitution (default off)

  • i

    for case-insensitive matching (default off)

Multiple options can be combined in a single string argument, for example:

    replace_filter hello hiya ig;
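The effect of the g and i options can be approximated on an in-memory string with the POSIX `<regex.h>` API. This is only a non-streaming sketch: POSIX `regexec` follows leftmost-longest semantics rather than the Perl-style semantics of sregex, and the helper names `replace_regex` and `strbuf` are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *buf; size_t len, cap; } strbuf;

/* Append n bytes to sb, keeping it NUL-terminated. Returns 0 or -1. */
static int sb_append(strbuf *sb, const char *s, size_t n)
{
    if (sb->len + n + 1 > sb->cap) {
        size_t cap = (sb->len + n + 1) * 2;
        char *nb = realloc(sb->buf, cap);
        if (nb == NULL)
            return -1;
        sb->buf = nb;
        sb->cap = cap;
    }
    memcpy(sb->buf + sb->len, s, n);
    sb->len += n;
    sb->buf[sb->len] = '\0';
    return 0;
}

/* Replace the first match (global == 0) or every match (global != 0) of the
 * POSIX extended regex `pattern` in `subject` with the literal `repl`;
 * icase mirrors the "i" option. Returns a malloc'ed string or NULL. */
static char *replace_regex(const char *subject, const char *pattern,
                           const char *repl, int global, int icase)
{
    regex_t re;
    regmatch_t m;
    strbuf sb = {NULL, 0, 0};
    const char *p = subject;
    int eflags = 0, err = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | (icase ? REG_ICASE : 0)) != 0)
        return NULL;

    while (regexec(&re, p, 1, &m, eflags) == 0) {
        err |= sb_append(&sb, p, (size_t) m.rm_so);  /* text before the match */
        err |= sb_append(&sb, repl, strlen(repl));
        p += m.rm_eo;
        if (m.rm_so == m.rm_eo) {                    /* empty match: step over one byte */
            if (*p == '\0')
                break;
            err |= sb_append(&sb, p, 1);
            p++;
        }
        eflags = REG_NOTBOL;  /* p is no longer at the start of the subject */
        if (!global)
            break;            /* without g, stop after the first match */
    }
    err |= sb_append(&sb, p, strlen(p));             /* unmatched tail */
    regfree(&re);
    if (err) {
        free(sb.buf);
        return NULL;
    }
    return sb.buf;
}
```

For example, `replace_regex("Hello hello", "hello", "hiya", 1, 1)` returns "hiya hiya", while dropping both flags returns "Hello hiya". Note that `\d`, as used in the synopsis, is a Perl-style escape that POSIX extended regexes do not support; `[0-9]` is the portable equivalent.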

Nginx variables can be interpolated into the text to be replaced, for example:

    replace_filter \w+ "[$foo,$bar]";

To insert a literal dollar sign character ($), use the $$ sequence, for instance:

    replace_filter \w "$$";

Submatch capturing variables like $&, $1, $2, etc. are also supported, for example:

    replace_filter [bc]|d [$&-$1-$2] g;

The semantics of the submatch capturing variables are exactly the same as in the Perl 5 language.
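The template rules above ($$ for a literal dollar sign, $& for the whole match, $1, $2, ... for submatches, and Perl's rule that an unmatched or nonexistent group expands to the empty string) can be sketched as a small expander. This sketch is an assumption-based illustration, not the module's code: it ignores Nginx variable interpolation, handles only single-digit group numbers (Perl would read $10 as group 10), and uses the hypothetical name `expand_template`, with capture offsets passed as POSIX `regmatch_t` pairs where -1 marks an unmatched group.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Expand a replacement template into out (capacity cap, NUL-terminated).
 *   $$      -> a literal '$'
 *   $&      -> the whole match, caps[0]
 *   $1..$9  -> submatch k, caps[k]; an unmatched or nonexistent group
 *              (rm_so == -1 or k >= ncaps) expands to "" as in Perl
 * Any other '$' is copied literally (Nginx variables are not handled here).
 * Returns the expanded length, or (size_t) -1 if out is too small. */
static size_t expand_template(const char *tmpl, const char *subject,
                              const regmatch_t *caps, size_t ncaps,
                              char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return (size_t) -1;

    for (const char *t = tmpl; *t != '\0'; t++) {
        const char *src = t;    /* by default, copy the current byte */
        size_t len = 1;

        if (t[0] == '$' && t[1] == '$') {
            t++;                /* emit one '$' and skip the second */
        } else if (t[0] == '$' && (t[1] == '&' || (t[1] >= '1' && t[1] <= '9'))) {
            size_t k = (t[1] == '&') ? 0 : (size_t) (t[1] - '0');
            t++;
            if (k < ncaps && caps[k].rm_so >= 0) {
                src = subject + caps[k].rm_so;
                len = (size_t) (caps[k].rm_eo - caps[k].rm_so);
            } else {
                len = 0;        /* unmatched group: empty string */
            }
        }
        if (n + len + 1 > cap)
            return (size_t) -1;
        memcpy(out + n, src, len);
        n += len;
    }
    out[n] = '\0';
    return n;
}
```

With the example directive above, the regex [bc]|d has no capture groups, so a match on "b" expands [$&-$1-$2] to "[b--]".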

Multiple replace_filter directives in the same scope are also supported. All the patterns are applied at the same time, as in a tokenizer. Longest-token-match semantics is not used; instead, patterns are prioritized according to their order in the configuration file.

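Here is an example that removes all the C/C++ comments from a C/C++ source file. The first two rules match character and string literals and replace them with themselves ($&), so that comment markers appearing inside literals are left alone: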

    replace_filter "'(?:\\\\[^\n]|[^'\n])*'" $& g;
    replace_filter '"(?:\\\\[^\n]|[^"\n])*"' $& g;
    replace_filter '/\*.*?\*/|//[^\n]*' '' g;
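One way to picture "applied at the same time, as in a tokenizer" is Perl-style alternation: take the leftmost match of any pattern, and when several patterns match at the same leftmost position, pick the one configured first rather than the longest. The sketch below assumes exactly this reading and uses POSIX regex as a non-streaming stand-in for sregex (so within a single pattern POSIX's own leftmost-longest rule still applies). The function name `multi_replace` and the convention that a NULL replacement keeps the match verbatim (the `$&` idiom) are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#define MAX_PATTERNS 8

/* Apply n patterns "at the same time": at each step, pick the leftmost match
 * of any pattern; on a tie, the pattern listed first wins (not the longest).
 * repls[i] == NULL keeps the matched text as-is (like "$&"). Every match is
 * replaced, as with the g option. Patterns must not match the empty string.
 * Writes NUL-terminated output to out; returns 0, or -1 on error. */
static int multi_replace(const char *subject, const char *const *pats,
                         const char *const *repls, size_t n,
                         char *out, size_t cap)
{
    regex_t re[MAX_PATTERNS];
    const char *p = subject;
    size_t len = 0;
    int rc = 0, eflags = 0;

    if (n > MAX_PATTERNS || cap == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (regcomp(&re[i], pats[i], REG_EXTENDED) != 0) {
            while (i-- > 0)
                regfree(&re[i]);
            return -1;
        }
    }

    for (;;) {
        size_t best = n;                  /* n means "no pattern matched" */
        regmatch_t bm = {0}, m;

        for (size_t i = 0; i < n; i++) {
            /* strict '<' keeps ties with the earlier pattern */
            if (regexec(&re[i], p, 1, &m, eflags) == 0 &&
                (best == n || m.rm_so < bm.rm_so)) {
                best = i;
                bm = m;
            }
        }

        size_t keep = (best == n) ? strlen(p) : (size_t) bm.rm_so;
        const char *r = "";
        size_t rlen = 0;
        if (best < n) {
            r = repls[best] ? repls[best] : p + bm.rm_so;
            rlen = repls[best] ? strlen(r) : (size_t) (bm.rm_eo - bm.rm_so);
            if (bm.rm_eo == bm.rm_so) { rc = -1; break; }  /* empty match unsupported */
        }
        if (len + keep + rlen + 1 > cap) { rc = -1; break; }
        memcpy(out + len, p, keep);       /* text before the match */
        len += keep;
        memcpy(out + len, r, rlen);       /* replacement, or the match itself */
        len += rlen;
        if (best == n)
            break;
        p += bm.rm_eo;
        eflags = REG_NOTBOL;
    }
    out[len] = '\0';
    for (size_t i = 0; i < n; i++)
        regfree(&re[i]);
    return rc;
}
```

With a string-literal rule listed before a // comment rule, the input `x = "a//b"; // c` becomes `x = "a//b"; `: the string literal starts further left than the // inside it, so it is kept intact, and only the real trailing comment is removed.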

When the Content-Encoding response header is not empty (like gzip), the response body will always remain intact. So you usually want to disable gzip compression in your backend servers' responses by adding the following line to your nginx.conf if you are using the ngx_proxy module:

    proxy_set_header Accept-Encoding '';

Your responses can still be gzip compressed on the Nginx server level though.

replace_filter_types

syntax: replace_filter_types <mime-type> ...

default: replace_filter_types text/html

context: http, server, location, location if

phase: output body filter

Specifies one or more MIME types (in the Content-Type response header) to be processed.

By default, only text/html typed responses are processed.
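Combined with the Content-Encoding rule described under replace_filter, the per-response decision can be sketched as below. This is an assumption-laden illustration rather than the module's code: it compares only the media type before any ";" parameters, case-insensitively, and the function name `rf_should_filter` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Decide whether a response body should go through the replace filter.
 * content_type may carry parameters ("text/html; charset=utf-8");
 * content_encoding is NULL or "" when the header is absent. */
static bool rf_should_filter(const char *content_type,
                             const char *content_encoding,
                             const char *const *types, size_t ntypes)
{
    if (content_encoding != NULL && content_encoding[0] != '\0')
        return false;                  /* e.g. gzip: body left intact */
    if (content_type == NULL)
        return false;

    /* length of the media type without parameters */
    size_t len = strcspn(content_type, "; \t");
    for (size_t i = 0; i < ntypes; i++) {
        if (strlen(types[i]) == len &&
            strncasecmp(content_type, types[i], len) == 0)
            return true;
    }
    return false;
}
```

With the default list { "text/html" }, a "text/html; charset=utf-8" response is filtered, while a "text/plain" response, or any gzip-encoded response, passes through untouched.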

replace_filter_max_buffered_size

syntax: replace_filter_max_buffered_size <size>

default: replace_filter_max_buffered_size 8k

context: http, server, location, location if

phase: output body filter

Limits the total size of the data buffered by the module at runtime. Defaults to 8k.

When the limit is reached, replace_filter will immediately stop processing and leave all the remaining response body data intact.
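The limit acts as a budget on pending, not-yet-emitted bytes. Below is a minimal sketch of that accounting under two assumptions: the filter gives up as soon as holding more data would exceed the limit, and whatever it was holding is then flushed unmodified before the rest of the body passes through. The names `rf_budget`, `rf_budget_hold` and `rf_budget_release` are hypothetical, and the real module's bookkeeping may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t max_buffered;   /* replace_filter_max_buffered_size, e.g. 8192 */
    size_t buffered;       /* bytes currently held back */
    bool   passthrough;    /* set once the limit has been exceeded */
} rf_budget;

/* Called when the matcher wants to hold back n more bytes.
 * Returns true if it may do so; false means "stop replacing, flush what
 * is held as-is and pass the rest of the body through unchanged". */
static bool rf_budget_hold(rf_budget *b, size_t n)
{
    if (b->passthrough)
        return false;
    if (b->buffered + n > b->max_buffered) {
        b->passthrough = true;
        b->buffered = 0;   /* held bytes are flushed unmodified */
        return false;
    }
    b->buffered += n;
    return true;
}

/* Called when held-back bytes are emitted (the match resolved either way). */
static void rf_budget_release(rf_budget *b, size_t n)
{
    b->buffered = (n > b->buffered) ? 0 : b->buffered - n;
}
```

Once pass-through mode is entered it is never left for the current response, which matches the statement that all remaining response body data is left intact.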

replace_filter_last_modified

syntax: replace_filter_last_modified keep | clear

default: replace_filter_last_modified clear

context: http, server, location, location if

phase: output body filter

Controls how to deal with the existing Last-Modified response header.

By default, this module clears the Last-Modified response header, if present. You can specify

    replace_filter_last_modified keep;

to always keep the original Last-Modified response header.

replace_filter_skip

syntax: replace_filter_skip $var

default: no

context: http, server, location, location if

phase: output header filter

This directive controls whether to skip all the replace_filter rules on a per-request basis.

Both constant values and strings containing NGINX variables are supported.

If the value evaluates to an empty string ("") or to "0" in the output header phase, no replace_filter rules are skipped for the current request; otherwise, all replace_filter rules are skipped for the current request.
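Read literally, the rule is a plain string check on the evaluated value, so values such as "00" or "false" would still cause the rules to be skipped. A sketch of the rule as stated, with the hypothetical name `rf_skip_requested`; it takes an explicit length because Nginx strings (ngx_str_t) carry a length rather than a NUL terminator:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Evaluated replace_filter_skip value -> should all replace_filter rules
 * be skipped for this request? Only "" and "0" mean "do not skip". */
static bool rf_skip_requested(const char *value, size_t len)
{
    if (len == 0)
        return false;
    if (len == 1 && value[0] == '0')
        return false;
    return true;
}
```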

Below is a trivial example for this:

    set $skip '';
    location /t {
        content_by_lua '
            ngx.var.skip = 1
            ngx.say("abcabd")
        ';
        replace_filter_skip $skip;
        replace_filter abcabd X;
    }

Installation

You need to install the sregex library first:

https://github.com/agentzh/sregex

And then rebuild your Nginx like this:

    ./configure --add-module=/path/to/replace-filter-nginx-module

If sregex is not installed under the default prefix (i.e., /usr/local), specify the locations of your sregex installation via the SREGEX_INC and SREGEX_LIB environment variables before running the ./configure script, as in

    export SREGEX_INC=/opt/sregex/include
    export SREGEX_LIB=/opt/sregex/lib

assuming that your sregex is installed to the prefix /opt/sregex.

Starting from NGINX 1.9.11, you can also compile this module as a dynamic module by using the --add-dynamic-module=PATH option instead of --add-module=PATH on the ./configure command line above, and then explicitly load the module in your nginx.conf via the load_module directive, for example:

    load_module /path/to/modules/ngx_http_replace_filter_module.so;

Troubleshooting

  • If you see the error "error while loading shared libraries: libsregex.so.0: cannot open shared object file: No such file or directory" when starting nginx, the installation path of your libsregex library is not in your system's default library search path. You can solve this by passing the option --with-ld-opt='-Wl,-rpath,/usr/local/lib' to nginx's ./configure command (adjusting the path if sregex is installed elsewhere). Alternatively, you can add the directory containing libsregex.so.0 to the LD_LIBRARY_PATH environment variable before starting your nginx server.

TODO

  • optimize the special case for verbatim substitutions, i.e., replace_filter <regex> $&;.
  • implement the replace_filter_skip $var directive to control whether to enable the filter on the fly.
  • reduce the amount of data that has to be buffered when a partial match is already found.
  • recycle the memory blocks used to buffer the pending capture data and "complex values" for replacement.
  • allow use of inlined Lua code as the replacement argument of the replace_filter directive to generate the text to be replaced on-the-fly.

Community

English Mailing List

The openresty-en mailing list is for English speakers.

Chinese Mailing List

The openresty mailing list is for Chinese speakers.

Bugs and Patches

Please submit bug reports, wishlists, or patches by

  1. creating a ticket on the GitHub Issue Tracker,
  2. or posting to the OpenResty community.

Author

Yichun "agentzh" Zhang (章亦春), OpenResty Inc.

Copyright and License

This module is licensed under the BSD license.

Copyright (C) 2012-2017, by Yichun "agentzh" Zhang (章亦春), OpenResty Inc.

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

See Also

More Repositories

1

openresty

High Performance Web Platform Based on Nginx and LuaJIT
C
12,021
star
2

lua-nginx-module

Embed the Power of Lua into NGINX HTTP servers
C
11,049
star
3

nginx-tutorials

Nginx Tutorials
Perl
2,851
star
4

lua-resty-redis

Lua redis client driver for the ngx_lua based on the cosocket API
Lua
1,863
star
5

openresty-systemtap-toolkit

Real-time analysis and diagnostics tools for OpenResty (including NGINX, LuaJIT, ngx_lua, and more) based on SystemTap
Perl
1,640
star
6

headers-more-nginx-module

Set, add, and clear arbitrary output headers in NGINX http servers
C
1,592
star
7

openresty.org

Code and data for the openresty.org site
HTML
1,254
star
8

luajit2

OpenResty's Branch of LuaJIT 2
C
1,152
star
9

echo-nginx-module

An Nginx module for bringing the power of "echo", "sleep", "time" and more to Nginx's config file
C
1,139
star
10

docker-openresty

Docker tooling for OpenResty
Dockerfile
915
star
11

redis2-nginx-module

Nginx upstream module for the Redis 2.0 protocol
C
892
star
12

lua-resty-limit-traffic

Lua library for limiting and controlling traffic in OpenResty/ngx_lua
Lua
794
star
13

lua-resty-core

New FFI-based API for lua-nginx-module
Lua
775
star
14

stream-lua-nginx-module

Embed the power of Lua into NGINX TCP/UDP servers
C
709
star
15

lua-resty-mysql

Nonblocking Lua MySQL driver library for ngx_lua or OpenResty
Lua
693
star
16

stapxx

Simple macro language extentions to systemtap
Perl
682
star
17

sregex

A non-backtracking NFA/DFA-based Perl-compatible regex engine matching on large data streams
C
614
star
18

lua-resty-upstream-healthcheck

Health Checker for Nginx Upstream Servers in Pure Lua
Lua
506
star
19

lua-upstream-nginx-module

Nginx C module to expose Lua API to ngx_lua for Nginx upstreams
C
497
star
20

lua-resty-websocket

WebSocket support for the ngx_lua module (and OpenResty)
Lua
492
star
21

srcache-nginx-module

Transparent subrequest-based caching layout for arbitrary nginx locations.
C
469
star
22

opm

OpenResty Package Manager
Lua
454
star
23

lua-resty-lrucache

Lua-land LRU Cache based on LuaJIT FFI
Lua
432
star
24

test-nginx

Data-driven test scaffold for Nginx C module and OpenResty Lua library development
Perl
430
star
25

lua-resty-string

String utilities and common hash functions for ngx_lua and LuaJIT
Lua
423
star
26

lua-resty-upload

Streaming reader and parser for http file uploading based on ngx_lua cosocket
Lua
392
star
27

set-misc-nginx-module

Various set_xxx directives added to nginx's rewrite module (md5/sha1, sql/json quoting, and many more)
C
384
star
28

drizzle-nginx-module

an nginx upstream module that talks to mysql and drizzle by libdrizzle
C
335
star
29

openresty-gdb-utils

GDB Utilities for OpenResty (including Nginx, ngx_lua, LuaJIT, and more)
Python
328
star
30

lua-resty-dns

DNS resolver for the nginx lua module
Lua
319
star
31

lua-resty-balancer

A generic consistent hash implementation for OpenResty/Lua
Lua
319
star
32

programming-openresty

Programming OpenResty Book
Perl
318
star
33

lua-resty-lock

Simple nonblocking lock API for ngx_lua based on shared memory dictionaries
Lua
302
star
34

openresty-devel-utils

Utilities for nginx module development
Perl
263
star
35

resty-cli

Fancy command-line utilities for OpenResty
Perl
262
star
36

lua-resty-memcached

Lua memcached client driver for the ngx_lua based on the cosocket API
Lua
209
star
37

memc-nginx-module

An extended version of the standard memcached module that supports set, add, delete, and many more memcached commands.
C
208
star
38

encrypted-session-nginx-module

encrypt and decrypt nginx variable values
C
195
star
39

openresty-packaging

Official OpenResty packaging source and scripts for various Linux distributions and other systems
Makefile
172
star
40

rds-json-nginx-module

An nginx output filter that formats Resty DBD Streams generated by ngx_drizzle and others to JSON
C
154
star
41

xss-nginx-module

Native support for cross-site scripting (XSS) in an nginx
C
147
star
42

mockeagain

Mocking ideally slow network that only allows reading and/or writing one byte at a time
C
128
star
43

lua-resty-shell

Lua module for nonblocking system shell command executions
Perl
120
star
44

lua-tablepool

Lua table recycling pools for LuaJIT
Perl
110
star
45

lua-redis-parser

Lua module for parsing raw redis responses
C
92
star
46

openresty-survey

OpenResty Web App for OpenResty User Survey
HTML
90
star
47

lua-ssl-nginx-module

NGINX C module that extends ngx_http_lua_module for enhanced SSL/TLS capabilities
Lua
86
star
48

opsboy

A rule-based sysadmin tool that helps setting up complex environment for blank machines
Perl
83
star
49

no-pool-nginx

replace nginx's pool mechanism with plain malloc & free to help tools like valgrind
Shell
77
star
50

stream-echo-nginx-module

TCP/stream echo module for NGINX (a port of ngx_http_echo_module)
C
70
star
51

meta-lua-nginx-module

Meta Lua Nginx Module supporting both Http Lua Module and Stream Lua Module
C
65
star
52

array-var-nginx-module

Add support for array-typed variables to nginx config files
C
64
star
53

lemplate

OpenResty/Lua template framework implementing Perl's TT2 templating language
Perl
53
star
54

openresty-con

JavaScript
46
star
55

nginx-dtrace

An nginx fork that adds dtrace USDT probes
C
44
star
56

lua-resty-memcached-shdict

Powerful memcached client with a shdict caching layer and many other features
Lua
34
star
57

lua-resty-shdict-simple

Simple applicaton-oriented interface to the OpenResty shared dictionary API
Perl
32
star
58

lua-resty-signal

Lua library for killing or sending signals to UNIX processes
Perl
31
star
59

luajit2-test-suite

OpenResty's LuaJIT test suite based on Mike Pall's LuaJIT tests
Lua
29
star
60

ngx_postgres

OpenResty's fork of FRiCKLE/ngx_postgres
C
26
star
61

rds-csv-nginx-module

Nginx output filter module to convert Resty-DBD-Streams (RDS) to Comma-Separated Values (CSV)
C
22
star
62

showman-samples

Sample screenplay files for generating our public video tutorials using OpenResty Showman
20
star
63

lua-rds-parser

Resty DBD Stream (RDS) parser for Lua written in C
C
19
star
64

redis-nginx-module

8
star
65

AB-test-http

test http requests between two systems.
Perl
5
star
66

transparency

2
star