Parses log lines from an Apache log file in (almost) any format possible.

Installation

pip install apache-log-parser

Usage

import apache_log_parser
line_parser = apache_log_parser.make_parser("%v %h %l %u %t \"%r\" %>s %b")

This creates and returns a function, line_parser, which accepts a line from an Apache log file in that format and returns the parsed values as a dictionary.
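The general idea behind a format-driven line parser can be illustrated with a minimal sketch: each directive in the format string becomes a named regex group, literal text between directives is escaped, and the compiled pattern is wrapped in a closure. This is not apache_log_parser's actual implementation. make_simple_parser, its directive table and the regex fragments are hypothetical, and only a handful of directives are covered (%v, for instance, is not).

```python
import re
from typing import Callable, Dict

# Hypothetical directive table: directive -> (dictionary key, regex fragment).
# Key names mirror the example output; the regex fragments are assumptions.
_DIRECTIVES = {
    "%h": ("remote_host", r"\S+"),
    "%l": ("remote_logname", r"\S+"),
    "%u": ("remote_user", r"\S+"),
    "%t": ("time_received", r"\[[^\]]+\]"),
    "%r": ("request_first_line", r'[^"]*'),
    "%>s": ("status", r"\d{3}"),
    "%b": ("response_bytes_clf", r"\d+|-"),
}

# Matches "%>s" before the single-letter directives.
_TOKEN = re.compile(r"%>s|%[a-zA-Z]")


def make_simple_parser(log_format: str) -> Callable[[str], Dict[str, str]]:
    """Compile log_format into a regex and return a one-line parser.

    Raises KeyError for directives missing from the table above.
    """
    pattern = ""
    pos = 0
    for match in _TOKEN.finditer(log_format):
        # Literal text between directives must appear verbatim in the line.
        pattern += re.escape(log_format[pos:match.start()])
        key, fragment = _DIRECTIVES[match.group()]
        pattern += f"(?P<{key}>{fragment})"
        pos = match.end()
    pattern += re.escape(log_format[pos:])
    regex = re.compile(pattern + r"\Z")

    def parse(line: str) -> Dict[str, str]:
        m = regex.match(line.rstrip("\n"))
        if m is None:
            raise ValueError(f"line does not match format: {line!r}")
        return m.groupdict()

    return parse
```

For example, make_simple_parser('%h %l %u %t "%r" %>s %b') returns a function that maps a Common Log Format line to a dictionary of raw strings. Unlike the library, this sketch does no post-processing: a '-' remote user stays '-', and no derived keys are added.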

Example

>>> import apache_log_parser
>>> line_parser = apache_log_parser.make_parser("%h <<%P>> %t %Dus \"%r\" %>s %b  \"%{Referer}i\" \"%{User-Agent}i\" %l %u")
>>> log_line_data = line_parser('127.0.0.1 <<6113>> [16/Aug/2013:15:45:34 +0000] 1966093us "GET / HTTP/1.1" 200 3478  "https://example.com/" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18)" - -')
>>> from pprint import pprint
>>> pprint(log_line_data)
{'pid': '6113',
 'remote_host': '127.0.0.1',
 'remote_logname': '-',
 'remote_user': '',
 'request_first_line': 'GET / HTTP/1.1',
 'request_header_referer': 'https://example.com/',
 'request_header_user_agent': 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18)',
 'request_header_user_agent__browser__family': 'Other',
 'request_header_user_agent__browser__version_string': '',
 'request_header_user_agent__is_mobile': False,
 'request_header_user_agent__os__family': 'Linux',
 'request_header_user_agent__os__version_string': '',
 'request_http_ver': '1.1',
 'request_method': 'GET',
 'request_url': '/',
 'response_bytes_clf': '3478',
 'status': '200',
 'time_received': '[16/Aug/2013:15:45:34 +0000]',
 'time_received_datetimeobj': datetime.datetime(2013, 8, 16, 15, 45, 34),
 'time_received_isoformat': '2013-08-16T15:45:34',
 'time_received_tz_datetimeobj': datetime.datetime(2013, 8, 16, 15, 45, 34, tzinfo='0000'),
 'time_received_tz_isoformat': '2013-08-16T15:45:34+00:00',
 'time_received_utc_datetimeobj': datetime.datetime(2013, 8, 16, 15, 45, 34, tzinfo='0000'),
 'time_received_utc_isoformat': '2013-08-16T15:45:34+00:00',
 'time_us': '1966093'}

There is at least one key/value pair in the returned dictionary for each Apache log placeholder. Some placeholders produce more than one (e.g. all the time_received* keys).
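The derived time_received*_isoformat keys can be approximated with the standard library alone. This is a sketch, not the library's code: derive_time_fields is a hypothetical helper, it assumes an English/C locale (strptime's %b month abbreviation is locale-dependent), and the example above only shows a +0000 offset, so the library's exact treatment of other offsets is an assumption here.

```python
from datetime import datetime, timezone


def derive_time_fields(time_received: str) -> dict:
    """Derive ISO 8601 strings from an Apache %t value such as
    '[16/Aug/2013:15:45:34 +0000]'. Key names mirror the example output."""
    # Assumes an English/C locale so that 'Aug' matches %b.
    aware = datetime.strptime(time_received, "[%d/%b/%Y:%H:%M:%S %z]")
    utc = aware.astimezone(timezone.utc)
    return {
        # Local wall-clock time with the offset dropped.
        "time_received_isoformat": aware.replace(tzinfo=None).isoformat(),
        # Local time with its original offset.
        "time_received_tz_isoformat": aware.isoformat(),
        # The same instant converted to UTC.
        "time_received_utc_isoformat": utc.isoformat(),
    }
```

For the timestamp in the example, this yields '2013-08-16T15:45:34' and '2013-08-16T15:45:34+00:00', matching the output above. With a +0100 offset, the UTC field shifts back one hour while the tz field keeps the original offset.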

The version numbers follow Semantic Versioning.

Supported values

    '%a'  #	Remote IP-address
    '%A'  #	Local IP-address
    '%B'  #	Size of response in bytes, excluding HTTP headers.
    '%b'  #	Size of response in bytes, excluding HTTP headers. In CLF format, i.e. a '-' rather than a 0 when no bytes are sent.
    '%D'  #	The time taken to serve the request, in microseconds.
    '%f'  #	Filename
    '%h'  #	Remote host
    '%H'  #	The request protocol
    '%k'  #	Number of keepalive requests handled on this connection. Interesting if KeepAlive is being used, so that, for example, a '1' means the first keepalive request after the initial one, '2' the second, etc...; otherwise this is always 0 (indicating the initial request). Available in versions 2.2.11 and later.
    '%l'  #	Remote logname (from identd, if supplied). This will return a dash unless mod_ident is present and IdentityCheck is set On.
    '%m'  #	The request method
    '%p'  #	The canonical port of the server serving the request
    '%P'  #	The process ID of the child that serviced the request.
    '%q'  #	The query string (prepended with a ? if a query string exists, otherwise an empty string)
    '%r'  #	First line of request
    '%R'  #	The handler generating the response (if any).
    '%s'  #	Status. For requests that were internally redirected, this is the status of the *original* request; use %>s for the final status.
    '%t'  #	Time the request was received (standard English format)
    '%T'  #	The time taken to serve the request, in seconds.
    '%u'  #	Remote user (from auth; may be bogus if return status (%s) is 401)
    '%U'  #	The URL path requested, not including any query string.
    '%v'  #	The canonical ServerName of the server serving the request.
    '%V'  #	The server name according to the UseCanonicalName setting.
    '%X'  #	Connection status when response is completed:
              # X =	connection aborted before the response completed.
              # + =	connection may be kept alive after the response is sent.
              # - =	connection will be closed after the response is sent.
              # (This directive was %c in late versions of Apache 1.3, but this conflicted with the historical ssl %{var}c syntax.)
    '%I'  #	Bytes received, including request and headers, cannot be zero. You need to enable mod_logio to use this.
    '%O'  #	Bytes sent, including headers, cannot be zero. You need to enable mod_logio to use this.
    
    '%\{User-Agent\}i'  # Special case of the entry below, matching only the User-Agent header
    '%\{[^\}]+?\}i'  #	The contents of Foobar: header line(s) in the request sent to the server. Changes made by other modules (e.g. mod_headers) affect this. If you're interested in what the request header was prior to when most modules would have modified it, use mod_setenvif to copy the header into an internal environment variable and log that value with the %\{VARNAME}e described below.
    
    '%\{[^\}]+?\}C'  #	The contents of cookie Foobar in the request sent to the server. Only version 0 cookies are fully supported.
    '%\{[^\}]+?\}e'  #	The contents of the environment variable FOOBAR
    '%\{[^\}]+?\}n'  #	The contents of note Foobar from another module.
    '%\{[^\}]+?\}o'  #	The contents of Foobar: header line(s) in the reply.
    '%\{[^\}]+?\}p'  #	The canonical port of the server serving the request or the server's actual port or the client's actual port. Valid formats are canonical, local, or remote.
    '%\{[^\}]+?\}P'  #	The process ID or thread id of the child that serviced the request. Valid formats are pid, tid, and hextid. hextid requires APR 1.2.0 or higher.
    '%\{[^\}]+?\}t'  #	The time, in the form given by format, which should be in strftime(3) format. (potentially localized)
    '%\{[^\}]+?\}x'  # Extension value, e.g. mod_ssl protocol and cipher
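Some of the values above need a little post-processing before use, since the parser returns them as strings. The helpers below are hypothetical. header_key infers the naming rule for %{...}i keys from only two examples in the output above (Referer and User-Agent), so it may not hold for every header. clf_bytes_to_int follows the %b description ('-' means no bytes were sent), and time_taken_seconds converts a %D value from microseconds.

```python
def header_key(header_name: str) -> str:
    """Hypothetical: guess the dictionary key for a %{Header}i value.

    Inferred from the example output, e.g. 'User-Agent' -> 'request_header_user_agent'.
    """
    return "request_header_" + header_name.lower().replace("-", "_")


def clf_bytes_to_int(value: str) -> int:
    """Convert a %b (CLF) byte count to an int; CLF logs '-' when no bytes were sent."""
    return 0 if value == "-" else int(value)


def time_taken_seconds(time_us: str) -> float:
    """Convert a %D value (microseconds, as a string) to seconds."""
    return int(time_us) / 1_000_000
```

Applied to the example output, clf_bytes_to_int('3478') gives 3478, and time_taken_seconds('1966093') gives roughly 1.97 seconds.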

Copyright

This package is © 2013-2015 Rory McCann, released under the terms of the GNU GPL v3 (or at your option a later version). If you'd like a different licence, please email [email protected]
