What is this?
Stashpy aims to be a slimmed-down Python 3 replacement for Logstash, a log aggregator. Stashpy accepts connections on a TCP port, parses messages passed to it over this connection, and indexes them on an ElasticSearch cluster.
Stashpy is still in development.
Installing and running
Stashpy requires Python 3. All Linux distros have a relatively new version in their official repositories. On Mac OS, the Homebrew version is recommended.
Among the various methods of installing package dependencies,
virtualenv is recommended. Python 3.5 comes with pyvenv, which is a
built-in equivalent of virtualenv. For earlier versions, you will need
to manually install virtualenv with sudo pip install virtualenv
. If
you already have a 2.* version of Python installed together with
virtualenv, you can install Python 3 and call virtualenv with the
--python
argument to use the new version, as in virtualenv stashpy --python `which python3`
.
It is possible to install Stashpy simply as a dependency or with pip install stashpy
as it is on
PyPI. For development purposes,
you can simply create a virtualenv wherever you prefer, and run
Stashpy from there. The entry point is named stashpy
, and accepts a
configuration file as the only argument. A sample configuration file
sample-config.yml
is provided. See the Usage section for
documentation on different configuration options.
The recommended way to run Stashpy as a service is by checking it out
into /opt
, and also creating a virtualenv there. To run as an
upstart-managed service, consult the provided
stashpy.conf
. Unfortunately, we haven't moved on to systemd yet, so
no sample configuration for that, but it shouldn't be too difficult.
Usage
As can be seen from the extension of sample-config.yml
, the
configuration format is YAML. Configuration options are the following:
-
address
,port
: The address and port on which Stashpy should listen. -
indexer_config
: Configuration options for the ElasticSearch cluster to index on. If this key is excluded, nothing will be indexed, which is a useful setup for debugging purposes. Must have the following keys:-
host
: ES host -
port
: ES port -
index_pattern
: The base pattern to be used for determining index name. This pattern will be passed on todatetime.strftime
, and will then be formatted with the parsed values dictionary.
-
-
logging
: This option will be passed on as-is to thelogging.config.dictConfig
method. If it is not supplied,stashpy.main.DEFAULT_LOGGING
, which simply logs to stdout, will be used. -
heartbeat_count
: The number of messages at which a heartbeat log is written. -
processor_spec
: The parsing specification. See the next section for details.
Parsing Specification
Stashpy turns log lines (i.e. strings that end with a newline) supplied through a TCP connection into JSON documents, and indexes these in ElasticSearch. The log lines are parsed using regular expressions, which can be specified in one of two different formats: the parse format, or Oniguruma-flavored named regular expressions. The first of these is a normal Python 3 formattable string that is processed using the parse library. An example for such an expression is the following:
My name is {name} and I'm {age:d} years old
Parsing the sentence My name is Afro and I'm 40 years old
would lead
to the JSON document {"age": 40, "name": "Afro"}
.
The second format, Oniguruma-flavored regular expressions, uses the regex library to provide an experience similar to that of the grok plugin for Logstash. In an Oniguruma RE, you can build complex regular expressions by combining simpler components. For example, take the following regular expression:
My name is %{USERNAME:name} and I'm %{INT:age} years old
This expression, when used as a specification, leads to the following regular expression:
My name is (?P<name>[a-zA-Z0-9._-]+) and I'm (?P<age>(?:[+-]?(?:[0-9]+))) years old
See the file stashpy/patterns/grok_patterns.txt
for a list of the
various components you can use in your regular expressions.
For both of these parsing options, it is possible to specify a type to
which the parsed string should be converted. parse
does this
automatically, converting e.g. a age
above to integer. With
Oniguruma, you can specify the type the parsed value should be
converted to with a constructor appended to the pattern name,
separated by a colon. For the above example, converting age
to an
integer would work with the following expression:
My name is %{USERNAME:name} and I'm %{INT:age:int} years old
The type conversion key is used for looking up a function from the
__builtins__
dictionary; any constructor available there can be
used.
Specifying the parsing pipeline
There are two alternative ways of providing the order of processing.
processor_spec
By The first method is by providing the processor_spec
option in the
configuration file. This option can have two keys:
-
to_dict
: A list of expressions that are turned into JSON documents without further processing. These expressions can be a parse string or an Oniguruma RE. -
to_format
: A list of dictionaries whose keys are specifications and values are dictionariers that are to be formatted based on parsed values.
Here's the relevant part from sample-config.yml
:
processor_spec:
to_dict:
- "My name is {name} and I'm {age:d} years old."
to_format:
"Her name is {name} and she's {age:d} years old.":
name_line: "Name is {name}"
age_line: "Age is {age:d}"
Passing the log line My name is Afro and I'm 40 years old.
to
Stashpy with this configuration will result in the JSON document
{"age": 40, "name": "Afro"}
. The log line Her name is Luna and she's 4 years old.
will, however, activate the to_format
section
and lead to the JSON document {"age_line": "Age is 4", "name_line": "Name is Luna"}
.
By custom class
The second method of processing log lines is by specifying a class
that is responsible for accepting them and returning dictionaries. The
path of this class can be passed using the processor_class
option. This class must stashpy.LineProcessor
and implement the
method for_line(self, line)
, which will be called for each log
line. Two useful methods from the parent class that can be used for
more specialized processing are do_dict_specs(self, line)
, and
do_format_specs(self, line)
. The first method returns the result for
the first match from the self.dict_specs
list, while the second does
the same for the self.format_specs
attribute. Both return None
if
there are no matches. If your class has the class attributes TO_DICT
or TO_FORMAT
, these will be used to populate the instance
attributes. The following custom class is equivalent to the
processor_spec
example above:
class KitaHandlerTwo(stashpy.LineProcessor):
TO_DICT = ["My name is {name} and I'm {age:d} years old."]
TO_FORMAT = {"Her name is {name} and she's {age:d} years old.":
{'name_line':"Name is {name}", 'age_line':"Age is {age}"}}
def for_line(self, line):
dict_result = self.do_dict_specs(line)
if dict_result:
return dict_result
format_result = self.do_format_specs(line)
if format_result:
return format_result
return None
Processing pipeline
The order of processing in Stashpy is relatively straightforward.
First, the to_dict
specs are applied; if any of the patterns match,
the resulting dictionary is returned. If there are no such matches,
the to_format
specs are applied, and the result from the first match
is returned. If you are using a custom class for processing, you can
introduce your own ordering and logic.
Testing
One thing that really annoyed me about Logstash was that testing
patterns was incredibly difficult; the only reliable test could be
done on the live system. Stashpy aims to make testing patterns
simpler. In order to test a parsing specification, simply subclass
stashpy.tests.PatternTest
. This class has three methods that can be
used to process a logline with a parsing specification:
-
process_to_dict(self, to_dict_spec, logline)
: Processes a list of to-dictionary parsing specifications, and returns result. -
process_to_format(self, to_format_spec, logline)
: Processes a list of to-format parsing specifications, and returns result.
Here is a sample test:
import unittest
from stashpy.tests import PatternTest
class SampleTest(PatternTest):
def test_pattern(self):
self.assertDictEqual(
self.process_to_dict(["My name is {name} and I'm {age:d} years old."],
"My name is Yuri and I'm 3 years old."),
{'name': 'Yuri', 'age': 3})
if __name__ == '__main__':
unittest.main()