Pyleus
Pyleus is a Python 2.6+ framework for developing and launching Apache Storm topologies.
Please visit our documentation.
About
Pyleus is a framework for building Apache Storm topologies in idiomatic Python.
With Pyleus you can:
- define a topology with a simple YAML file
- manage dependencies with a requirements.txt file
- run faster thanks to Pyleus's MessagePack-based serializer
- pass options to your components directly from the YAML file
- use the Kafka spout built into Storm with only a YAML change
Install
From PyPI:
$ pip install pyleus
Note:
You do NOT need to install pyleus on your Storm cluster. That's cool, isn't it?
However, if you set system_site_packages: true in your config file, the environment on your Storm nodes must match the one on the machine used to build the topology. In that case, you do have to install pyleus on your Storm cluster.
Try it out!
$ git clone https://github.com/Yelp/pyleus.git
$ pyleus build pyleus/examples/exclamation_topology/pyleus_topology.yaml
$ pyleus local exclamation_topology.jar
Or, submit to a Storm cluster with:
$ pyleus submit -n NIMBUS_HOST exclamation_topology.jar
The examples directory contains several annotated Pyleus topologies that try to cover as many Pyleus features as possible.
Pyleus command line interface
Build a topology:
$ pyleus build /path/to/pyleus_topology.yaml
Run a topology locally:
$ pyleus local /path/to/topology.jar
Submit a topology to a Storm cluster:
$ pyleus submit [-n NIMBUS_HOST] /path/to/topology.jar
List all topologies running on a Storm cluster:
$ pyleus list [-n NIMBUS_HOST]
Kill a topology running on a Storm cluster:
$ pyleus kill [-n NIMBUS_HOST] TOPOLOGY_NAME
Try pyleus -h for a list of all the available commands, or pyleus CMD -h for command-specific help.
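If you drive these commands from a Python build or deploy script, one option is to assemble the argument lists in a single place. The sketch below is a hypothetical helper (pyleus_argv and run_pyleus are not part of Pyleus) that uses only the subcommands and the -n flag shown above, and calls the CLI through the standard subprocess module.

```python
import subprocess


def pyleus_argv(command, target=None, nimbus_host=None):
    """Build the argument list for one pyleus CLI call (hypothetical helper).

    Only the subcommands and the -n flag documented in this README are used.
    """
    argv = ["pyleus", command]
    if nimbus_host is not None:
        # -n NIMBUS_HOST applies to submit, list and kill.
        argv += ["-n", nimbus_host]
    if target is not None:
        # A YAML path for build, a jar for local/submit, a name for kill.
        argv.append(target)
    return argv


def run_pyleus(command, target=None, nimbus_host=None):
    """Run pyleus; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.check_call(pyleus_argv(command, target, nimbus_host))


if __name__ == "__main__":
    # Prints the command instead of running it, so no Storm cluster is needed.
    print(" ".join(pyleus_argv("submit", "exclamation_topology.jar", "NIMBUS_HOST")))
```

With pyleus installed, calling run_pyleus("build", ...) and then run_pyleus("submit", ...) scripts a build-and-deploy step; check_call stops the script as soon as either command fails.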
Write your first topology
Please refer to the documentation for a more detailed tutorial.
Organize your files
This is an example of the directory tree of a simple topology:
my_first_topology/
|-- my_first_topology/
| |-- __init__.py
| |-- dummy_bolt.py
| |-- dummy_spout.py
|-- pyleus_topology.yaml
|-- requirements.txt
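To start a new project with the same shape as the tree above, a small script can create the skeleton. This is a hypothetical helper written for Python 3 (scaffold_topology is not a Pyleus command); it only creates empty files, which you then fill in as described in the following sections.

```python
import tempfile
from pathlib import Path


def scaffold_topology(root, name):
    """Create the empty skeleton of a topology project under root/name.

    Hypothetical helper, not part of Pyleus: it only creates empty files
    matching the example directory tree.
    """
    project = Path(root) / name
    package = project / name  # the importable Python package
    package.mkdir(parents=True, exist_ok=True)
    for filename in ("__init__.py", "dummy_bolt.py", "dummy_spout.py"):
        (package / filename).touch()
    for filename in ("pyleus_topology.yaml", "requirements.txt"):
        (project / filename).touch()
    return project


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        project = scaffold_topology(tmp, "my_first_topology")
        for path in sorted(project.rglob("*")):
            print(path.relative_to(project))
```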
Define the topology layout
A simple pyleus_topology.yaml should look like the following:
name: my_first_topology

topology:

    - spout:
        name: my-first-spout
        module: my_first_topology.dummy_spout

    - bolt:
        name: my-first-bolt
        module: my_first_topology.dummy_bolt
        groupings:
            - shuffle_grouping: my-first-spout
This defines a topology where a single bolt subscribes to the output stream of a single spout. It's as simple as that.
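Once parsed, the YAML layout is just nested lists and dicts: each entry under topology is a one-key mapping (spout or bolt), and each grouping names the component it subscribes to. The sketch below writes that structure out by hand and runs a hypothetical sanity check (undeclared_sources is not part of Pyleus) that flags groupings pointing at a component name that was never declared. It assumes the short form shuffle_grouping: component-name used above and skips any other form.

```python
# The topology from pyleus_topology.yaml, written out as the lists and dicts
# a YAML parser would produce for it.
TOPOLOGY = {
    "name": "my_first_topology",
    "topology": [
        {"spout": {"name": "my-first-spout",
                   "module": "my_first_topology.dummy_spout"}},
        {"bolt": {"name": "my-first-bolt",
                  "module": "my_first_topology.dummy_bolt",
                  "groupings": [{"shuffle_grouping": "my-first-spout"}]}},
    ],
}


def undeclared_sources(spec):
    """Return grouping sources that do not match any declared component name."""
    # Each topology entry is {"spout": {...}} or {"bolt": {...}}.
    components = [next(iter(entry.values())) for entry in spec["topology"]]
    declared = {component["name"] for component in components}
    missing = []
    for component in components:
        for grouping in component.get("groupings", []):
            # Each grouping is a one-key mapping: {grouping_type: source}.
            for source in grouping.values():
                # Only the short string form is checked in this sketch.
                if isinstance(source, str) and source not in declared:
                    missing.append(source)
    return missing


print(undeclared_sources(TOPOLOGY))  # []
```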
Write your first spout
This is the code implementing dummy_spout.py:
from pyleus.storm import Spout

class DummySpout(Spout):

    # Names of the fields in each emitted tuple.
    OUTPUT_FIELDS = ['sentence', 'name']

    # Called repeatedly while the topology runs; emits one tuple per call.
    def next_tuple(self):
        self.emit(("This is a sentence.", "spout",))

if __name__ == '__main__':
    DummySpout().run()
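To check what next_tuple emits without a Storm cluster, one approach is to run the same method body against a stand-in base class that simply records emitted tuples. StubSpout below is hypothetical and not part of pyleus; the real Spout sends tuples to Storm instead of storing them, and the width check is this stub's own addition.

```python
class StubSpout(object):
    """Hypothetical stand-in for pyleus.storm.Spout that records emits."""

    OUTPUT_FIELDS = []

    def __init__(self):
        self.emitted = []

    def emit(self, values, **kwargs):
        # This stub checks that each tuple matches the declared fields.
        if len(values) != len(self.OUTPUT_FIELDS):
            raise ValueError("expected %d values, got %d"
                             % (len(self.OUTPUT_FIELDS), len(values)))
        self.emitted.append(tuple(values))


class DummySpout(StubSpout):
    OUTPUT_FIELDS = ['sentence', 'name']

    def next_tuple(self):
        self.emit(("This is a sentence.", "spout",))


# Simulate Storm asking the spout for three tuples.
spout = DummySpout()
for _ in range(3):
    spout.next_tuple()
print(spout.emitted[0])  # ('This is a sentence.', 'spout')
print(len(spout.emitted))  # 3
```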
Write your first bolt
Let's now look at dummy_bolt.py:
from pyleus.storm import SimpleBolt

class DummyBolt(SimpleBolt):

    OUTPUT_FIELDS = ['sentence']

    # Called once for every incoming tuple.
    def process_tuple(self, tup):
        sentence, name = tup.values
        new_sentence = "{0} says, \"{1}\"".format(name, sentence)
        # Anchoring to tup links the new tuple to its input for Storm's ack tracking.
        self.emit((new_sentence,), anchors=[tup])

if __name__ == '__main__':
    DummyBolt().run()
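The bolt logic can be exercised the same way: feed process_tuple an object that exposes a values attribute, like the tup argument above, and record what it emits together with its anchors. FakeTuple and StubSimpleBolt are hypothetical stand-ins, not pyleus classes.

```python
from collections import namedtuple

# Hypothetical stand-in for an incoming tuple: only .values is modelled.
FakeTuple = namedtuple("FakeTuple", ["values"])


class StubSimpleBolt(object):
    """Hypothetical stand-in for pyleus.storm.SimpleBolt that records emits."""

    OUTPUT_FIELDS = []

    def __init__(self):
        self.emitted = []

    def emit(self, values, anchors=None, **kwargs):
        # Keep each emitted tuple alongside the tuples it was anchored to.
        self.emitted.append((tuple(values), list(anchors or [])))


class DummyBolt(StubSimpleBolt):
    OUTPUT_FIELDS = ['sentence']

    def process_tuple(self, tup):
        sentence, name = tup.values
        new_sentence = "{0} says, \"{1}\"".format(name, sentence)
        self.emit((new_sentence,), anchors=[tup])


bolt = DummyBolt()
incoming = FakeTuple(values=["This is a sentence.", "spout"])
bolt.process_tuple(incoming)
print(bolt.emitted[0][0][0])  # spout says, "This is a sentence."
```

Checking that the anchor list contains the incoming tuple is a cheap way to catch a bolt that forgets to anchor its output.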
Run your topology
Run the topology on your local machine for debugging:
$ pyleus build my_first_topology/pyleus_topology.yaml
$ pyleus local --debug my_first_topology.jar
When you are done, hit Ctrl-C.
Configuration File
You can set default values for many configuration options by placing a .pyleus.conf file in your home directory:
[storm]
nimbus_host: 10.11.12.13
jvm_opts: -Djava.io.tmpdir=/home/myuser/tmp
[build]
pypi_index_url: http://pypi.ninjacorp.com/simple/
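The file uses INI-style sections with key: value lines. To inspect how those sections and keys are laid out, Python 3's standard configparser module can read the sample above, as sketched here. This only illustrates the file format; it is not Pyleus's own loading code.

```python
import configparser

# The sample ~/.pyleus.conf, inlined so the sketch is self-contained.
SAMPLE_CONF = """\
[storm]
nimbus_host: 10.11.12.13
jvm_opts: -Djava.io.tmpdir=/home/myuser/tmp

[build]
pypi_index_url: http://pypi.ninjacorp.com/simple/
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)
# To read the real file instead, pass its expanded path to parser.read(...).

for section in parser.sections():
    for key, value in parser.items(section):
        # The first ':' separates key from value, so '=' inside jvm_opts is kept.
        print("%s.%s = %s" % (section, key, value))
```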
Reference
License
Pyleus is licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0