fast-python-pb: Fast Python Protocol Buffers
Thin wrapper on top of the C++ protocol buffer implementation resulting in significantly faster protocol buffers in Python.
Why:
We wanted a fast implementation of protocol buffers that still felt like Python, hence this implementation.
For our use case, this module is up to 15 times faster than the standard one and 10 times as fast as Python's json serializer.
Status:
This is a very early stage project. It works for our needs. We haven't verified it works beyond that. Issue reports and patches are very much appreciated!
For example, it only supports strint, int32, int64, double, and sub message members at this time.
Pre-requisites:
Install protocol buffers
Installation:
git clone https://github.com/Cue/fast-python-pb.git
cd fast-python-pb
python setup.py install
Usage:
protoc --fastpython_out /output/path --cpp_out /output/path --proto_path your/path your/path/file.proto
Example:
You can see the example in action in the benchmark directory.
// person.proto
package person_proto;
message Fact {
required string name = 1;
required string content = 2;
}
message Person {
required string name = 1;
required int32 birth_year = 2;
repeated string nicknames = 3;
repeated Fact facts = 4;
}
# example.py
import person_proto
lincoln = person_proto.Person(name = 'Abraham Lincoln', birth_year = 1809)
lincoln.nicknames = ['Honest Abe', 'Abe']
lincoln.facts = [
person_proto.Fact(name = 'Born In', content = 'Kentucky'),
person_proto.Fact(name = 'Died In', content = 'Washington D.C.'),
person_proto.Fact(name = 'Greatest Speech', content = GETTYSBURG)
]
serializedLincoln = lincoln.SerializeToString()
newLincoln = person_proto.Person()
newLincoln.ParseFromString(serializedLincoln)
The package
definition is mandatory; it determines the Python module name that the code
will generate. If it has dots for namespacing, like com.cueup.foo
, the last part of the
name (foo
) will be used for the Python module name.
One more thing
It's simple, but not that simple. The biggest caveat is that protobuf objects embedded in other protobuf objects are mutable, but all changes to them are discarded. If you want to build a protobuf with other protobufs in it, build them separately. To illustrate:
import addressbook_proto
entry = addressbook_proto.Entry(name='Gillian Baskin')
entry.birthplace = addressbook_proto.Location(state='Minnesota', town='Duluth')
# Now, to modify it. Don't do this:
entry.birthplace.town = 'New Town'
# Instead, do this:
birthplace = entry.birthplace
birthplace.town = 'New Town'
entry.birthplace = birthplace
There are also several methods for serializing and deserializing. Here's a list:
ParseFromString(str)
parses from a serialized protobuf stream.
ParseFromLongString(str)
has the same effect as ParseFromString(str)
, but is faster
for long strings and slower for short ones. This isn't a huge difference, but could be
important if you're dealing with very large protobufs.
SerializeToString()
returns the serialized form of the protobuf, as a string.
SerializeMany(protobufs)
takes a sequence of protobuf objects and serializes them to a
single string. The length of each protobuf is marked, so this can be serialized back to a
list of protobufs.
ParseMany(str, callback)
takes a string in the format produced by SerializeMany
, and
calls callback
with each protobuf object, in order. You can use this to build a list of
protobufs like this:
people = []
addressbook_proto.Person.ParseMany(serializedPeople, people.append)
print people # Will be a list of Person protobuf objects