• Stars
    star
    142
  • Rank 250,717 (Top 6 %)
  • Language
    Java
  • License
    Other
  • Created about 10 years ago
  • Updated over 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

FastTuple

Introduction

There are lots of good things about working on the JVM, like a world class JIT, operating system threads, and a world class garbage collector. However, one limiting factor can often be the interaction between primitive types and referenced types in Java. Primitive types are the built in types that represent integral numbers, floating point numbers, and boolean yes/no values. Primitives are also quite fast and memory efficient: they get allocated either on the stack if they're being used in a method, or inlined in an object when they're declared as field members. And they wind up being fast because the JIT can often optimize their access down to a single CPU instruction. This works really well when you know what types a class will hold as its state beforehand. If, on the other hand, you don't know what an object or array will hold at compile time the JVM forces you to box primitives. Boxing means that the primitives get wrapped in a heap allocated object, and their container will hold a reference to them. That type of overhead winds up being inefficient both in access time and memory space. Access time suffers because this type of layout breaks locality of reference, which is the principle that memory frequently accessed together should be stored adjacently. All modern memory hierarchies optimize for locality of reference in their caching implementations. The extra allocations and garbage generated also put pressure on the JVM's garbage collector, which can often be a cause of long pause times.

We wrote FastTuple to try and help solve this problem. FastTuple generates heterogeneous collections of primitive values and ensures as best it can that they will be laid out adjacently in memory. The individual values in the tuple can either be accessed from a statically bound interface, via an indexed accessor, or via reflective or other dynamic invocation techniques. FastTuple is designed to deal with a large number of tuples therefore it will also attempt to pool tuples such that they do not add significantly to the GC load of a system. FastTuple is also capable of allocating the tuple value storage entirely off-heap, using Java's direct memory capabilities.

FastTuple pulls off its trick via runtime bytecode generation. The user supplies it with a schema of field names and types. That schema is then built into a Java class definition which will contain accessor methods and either field definitions or the memory address for an off heap allocation, depending on which storage method was requested. The resulting Java class gets compiled into bytecode and then loaded as a reflective Class object. This Class object can then be used to create instances of the new class.

Building

This package depends on Janino 2.7.4, which is not currently getting pushed to maven central. As soon as that happens (we have a ticket in with the maintainer) we will begin pushing FastTuple to maven central as well. In the meantime you can build FastTuple like so:

wget http://janino.net/download/janino-2.7.4.zip
unzip janino-2.7.4.zip
mvn install:install-file -Dfile=janino-2.7.4/janino.jar -Dsources=janino-2.7.4/janino-src.zip -DgroupId=org.codehaus.janino -DartifactId=janino -Dversion=2.7.4 -Dpackaging=jar
mvn install:install-file -Dfile=janino-2.7.4/commons-compiler.jar -Dsources=janino-2.7.4/commons-compiler-src.zip -DgroupId=org.codehaus.janino -DartifactId=commons-compiler -Dversion=2.7.4 -Dpackaging=jar
mvn install

Note on GPG

You'll also need to ensure that you have GPG available on the command line. Mac OS X and Windows users may need to install via (https://gpgtools.org/) or cygwin respectively.

Usage

Interaction with FastTuple primarily takes place via the TupleSchema class. Each instance of TupleSchema describes a separate type of FastTuple both from the perspective of the FastTuple library and the JVM. At this time, allowable field types are the primitive classes for: long, int, short, char, byte, float, double. Support for String is planned for a later release. Some examples:

Heap Allocated Tuples

	TupleSchema schema = TupleSchema.builder().
		addField("fieldA", Long.TYPE).
		addField("fieldB", Int.TYPE).
		addField("fieldC", Short.TYPE).
		heapMemory().
		build();

	//creates a new tuple allocated on the JVM heap
	FastTuple tuple = schema.createTuple();

Direct Allocated Tuples

	TupleSchema schema = TupleSchema.builder().
		addField("fieldA", Long.TYPE).
		addField("fieldB", Int.TYPE).
		addField("fieldC", Short.TYPE).
		directMemory().
		build();

	//creates a new tuple, allocating memory off heap
	FastTuple tuple = schema.createTuple();
	//do some stuff
	tuple.setLong(1, 10000L);
	tuple.setInt(2, 50);
	tuple.setShort(3, (short)256);
	//if you don't destroy the tuple you are leaking memory
	schema.destroy(tuple);

Aligning Direct Allocated Tuples

Direct allocated tuples can be aligned such that they do not share cache lines. This is useful for situations where adjacent tuples are manipulated concurrently by different threads: an adequately padded FastTuple can eliminate false sharing in the CPU cache architecture. Veriifying that property, as expected, will require extensive benchmarking inside of the target system.

	TupleSchema schema = TupleSchema.builder().
		addField("fieldA", Long.TYPE).
		addField("fieldB", Int.TYPE).
		addField("fieldC", Short.TYPE).
		directMemory().
		padToWordSize(64).
		build();

	//creates a new tuple, allocating memory off heap
	FastTuple tuple = schema.createTuple();
	//do some stuff
	tuple.setLong(1, 10000L);
	tuple.setInt(2, 50);
	tuple.setShort(3, (short)256);
	//if you don't destroy the tuple you are leaking memory
	schema.destroy(tuple);

Utilizing Tuple Pools

Each schema will allocate a tuple pool per accessing thread if a poolSize is set.

	TupleSchema schema = TupleSchema.builder().
		addField("fieldA", Long.TYPE).
		addField("fieldB", Int.TYPE).
		addField("fieldC", Short.TYPE).
		poolOfSize(1024).
		//allocates an extra poolOfSize records when the pool is empty
		.expandingPool().
		build();

	//checks a tuple from the pool
	FastTuple tuple = schema.pool().checkout();
	//do some stuff
	tuple.setLong(1, 10000L);
	tuple.setInt(2, 50);
	tuple.setShort(3, (short)256);
	//if you don't check the tuple back in you either leak memory or objects. bad dog.
	schema.pool().release(tuple);

Performance

One of the main goals of this library is performance. Toward that end it has a full suite of microbenchmarks to test the various supported means of accessing and manipulating tuples for both tuning the library and showing the tradeoffs in overhead for things like pooling and allocating tuples on demand. Here's what a full run looks like on a late 2013 macbook pro 2.6ghz with Java 8 1.8.0_05-b13.

Benchmark                                                                              Mode   Samples         Mean   Mean error    Units
c.b.t.AccessMethodBenchmark.testAllocateSetAndDeallocate                              thrpt        20     7069.954      181.689   ops/ms
c.b.t.AccessMethodBenchmark.testClass                                                 thrpt        20  1628847.332    21756.564   ops/ms
c.b.t.AccessMethodBenchmark.testFastPool                                              thrpt        20    30715.791      570.086   ops/ms
c.b.t.AccessMethodBenchmark.testFastTuplePreAllocIndexed                              thrpt        20   160173.234     2498.380   ops/ms
c.b.t.AccessMethodBenchmark.testFastTuplePreAllocIndexedBoxing                        thrpt        20    59117.984      630.813   ops/ms
c.b.t.AccessMethodBenchmark.testFastTupleStaticBinding                                thrpt        20   157928.877     2411.094   ops/ms
c.b.t.AccessMethodBenchmark.testInvokeDynamic                                         thrpt        20    26085.594     1158.292   ops/ms
c.b.t.AccessMethodBenchmark.testLongArray                                             thrpt        20  1721846.666    23422.990   ops/ms
c.b.t.AccessMethodBenchmark.testOffheapAllocateAndSet                                 thrpt        20     7512.737      167.141   ops/ms
c.b.t.AccessMethodBenchmark.testOffheapDirectSet                                      thrpt        20   898743.426    37659.166   ops/ms
c.b.t.AccessMethodBenchmark.testOffheapSchemaSet                                      thrpt        20   367841.110     4282.620   ops/ms
c.b.t.AccessMethodBenchmark.testPooledObject                                          thrpt        20    12572.464      155.458   ops/ms
c.b.t.AccessMethodBenchmark.testQueuedObject                                          thrpt        20    25092.751      242.771   ops/ms
c.b.t.AccessMethodBenchmark.testReflectField                                          thrpt        20    28850.478      275.462   ops/ms
c.b.t.AccessMethodBenchmark.testStormTuple                                            thrpt        20    79693.517      888.956   ops/ms
c.b.t.AccessMethodBenchmark.testTuplePool                                             thrpt        20    70790.084      950.775   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaAllocate                thrpt        20     7214.300       98.387   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaDeque                   thrpt        20   153937.210     2534.986   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPoolEval                thrpt        20    53563.512      836.586   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPoolIface               thrpt        20    57669.480      716.962   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPoolIndexed             thrpt        20    57633.447     1025.338   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPoolIndexedBoxed        thrpt        20    38984.939      823.182   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPreallocEval            thrpt        20   332142.591     6426.566   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPreallocIface           thrpt        20   370593.271     5369.162   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPreallocIndexed         thrpt        20   377811.881     7077.471   ops/ms
c.b.t.FastTupleBenchmarks.DirectBenchmarks.measureDirectSchemaPreallocIndexedBoxed    thrpt        20   115699.056     2402.657   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaAllocate                    thrpt        20   990788.476    12295.764   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaDeque                       thrpt        20   173527.417     3619.733   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPoolEval                    thrpt        20    59377.121     1071.014   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPoolIface                   thrpt        20    62391.209      765.821   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPoolIndexed                 thrpt        20    62120.412     1105.981   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPoolIndexedBoxed            thrpt        20    42187.554      737.124   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocEval                thrpt        20   397337.746     5274.807   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocEvalField           thrpt        20   579136.768     8818.827   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocIface               thrpt        20   554449.236     6740.073   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocIndexed             thrpt        20   536258.735    10417.522   ops/ms
c.b.t.FastTupleBenchmarks.HeapBenchmarks.measureHeapSchemaPreallocIndexedBoxed        thrpt        20   202220.576     3806.694   ops/ms

To run the benchmarks:

mvn package
java -jar fasttuple-bench/target/microbenchmarks.jar

More Repositories

1

folsom

Expose Erlang Events and Metrics
Erlang
587
star
2

flake

A decentralized, k-ordered id generation service in Erlang
Erlang
573
star
3

high-scale-lib

A fork of Cliff Click's High Scale Library. Improved with bug fixes and a real build system.
Java
408
star
4

ordasity

Ordasity is Boundary's library for building stateful clustered services on the JVM.
Scala
343
star
5

scalang

Scalang is a scala wrapper that makes it easy to write services that interface with erlang.
Scala
203
star
6

wireshark

wireshark + boundary IPFIX decode patches
C
165
star
7

firespray

Blazingly fast streaming charts
JavaScript
106
star
8

html5-node-diagram

JavaScript
86
star
9

bear

a set of statistics functions for erlang
Erlang
68
star
10

overlock

Boundary's suite of concurrent scala utilities.
Scala
67
star
11

gen_lb

A generic library to load balance communication between Erlang nodes
Erlang
64
star
12

zoocreeper

A ZooKeeper backup tool.
Java
49
star
13

folsom_cowboy

A Cowboy based Folsom HTTP Wrapper.
Erlang
39
star
14

libdnet

updated fork of libdnet from https://code.google.com/p/libdnet/
C
39
star
15

folsom_webmachine

folsom based metrics via HTTP and JSON
Erlang
37
star
16

small_wonder

A Deployment Tool
Ruby
25
star
17

knife-plugins

Ruby
21
star
18

sigar

System Information Gatherer And Reporter
C
20
star
19

khial

a fake network driver to test network applications
C
19
star
20

atomicmap_challenge

A multithreaded scala programming challenge.
Scala
16
star
21

winpcap-installer

fork of the NMAP's silent WinPCAP installer
NSIS
13
star
22

labs-graphs

JavaScript
13
star
23

boundary_scripts

Shell
12
star
24

archaius-consul

A consul-backed configuration source for Archaius
Java
10
star
25

childspec_validator

Verify your Erlang childspecs before you use them.
Erlang
7
star
26

bprobe_cookbook

The Boundary bprobe cookbook.
Ruby
7
star
27

pulse-api-cli

Command line tools for Boundary APIs
Python
6
star
28

chef-boundary-annotations-handler

Chef exception handler for adding annotations to Boundary.
Ruby
5
star
29

boundary-event-plugins

Python
4
star
30

meter-plugin-rabbitmq

Metrics Plugin for RabbitMQ
Java
4
star
31

dw-consul

Tools for integrating consul with dropwizard applications.
Java
4
star
32

dropwizard-kafka

Kotlin
3
star
33

meter-plugin-sdk-lua

This is the framework for making the creation and maintenance of plugins easy.
Lua
3
star
34

boundary_splunk_app

A Splunk App for integration with Boundary
Python
3
star
35

meter-plugin-sdk-java

Java framework for creating plugins
Java
3
star
36

boundary-client-js

JavaScript
3
star
37

ibrowse

Erlang
3
star
38

boundary-vmware

Boundary Enterprise Integration with VMWare
Java
3
star
39

boundary-plugin-disk-summary

Disk Summary Plugin
Lua
2
star
40

boundary-plugin-lua-test

Repo for testing the LUA plugin framework
Lua
2
star
41

jenkins-boundary-annotations-plugin

Java
2
star
42

boundary_puppet

Boundary Meter puppet module
Puppet
2
star
43

java-streaming-client-example

A Java WebSocket client example for the Boundary Streaming API
Java
2
star
44

boundary-vagrant-snmp

Environment for testing SNMP integrations and plugins
Puppet
2
star
45

meter-plugin-zookeeper

Lua
2
star
46

chef-boundary-events-handler

Ruby
2
star
47

meter-plugin-docker

Lua
2
star
48

dataCommander

Data manager using Backbone: query, cache, merge
JavaScript
2
star
49

boundary-meter_cookbook

Boundary Meter Chef Cookbook
Ruby
2
star
50

boundary-plugin-aws-elb

Plugin that extracts metrics from AWS Cloud Watch on ELBs(Elastic Load Balancers)
Python
2
star
51

boundary-event-sdk

Boundary Event SDK
Java
2
star
52

meter-plugin-developer-guide

Documentation on how to development and deploy Boundary metric plugins
Makefile
2
star
53

boundary-plugin-aws-rds

Collects metrics from the Amazon Relational Database Service (RDS)
Python
2
star
54

meter-plugin-nginx-plus

This plugin is for Nginx+ Users to Monitor Nginx
Lua
2
star
55

jenkins-boundary-event-plugin

Java
1
star
56

boundary-plugin-syslog

Boundary Plugin for Syslog
1
star
57

meter-plugin-activemq

Collect metrics from an active MQ instance
Lua
1
star
58

meter-plugin-snmp

Boundary Plugin for SNMP
1
star
59

boundary-plugin-consul-healthchecks

Queries Consul health checks and returns events to Boundary on a status change of a given health check
Lua
1
star
60

meter-plugin-cassandra

Collects metrics from a cassandra instance
Java
1
star
61

boundary-plugin-diskrw_summary

Lua
1
star
62

webhook-action-example

Provides an example of the handling of the webhook call from a Boundary action
1
star
63

vagrant-plugin-dev-env

Vagrant environment for developing plugins
Shell
1
star
64

meter-plugin-wmi

Lua
1
star
65

ZenPacks.boundary.EventAdapter

Zenoss 4.x event adapter for Boundary
Python
1
star
66

meter-plugin-apache-tomcat

Java
1
star
67

libstatsite

C
1
star
68

meter-plugin-solr

Lua
1
star
69

plugin-flask

boundary meter flask plugin
Lua
1
star
70

boundary-plugin-elasticsearch

Collects metrics from a elasticsearch
Lua
1
star
71

boundary-plugin-riak

Plugin for extracting metrics from a Riak instance.
Lua
1
star
72

boundary-plugin-aws-redshift

Collects metrics from Amazon Redshift
Python
1
star
73

meter-plugin-windows-process

PowerShell
1
star
74

tsi-lab

Virtual machine for examples on getting data into TrueSight Intelligence
Python
1
star
75

boundary-action-handler

Endpoint to handle actions from Boundary Premimum
Java
1
star
76

stream-lib

Stream summarizer and cardinality estimator.
Java
1
star
77

meter-plugin-hello-world-lua

Example plugin in "Hello World" fashion.
Lua
1
star
78

meter-plugin-http

Meter plugin to measure http request page load
Python
1
star