• Stars
    star
    271
  • Rank 151,717 (Top 3 %)
  • Language
    Scala
  • License
    MIT License
  • Created over 12 years ago
  • Updated over 3 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Fast, deboxed, specialized data structures for Scala

Debox

Overview

Debox provides specialized mutable collections that don't box.

For performance reasons, Debox's types are not compatible with Scala's collections framework (although conversions are possible). You may find that Debox's structures provide more reliable performance than Scala's mutable collections.

Debox is available for Scala 2.11 and 2.12.

(For Scala 2.10, use Debox 0.7.3 or older.)

Set up

If you use SBT to build your project, you can use Debox via the following snippet:

resolvers += Resolver.sonatypeRepo("releases"),

libraryDependencies += "org.spire-math" %% "debox" % "0.8.0"

Debox Types

Buffer

debox.Buffer is an indexed data structure backed by an array which can be appended to. It corresponds to collection.mutable.ArrayBuffer but has better performance due to specialization. It can also wrap instances of Array to provide specialized versions of foreach and map, which are a bit faster.

Buffers can grow internally. Appending, such as += and ++=, and removing and from the end of the buffer, as pop does, will be fast operations. Other operations (like adding to the middle of the buffer with insert) may require internal copying.

Large buffers which have most of their elements removed will still maintain a large underlying array. Use compact to reclaim unnecessary memory in these situations.

Example usage:

import debox.Buffer

val buf = Buffer.empty[Int]
buf += 1
buf += 1
buf += 2
buf.foreach { n => println(n) } // prints 1, 1, 2

val child = buf.copy
child += 3
child(0) = 999
child.toString // Buffer(999, 1, 2, 3)

val buf ++= child
buf.pop
buf.sum  // uses spire

Set

debox.Set corresponds to collection.mutable.Set, but without boxing. The hashing is done via an open addressing scheme with re-hashing, which is derived from the strategy used by Python. See Hashing Strategy for a more complete description.

Sets are required to maintain extra space to ensure fast average lookup times. Sets will tend to use 33-66% of the underlying storage.

Large sets which have most of their elements removed will still maintain a large underlying array. Use compact to reclaim unnecessary memory in these situations.

Example usage:

import debox.Set

val set = Set.empty[Int]
set += 1
set += 1
set += 2
set(0) // false
set(1) // true

val child = buf.copy
child += 3
child += 999
child.size == 4 // true

val other = Set(2, 3, 4)
set &= other
set(1) // false
set(2) // true
set.size == 1 // true

Map

debox.Map corresponds to collection.mutable.Map, but without boxing. As with debox.Set the hashing is done via an open addressing scheme with re-hashing, which is derived from the strategy used by Python. See Hashing Strategy for a more complete description.

Maps are required to maintain extra space to ensure fast average lookup times. Maps will tend to use 33-66% of the underlying storage.

Large maps which have most of their elements removed will still maintain a large underlying array. Use compact to reclaim unnecessary memory in these situations.

Unlike Scala Maps (which store a key and value together as a Tuple2), Debox stores keys and values in separate arrays. This makes iterating over keys or values separately faster, but means that operations which treat a map as a sequence of tuples are slower and/or not supported.

Example usage:

import debox.Map

val m = Map.empty[String, Int]
m("boris") = 1887
m("bela") = 1880
m("bela") = 1882
m.size // 2

m.contains("bela")   // true
m.contains("donald") // false
m.contains(12345)    // compile-time error!

m.get("bela") // Some(1882)
m.get("donald") // None

m("boris")  // 1887
m("bela")   // 1882
m("donald") // debox.KeyNotFoundException

m ++= Map("christopher" -> 1922, "vincent" -> 1911)
m.keysSet // Set(christopher, vincent, bela, boris), order is arbitrary

val b = m.mapValues(year => "born in %d" format year)
b // Map(boris -> born in 1887, bela -> born in 1882, ...)

Hashing Strategy

The hashing used in Set and Map works as follows:

  1. Get the item's hashcode (retrieved by the ## operator) as i.
  2. Mask this by the underlying array's max index to get j.
  3. If slot j is free, use it and return.
  4. Else, re-hash i and repeat.

The re-hashing strategy uses perturbation (initialized to the original hashcode) as well as the current i value. The transition can be expressed as:

i = (i << 2) + i + perturbation + 1
perturbation >>= 5

For a much more detailed treatment, you can read the comments on CPython's dictobject.c here.

Benchmarks

Most of Debox has been developed in tandem with aggressive benchmarking using Caliper and other tools.

The benchmarks can be run from SBT via benchmark/run.

Disclaimers and Provisos

Debox aims to achieve the best possible performance through use of features like specialization, macros, arrays, etc. All other concerns (such as visibility, subtyping relationships, type signatures, etc.) are secondary.

Unlike many Java (and Scala?) projects, Debox is not interested in hiding its internals beyond what is convenient. To aid inlining, most internals are public. This does not mean that users should modify them directly--attempting to manually update the structures could produce non-deterministic effects.

Debox chooses not to provide methods whose implementations are guaranteed to be slow. Rather than trying to provide every possibly useful method, the goal is to provide core functionality which can be implemented efficiently and which plays to the data structure's strengths.

It's possible that in the future Debox will use a system of imports to optionally add "slow" methods which have been left off the current API. Since Debox is not at a 1.0 release yet, the API may change from version to version. Debox makes no source- or binary-compatibility guarantees.

Criticisms, suggestions and patches are all welcome, as are benchmarking results (especially surprising ones)!

Copyright and License

All code is available to you under the MIT license, available at http://opensource.org/licenses/mit-license.php and also in the COPYING file.

Copyright Erik Osheim, 2012-2018.

More Repositories

1

antimirov

algebraic manipulation of regular expressions
Scala
193
star
2

chain

Simple, efficient wrapper for fast concatenation/iteration.
Scala
114
star
3

imp

macro for summoning implicit values
Scala
92
star
4

clouseau

Discover java object sizes through questionable sleuthing plus luck.
Scala
67
star
5

alleycats

Cats instances and classes which are outlaws, miscreants, and ne'er-do-wells.
Scala
62
star
6

pairing

Easily use tmux to collaborate with distant friends!
Shell
53
star
7

kronecker

Library for counting and enumerating things.
Scala
44
star
8

sbt-javap

Run javap directly from the SBT console
Scala
30
star
9

junkion

Library that provides those stupid little IO utility methods you always end up needing.
Scala
26
star
10

uxntal-mode

emacs major mode for the uxntal assembly language
Emacs Lisp
23
star
11

zillion

Provide names for numbers in English
Scala
22
star
12

cats-check

Cats instances for ScalaCheck data types
Scala
19
star
13

irreg

Generic regular expression library for Scala using Spire
Scala
18
star
14

scala-mode2-deprecated

Better scala-mode for Emacs
Emacs Lisp
14
star
15

streamrec

Building infinite sequences from anonymous functions using tuples and macros.
Scala
14
star
16

zzbot

Multi-platform scala REPL bot
Scala
13
star
17

sortilege

methods for divination and predicting the future
Scala
13
star
18

curator

Immutable data structures for Scala
Scala
9
star
19

big-electric-cat

Demonstrate using Cats with Spark
Scala
9
star
20

spire-diff

Generic implementation of the Diff algorithm
Scala
8
star
21

literati

Literate programming for Scala from within Markdown slides
Scala
8
star
22

zillionaire

Speak numbers out loud given their decimel representation.
Scala
6
star
23

bothub.org

repo for the website bothub.org http://bothub.org
HTML
5
star
24

numeric-demo

Some work I've done toward specializing Scala's Numeric trait
Scala
5
star
25

scalac-intset

Some small, fast data structures for scalac
Scala
5
star
26

spire-time

Spire bindings for date/time libraries, such as Joda Time.
Scala
4
star
27

bulk-importer

Scala compiler plugin for bulk imports
Scala
4
star
28

caliper.g8

Giter8 template for building micro-benchmarks with Caliper
Scala
3
star
29

spire-sym

simple proof-of-concept symbolic type for Spire
Scala
3
star
30

knife-juggler

Utilities for those who like to live dangerously.
3
star
31

rebox

Efficient immutable wrappers for unboxed data
Scala
3
star
32

pixter

pixter is a program for display small images directly in a terminal emulator
Python
2
star
33

apoc

Apocalypse Word random character generator
JavaScript
1
star
34

spire-www

JavaScript
1
star
35

zone11

Zone 11 is a game created for the PyWeek contest
Python
1
star
36

redrover

Red Rover game for Ludum Dare
Python
1
star
37

ordering-benchmark

Code used to benchmark scala.math.Ordering
Scala
1
star
38

dungeoneer

Simple text parser for ad-hoc dungeon grids
Python
1
star
39

spec-benchmark

Benchmarking of specialized Scala
Scala
1
star
40

hyper

Experiments with transfinite and hyperreal numbers
Scala
1
star
41

oslo2014

flatmap oslo 2014 slides and project
Scala
1
star