core.rrb-vector
Why would anyone want to use this library? The two primary answers are:
- You want faster concatenation of vectors, which core.rrb-vector's
catvec
function provides for both Clojure and ClojureScript. - You use vectors of Java primitive types like long, double, etc.,
returned by Clojure's
vector-of
function, e.g. to reduce memory usage to about 1/3 of the memory required by vectors of arbitrary objects, and- You want the speed enabled by using the transient versions of such
vectors. Clojure does not implement transients for primitive
vectors created via
vector-of
-- core.rrb-vector does.
- You want the speed enabled by using the transient versions of such
vectors. Clojure does not implement transients for primitive
vectors created via
Vectors are one of the most commonly used data structures within
Clojure. Likely you already know that creating a vector equal to v
plus a new element e
appended to the end using the expression (conj v e)
has a run time that is "effectively constant", i.e. it takes
O(log N) time in the size N of v
, where the base of the logarithm is
32, so it is a constant at most 4 for all vector sizes up to a
million, and at most 7 for all vector sizes that Clojure supports.
The fastest way to concatenate two vectors v1
and v2
into a single
new vector is using an expression like (into v1 v2)
. This is
implemented by repeatedly appending a single element from the second
vector to the first, so it takes linear time in the size of v2
(multiplied by the effectively constant time mentioned above).
Aside: There might be another expression that has a better constant
factor for its run time than (into v1 v2)
does, and is thus faster.
However, any other such expression will still take at least linear
time in the size of the second vector.
The core.rrb-vector library uses a tree structure similar to the one
that Clojure uses internally for vectors, but generalizes it in such a
way that producing a new tree that represents the concatenation of two
input vectors using the catvec
function can be done in O(log N)
time, where N is the size of the result.
You can give catvec
vectors created in all of the ways you already
normally do, and while it will return a new type of object, this new
type behaves in all of the ways you expect a Clojure vector to behave.
This new type of vector is indistinguishable from a normal Clojure
vector unless you examine the value of (type v)
or (class v)
. In
particular, (vector? v)
is true for this new type, you can call all
of the usual sequence-based functions on it to examine or process its
elements, you can call conj
on it, nth
, etc.
Thus if you have a program where frequently concatenating large vectors to produce new vectors is useful, core.rrb-vector may help you write a much faster program in a more natural way.
This library is an implementation of the confluently persistent vector data structure introduced in the paper "RRB-Trees: Efficient Immutable Vectors", EPFL-REPORT-169879, September, 2011, by Phil Bagwell and Tiark Rompf.
RRB-Trees build upon Clojure's internal PersistentVector
class used
to implement its built in vectors, adding logarithmic time
concatenation and slicing (i.e. create sub-vectors from input
vectors). ClojureScript is supported with the same API, except for
the absence of the vector-of
function.
The main functions provided are clojure.core.rrb-vector/catvec
,
performing vector concatenation, and clojure.core.rrb-vector/subvec
,
which produces a new vector containing the appropriate subrange of the
input vector (in contrast to clojure.core/subvec
, which returns a
view on the input vector).
Like Clojure vectors, core.rrb-vector vectors can store arbitrary
values, or using vector-of
you can create vectors restricted to one
primitive type, e.g. long, double, etc. The core.rrb-vector
implementation provides seamless interoperability with the built in
Clojure vectors of class clojure.lang.PersistentVector
,
clojure.core.Vec
(vectors of primitive values) and
clojure.lang.APersistentVector$SubVector
instances:
clojure.core.rrb-vector/catvec
and clojure.core.rrb-vector/subvec
convert their inputs to clojure.core.rrb_vector.rrbt.Vector
instances whenever necessary (this is a very fast constant time
operation for PersistentVector and primitive vectors; for SubVector it
is O(log N), where N is the size of the underlying vector).
clojure.core.rrb-vector
also provides its own versions of vector
,
vector-of
, and vec
that always produce
clojure.core.rrb_vector.rrbt.Vector
instances. Note that
vector-of
accepts :object
as one of the possible type arguments,
in addition to keywords naming primitive types.
Usage
core.rrb-vector exports one public namespace:
(require '[clojure.core.rrb-vector :as fv])
Note that the ClojureScript version uses the same namespace name (it
does not use the alternative cljs.*
prefix!). This is because the
API is precisely the same (except clojure.core.rrb-vector/vector-of
only makes sense on the JVM and is therefore not available in
ClojureScript).
The docstring attached to the namespace provides an overview of the available functionality (as found at the top of this README):
(doc clojure.core.rrb-vector)
The new functionality is accessible through two functions:
clojure.core.rrb-vector/subvec
, which provides logarithmic-time
non-view slicing (in contrast to clojure.core/subvec
, which is a
constant-time operation producing view vectors that prevent the
underlying vector from becoming eligible for garbage collection), and
clojure.core.rrb-vector/catvec
, which provides logarithmic-time
concatenation. Crucially, these can be applied to regular
Clojure(Script) vectors.
(doc fv/subvec)
(doc fv/catvec)
;; apply catvec and subvec to regular Clojure(Script) vectors
(fv/catvec (vec (range 1234)) (vec (range 8765)))
(fv/subvec (vec (range 1024)) 123 456)
Additionally, several functions for constructing RRB vectors are
provided. There is rarely any reason to use them, since, as mentioned
above, the interesting functions exported by core.rrb-vector work with
regular vectors. Note that clojure.core.rrb-vector/vec
, in contrast
to clojure.core/vec
, reuses the internal tree of its input if it
already is a vector (of any type) and does not alias short arrays.
When passed a non-vector argument, it returns an RRB vector.
(doc fv/vector)
(doc fv/vector-of)
(doc fv/vec)
The debug namespace bundled with core.rrb-vector provides several utilities used by the test suite, as well as a function for visualizing the internal structure of vectors that works with regular Clojure(Script) vectors and RRB vectors.
;; for peeking under the hood
(require '[clojure.core.rrb-vector.debug :as dv])
(dv/dbg-vec (fv/catvec (vec (range 1234)) (vec (range 8765))))
Releases and dependency information
core.rrb-vector requires Clojure >= 1.5.0. View vectors created by
clojure.core/subvec
are correctly handled for Clojure >= 1.6.0. The
ClojureScript version is regularly tested against the most recent
ClojureScript release.
core.rrb-vector releases are available from Maven Central. Development snapshots are available from the Sonatype OSS repository.
-
Change log of changes made in this library.
-
Some benchmark results comparing the run time of core.rrb-vector's JVM implementation against other vector/list implementations on the JVM.
Follow the first link above to discover the current release number.
CLI/deps.edn
dependency information:
org.clojure/core.rrb-vector {:mvn/version "${version}"}
Leiningen dependency information:
[org.clojure/core.rrb-vector "${version}"]
Maven dependency information:
<dependency>
<groupId>org.clojure</groupId>
<artifactId>core.rrb-vector</artifactId>
<version>${version}</version>
</dependency>
Gradle dependency information:
compile "org.clojure:core.rrb-vector:${version}"
TODO
-
more tests;
-
performance: general perf tuning, more efficient
catvec
implementation (to replace current seq-ops-based impl).
Developer information
core.rrb-vector is being developed as a Clojure Contrib project, see the What is Clojure Contrib page for details. Patches will only be accepted from developers who have signed the Clojure Contributor Agreement.
Useful Maven commands
To run Clojure and ClojureScript tests:
$ mvn -DCLOJURE_VERSION=1.10.1 -Dclojure.version=1.10.1 clean test
Clojure versions as old as 1.5.1 can be tested with such a command, but the ClojureScript tests only work when using Clojure 1.8.0 or later.
To run tests and, if successful, create a JAR file in the targets directory:
$ mvn -DCLOJURE_VERSION=1.10.1 -Dclojure.version=1.10.1 clean package
Prerequisites: Only Java and Maven need to be installed. Maven will download whatever versions of Clojure are needed for the command you use. Both Clojure and ClojureScript tests are run with the commands given here. They use the Nashorn JavaScript run time environment included with Java -- no other JavaScript run time is needed.
Useful clj CLI commands
To run relatively short Clojure tests, but no ClojureScript tests:
$ ./script/jdo test
To run relatively short ClojureScript tests, but no Clojure tests:
$ ./script/sdo test
Warning: Currently the command above for running ClojureScript tests
does not show warnings from the ClojureScript compiler. I have seen
some ClojureScript compiler warnings appear when running the Maven
command above, and the Leiningen command given below for running
ClojureScript tests, that unfortunately do not appear using
./script/sdo test
. Suggestions welcome on how to make that command
also show similar warnings.
Replace test
in the commands above with one of the following for
other useful things:
sock
(or no argument at all) - start a REPL, and listen for a socket REPL connection on TCP port 50505long
- run a longer set of testscoll
- run generative tests fromcollection-check
libraryeast
- run Eastwood lint tool (clj version only, not cljs)
Useful Leiningen commands
To run Clojure tests, but no ClojureScript tests:
$ lein with-profile +1.10 test
You can test with Clojure versions 1.5 through 1.10 by specifying that
version number after the +
.
Prerequisites: Only Java and Leiningen. Leiningen will download whatever versions of Clojure and other libraries are needed.
To run ClojureScript tests with Node.js and SpiderMonkey JavaScript runtimes, but no Clojure tests:
$ lein with-profile +cljs cljsbuild test
Add node
or spidermonkey
as a separate argument after test
to
restrict the JavaScript runtime used to only the one you specify. You
may need to adjust the command names in the :test-commands
section
of the project.clj
file if the command for running those JavaScript
runtimes have a different name on your system than what is used there.
Prerequisites: Java, Leiningen, and either or both of Node.js and SpiderMonkey JavaScript run time environments.
To run normal Clojure tests, plus the
collection-check
tests, but no ClojureScript tests:
$ lein with-profile +coll,+1.7 test
The collection-check
tests require Clojure 1.7.0 or later, I believe
because collection-check and/or its dependencies require that.
There is no existing command configured to run collection-check
tests with ClojureScript.
To start a REPL from Leiningen with Clojure versions 1.6.0 and older, you must use Leiningen 2.8.0 (likely some other versions work, too).
Installing other software you will need
For all of the development commands you must have Java installed. This includes the ClojureScript compile and test commands, since the ClojureScript compiler is at least partially written in the Java version of Clojure.
Java
Install one or more of the pre-built binaries from AdoptOpenJDK, or several other providers of Java binaries.
Additional methods:
- Ubuntu 18.04 Linux:
sudo apt-get install default-jre
Maven
For any mvn
command you must install
Maven.
- Ubuntu 18.04 Linux:
sudo apt-get install maven
- macOS
- plus Homebrew:
brew install maven
- plus MacPorts:
sudo port install maven3
, then either use the commandmvn3
, or to usemvn
also run the commandsudo port select --set maven maven3
.
- plus Homebrew:
Leiningen
An install script and instructions are available on the Leiningen site.
Node.js JavaScript run time environment
Installation instructions for many different versions of Node.js are available on the Node.js web site. You can also install it using the commands below.
- Ubuntu 18.04 Linux:
sudo apt-get install nodejs
- macOS
- plus Homebrew:
brew install node
- plus MacPorts:
sudo port install nodejs10
. You can see other versions available via the commandport list | grep nodejs
.
- plus Homebrew:
SpiderMonkey JavaScript run time environment
Installation instructions for many different versions of SpiderMonkey are available on the SpiderMonkey web site. You may also install it using the commands below.
- Ubuntu 18.04 Linux:
sudo apt-get install libmozjs-52-dev
- macOS
- plus Homebrew: As of 2019-Sep-24,
brew install spidermonkey
installs version 1.8.5 of SpiderMonkey, which according to the Wikipedia page on SpiderMonkey was first released in 2011, with at least one release per year after that. The ClojureScript tests fail to run using this version of SpiderMonkey. It seems worth avoiding this version of SpiderMonkey for the purposes of testingcore.rrb-vector
. - plus MacPorts:
sudo port install mozjs52
- plus Homebrew: As of 2019-Sep-24,
Clojure(Script) code reuse
core.rrb-vector's vectors support the same basic functionality regular Clojure's vectors do (with the omissions listed above). Where possible, this is achieved by reusing code from Clojure's gvec and ClojureScript's PersistentVector implementations. The Clojure(Script) source files containing the relevant code carry the following copyright notice:
Copyright (c) Rich Hickey. All rights reserved.
The use and distribution terms for this software are covered by the
Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
which can be found in the file epl-v10.html at the root of this distribution.
By using this software in any fashion, you are agreeing to be bound by
the terms of this license.
You must not remove this notice, or any other, from this software.
Licence
Copyright © 2012-2023 Michał Marczyk, Andy Fingerhut, Rich Hickey and contributors
Distributed under the
Eclipse Public License 1.0,
the same as Clojure. The licence text can be found in the
epl-v10.html
file at the root of this distribution.