• Stars
    star
    191
  • Rank 202,877 (Top 4 %)
  • Language
    Clojure
  • License
    Eclipse Public Li...
  • Created over 11 years ago
  • Updated 10 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

RRB-Trees in Clojure

core.rrb-vector

Why would anyone want to use this library? The two primary answers are:

  • You want faster concatenation of vectors, which core.rrb-vector's catvec function provides for both Clojure and ClojureScript.
  • You use vectors of Java primitive types like long, double, etc., returned by Clojure's vector-of function, e.g. to reduce memory usage to about 1/3 of the memory required by vectors of arbitrary objects, and
    • You want the speed enabled by using the transient versions of such vectors. Clojure does not implement transients for primitive vectors created via vector-of -- core.rrb-vector does.

Vectors are one of the most commonly used data structures within Clojure. Likely you already know that creating a vector equal to v plus a new element e appended to the end using the expression (conj v e) has a run time that is "effectively constant", i.e. it takes O(log N) time in the size N of v, where the base of the logarithm is 32, so it is a constant at most 4 for all vector sizes up to a million, and at most 7 for all vector sizes that Clojure supports.

The fastest way to concatenate two vectors v1 and v2 into a single new vector is using an expression like (into v1 v2). This is implemented by repeatedly appending a single element from the second vector to the first, so it takes linear time in the size of v2 (multiplied by the effectively constant time mentioned above).

Aside: There might be another expression that has a better constant factor for its run time than (into v1 v2) does, and is thus faster. However, any other such expression will still take at least linear time in the size of the second vector.

The core.rrb-vector library uses a tree structure similar to the one that Clojure uses internally for vectors, but generalizes it in such a way that producing a new tree that represents the concatenation of two input vectors using the catvec function can be done in O(log N) time, where N is the size of the result.

You can give catvec vectors created in all of the ways you already normally do, and while it will return a new type of object, this new type behaves in all of the ways you expect a Clojure vector to behave. This new type of vector is indistinguishable from a normal Clojure vector unless you examine the value of (type v) or (class v). In particular, (vector? v) is true for this new type, you can call all of the usual sequence-based functions on it to examine or process its elements, you can call conj on it, nth, etc.

Thus if you have a program where frequently concatenating large vectors to produce new vectors is useful, core.rrb-vector may help you write a much faster program in a more natural way.

This library is an implementation of the confluently persistent vector data structure introduced in the paper "RRB-Trees: Efficient Immutable Vectors", EPFL-REPORT-169879, September, 2011, by Phil Bagwell and Tiark Rompf.

RRB-Trees build upon Clojure's internal PersistentVector class used to implement its built in vectors, adding logarithmic time concatenation and slicing (i.e. create sub-vectors from input vectors). ClojureScript is supported with the same API, except for the absence of the vector-of function.

The main functions provided are clojure.core.rrb-vector/catvec, performing vector concatenation, and clojure.core.rrb-vector/subvec, which produces a new vector containing the appropriate subrange of the input vector (in contrast to clojure.core/subvec, which returns a view on the input vector).

Like Clojure vectors, core.rrb-vector vectors can store arbitrary values, or using vector-of you can create vectors restricted to one primitive type, e.g. long, double, etc. The core.rrb-vector implementation provides seamless interoperability with the built in Clojure vectors of class clojure.lang.PersistentVector, clojure.core.Vec (vectors of primitive values) and clojure.lang.APersistentVector$SubVector instances: clojure.core.rrb-vector/catvec and clojure.core.rrb-vector/subvec convert their inputs to clojure.core.rrb_vector.rrbt.Vector instances whenever necessary (this is a very fast constant time operation for PersistentVector and primitive vectors; for SubVector it is O(log N), where N is the size of the underlying vector).

clojure.core.rrb-vector also provides its own versions of vector, vector-of, and vec that always produce clojure.core.rrb_vector.rrbt.Vector instances. Note that vector-of accepts :object as one of the possible type arguments, in addition to keywords naming primitive types.

Usage

core.rrb-vector exports one public namespace:

(require '[clojure.core.rrb-vector :as fv])

Note that the ClojureScript version uses the same namespace name (it does not use the alternative cljs.* prefix!). This is because the API is precisely the same (except clojure.core.rrb-vector/vector-of only makes sense on the JVM and is therefore not available in ClojureScript).

The docstring attached to the namespace provides an overview of the available functionality (as found at the top of this README):

(doc clojure.core.rrb-vector)

The new functionality is accessible through two functions: clojure.core.rrb-vector/subvec, which provides logarithmic-time non-view slicing (in contrast to clojure.core/subvec, which is a constant-time operation producing view vectors that prevent the underlying vector from becoming eligible for garbage collection), and clojure.core.rrb-vector/catvec, which provides logarithmic-time concatenation. Crucially, these can be applied to regular Clojure(Script) vectors.

(doc fv/subvec)
(doc fv/catvec)

;; apply catvec and subvec to regular Clojure(Script) vectors
(fv/catvec (vec (range 1234)) (vec (range 8765)))
(fv/subvec (vec (range 1024)) 123 456)

Additionally, several functions for constructing RRB vectors are provided. There is rarely any reason to use them, since, as mentioned above, the interesting functions exported by core.rrb-vector work with regular vectors. Note that clojure.core.rrb-vector/vec, in contrast to clojure.core/vec, reuses the internal tree of its input if it already is a vector (of any type) and does not alias short arrays. When passed a non-vector argument, it returns an RRB vector.

(doc fv/vector)
(doc fv/vector-of)
(doc fv/vec)

The debug namespace bundled with core.rrb-vector provides several utilities used by the test suite, as well as a function for visualizing the internal structure of vectors that works with regular Clojure(Script) vectors and RRB vectors.

;; for peeking under the hood
(require '[clojure.core.rrb-vector.debug :as dv])
(dv/dbg-vec (fv/catvec (vec (range 1234)) (vec (range 8765))))

Releases and dependency information

core.rrb-vector requires Clojure >= 1.5.0. View vectors created by clojure.core/subvec are correctly handled for Clojure >= 1.6.0. The ClojureScript version is regularly tested against the most recent ClojureScript release.

core.rrb-vector releases are available from Maven Central. Development snapshots are available from the Sonatype OSS repository.

Follow the first link above to discover the current release number.

CLI/deps.edn dependency information:

org.clojure/core.rrb-vector {:mvn/version "${version}"}

Leiningen dependency information:

[org.clojure/core.rrb-vector "${version}"]

Maven dependency information:

<dependency>
  <groupId>org.clojure</groupId>
  <artifactId>core.rrb-vector</artifactId>
  <version>${version}</version>
</dependency>

Gradle dependency information:

compile "org.clojure:core.rrb-vector:${version}"

TODO

  1. more tests;

  2. performance: general perf tuning, more efficient catvec implementation (to replace current seq-ops-based impl).

Developer information

core.rrb-vector is being developed as a Clojure Contrib project, see the What is Clojure Contrib page for details. Patches will only be accepted from developers who have signed the Clojure Contributor Agreement.

Useful Maven commands

To run Clojure and ClojureScript tests:

$ mvn -DCLOJURE_VERSION=1.10.1 -Dclojure.version=1.10.1 clean test

Clojure versions as old as 1.5.1 can be tested with such a command, but the ClojureScript tests only work when using Clojure 1.8.0 or later.

To run tests and, if successful, create a JAR file in the targets directory:

$ mvn -DCLOJURE_VERSION=1.10.1 -Dclojure.version=1.10.1 clean package

Prerequisites: Only Java and Maven need to be installed. Maven will download whatever versions of Clojure are needed for the command you use. Both Clojure and ClojureScript tests are run with the commands given here. They use the Nashorn JavaScript run time environment included with Java -- no other JavaScript run time is needed.

Useful clj CLI commands

To run relatively short Clojure tests, but no ClojureScript tests:

$ ./script/jdo test

To run relatively short ClojureScript tests, but no Clojure tests:

$ ./script/sdo test

Warning: Currently the command above for running ClojureScript tests does not show warnings from the ClojureScript compiler. I have seen some ClojureScript compiler warnings appear when running the Maven command above, and the Leiningen command given below for running ClojureScript tests, that unfortunately do not appear using ./script/sdo test. Suggestions welcome on how to make that command also show similar warnings.

Replace test in the commands above with one of the following for other useful things:

  • sock (or no argument at all) - start a REPL, and listen for a socket REPL connection on TCP port 50505
  • long - run a longer set of tests
  • coll - run generative tests from collection-check library
  • east - run Eastwood lint tool (clj version only, not cljs)

Useful Leiningen commands

To run Clojure tests, but no ClojureScript tests:

$ lein with-profile +1.10 test

You can test with Clojure versions 1.5 through 1.10 by specifying that version number after the +.

Prerequisites: Only Java and Leiningen. Leiningen will download whatever versions of Clojure and other libraries are needed.

To run ClojureScript tests with Node.js and SpiderMonkey JavaScript runtimes, but no Clojure tests:

$ lein with-profile +cljs cljsbuild test

Add node or spidermonkey as a separate argument after test to restrict the JavaScript runtime used to only the one you specify. You may need to adjust the command names in the :test-commands section of the project.clj file if the command for running those JavaScript runtimes have a different name on your system than what is used there.

Prerequisites: Java, Leiningen, and either or both of Node.js and SpiderMonkey JavaScript run time environments.

To run normal Clojure tests, plus the collection-check tests, but no ClojureScript tests:

$ lein with-profile +coll,+1.7 test

The collection-check tests require Clojure 1.7.0 or later, I believe because collection-check and/or its dependencies require that.

There is no existing command configured to run collection-check tests with ClojureScript.

To start a REPL from Leiningen with Clojure versions 1.6.0 and older, you must use Leiningen 2.8.0 (likely some other versions work, too).

Installing other software you will need

For all of the development commands you must have Java installed. This includes the ClojureScript compile and test commands, since the ClojureScript compiler is at least partially written in the Java version of Clojure.

Java

Install one or more of the pre-built binaries from AdoptOpenJDK, or several other providers of Java binaries.

Additional methods:

  • Ubuntu 18.04 Linux: sudo apt-get install default-jre

Maven

For any mvn command you must install Maven.

  • Ubuntu 18.04 Linux: sudo apt-get install maven
  • macOS
    • plus Homebrew: brew install maven
    • plus MacPorts: sudo port install maven3, then either use the command mvn3, or to use mvn also run the command sudo port select --set maven maven3.

Leiningen

An install script and instructions are available on the Leiningen site.

Node.js JavaScript run time environment

Installation instructions for many different versions of Node.js are available on the Node.js web site. You can also install it using the commands below.

  • Ubuntu 18.04 Linux: sudo apt-get install nodejs
  • macOS
    • plus Homebrew: brew install node
    • plus MacPorts: sudo port install nodejs10. You can see other versions available via the command port list | grep nodejs.

SpiderMonkey JavaScript run time environment

Installation instructions for many different versions of SpiderMonkey are available on the SpiderMonkey web site. You may also install it using the commands below.

  • Ubuntu 18.04 Linux: sudo apt-get install libmozjs-52-dev
  • macOS
    • plus Homebrew: As of 2019-Sep-24, brew install spidermonkey installs version 1.8.5 of SpiderMonkey, which according to the Wikipedia page on SpiderMonkey was first released in 2011, with at least one release per year after that. The ClojureScript tests fail to run using this version of SpiderMonkey. It seems worth avoiding this version of SpiderMonkey for the purposes of testing core.rrb-vector.
    • plus MacPorts: sudo port install mozjs52

Clojure(Script) code reuse

core.rrb-vector's vectors support the same basic functionality regular Clojure's vectors do (with the omissions listed above). Where possible, this is achieved by reusing code from Clojure's gvec and ClojureScript's PersistentVector implementations. The Clojure(Script) source files containing the relevant code carry the following copyright notice:

Copyright (c) Rich Hickey. All rights reserved.
The use and distribution terms for this software are covered by the
Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
which can be found in the file epl-v10.html at the root of this distribution.
By using this software in any fashion, you are agreeing to be bound by
  the terms of this license.
You must not remove this notice, or any other, from this software.

Licence

Copyright © 2012-2023 Michał Marczyk, Andy Fingerhut, Rich Hickey and contributors

Distributed under the Eclipse Public License 1.0, the same as Clojure. The licence text can be found in the epl-v10.html file at the root of this distribution.

More Repositories

1

clojure

The Clojure programming language
Java
10,334
star
2

clojurescript

Clojure to JS compiler
Clojure
9,191
star
3

core.async

Facilities for async programming and communication in Clojure
Clojure
1,935
star
4

clojure-clr

A port of Clojure to the CLR, part of the Clojure project
C#
1,541
star
5

core.logic

A logic programming library for Clojure & ClojureScript
Clojure
1,434
star
6

core.typed

An optional type system for Clojure
Clojure
1,285
star
7

core.match

An optimized pattern matching library for Clojure
Clojure
1,180
star
8

test.check

QuickCheck for Clojure
Clojure
1,112
star
9

java.jdbc

JDBC from Clojure (formerly clojure.contrib.sql)
Clojure
714
star
10

tools.cli

Command-line processing
Clojure
711
star
11

tools.nrepl

A Clojure network REPL that provides a server and client, along with some common APIs of use to IDEs and other tools that may need to evaluate Clojure code in remote environments.
Clojure
661
star
12

tools.namespace

Tools for managing namespaces in Clojure
Clojure
596
star
13

data.json

JSON in Clojure
Clojure
536
star
14

algo.monads

Macros for defining monads, and definition of the most common monads
Clojure
444
star
15

core.cache

A caching library for Clojure implementing various cache strategies
Clojure
442
star
16

tools.deps.alpha

A functional API for transitive dependency graph expansion and the creation of classpaths
Clojure
435
star
17

tools.logging

Clojure logging API
Clojure
382
star
18

tools.trace

1.3 update of clojure.contrib.trace
Clojure
354
star
19

math.combinatorics

Efficient, functional algorithms for generating lazy sequences for common combinatorial functions
Clojure
343
star
20

spec-alpha2

Clojure library to describe the structure of data and functions
Clojure
297
star
21

data.csv

CSV reader/writer to/from Clojure data structures
Clojure
270
star
22

core.memoize

A manipulable, pluggable, memoization framework for Clojure
Clojure
263
star
23

tools.analyzer

An analyzer for Clojure code, written in Clojure and producing AST in EDN
Clojure
257
star
24

clojure-site

clojure.org site
HTML
249
star
25

data.xml

Clojure
220
star
26

data.finger-tree

Finger Tree data structure
Clojure
213
star
27

spec.alpha

Clojure library to describe the structure of data and functions
Clojure
212
star
28

tools.reader

Clojure reader in Clojure
Clojure
203
star
29

tools.build

Clojure builds as Clojure programs
Clojure
200
star
30

data.priority-map

Clojure priority map data structure
Clojure
186
star
31

math.numeric-tower

Math functions that deal intelligently with the various types in Clojure's numeric tower
Clojure
175
star
32

test.generative

Generative test runner
Clojure
161
star
33

core.unify

Unification library
Clojure
137
star
34

core.contracts

Contracts programming
Clojure
127
star
35

data.fressian

Read and write Fressian data from Clojure
Clojure
127
star
36

data.avl

Persistent sorted maps and sets with log-time rank queries
Clojure
125
star
37

data.int-map

A map optimized for integer keys
Java
124
star
38

core.incubator

Proving ground for proposed new core fns
Clojure
116
star
39

java.data

Functions for recursively converting Java beans to Clojure and vice versa
Clojure
114
star
40

tools.analyzer.jvm

Additional jvm-specific passes for tools.analyzer
Clojure
113
star
41

tools.macro

Utilities for macro writers
Clojure
113
star
42

clojurescript-site

website for ClojureScript
Shell
106
star
43

tools.deps.graph

Dependency graphs for deps.edn projects
Clojure
106
star
44

java.jmx

Produce and consume JMX beans from Clojure
Clojure
94
star
45

algo.generic

Generic versions of commonly used functions, implemented as multimethods that can be implemented for any data type
Clojure
92
star
46

tools.emitter.jvm

A JVM bytecode generator for ASTs compatible with tools.analyzer(.jvm)
Clojure
86
star
47

data.generators

Random data generators
Clojure
85
star
48

data.zip

Utilities for clojure.zip
Clojure
83
star
49

brew-install

Clojure CLI installer
Shell
81
star
50

data.codec

Native codec implementations
Clojure
74
star
51

tools.gitlibs

API for retrieving, caching, and programatically accessing git libraries
Clojure
62
star
52

java.classpath

Examine the Java classpath from Clojure programs
Clojure
59
star
53

jvm.tools.analyzer

Clojure
53
star
54

core.specs.alpha

specs to describe Clojure core macros and functions
Clojure
47
star
55

tools.tools

Clojure CLI tool for managing Clojure CLI tools
Clojure
42
star
56

homebrew-tools

Clojure homebrew tap providing Clojure formulae
Ruby
41
star
57

data.alpha.replicant-server

A Clojure library providing remote implementations of the Clojure data structures and a remote REPL server.
Clojure
37
star
58

test.benchmark

Benchmark and Regression Suite for Clojure
Roff
37
star
59

clr.tools.nrepl

Clojure
25
star
60

build.ci

Support scripts for continuous integration
Clojure
23
star
61

tools.analyzer.js

Provides js-specific passes for tools.analyzer
Clojure
21
star
62

algo.graph

Basic graph theory algorithms
Clojure
16
star
63

clojure-install

Java
16
star
64

data.alpha.replicant-client

A Clojure library providing client-side implementations of Clojure datastructures served by replicant-server.
Clojure
13
star
65

clojure.github.com

Documentation repos
HTML
8
star
66

build.poms

Parent POMs
8
star
67

core.typed.analyzer.jvm

Clojure
7
star
68

clr.tools.namespace

Clojure
7
star
69

core.typed.runtime.jvm

Clojure
7
star
70

clr.data.json

JSON in Clojure on the CLR
Clojure
6
star
71

clr.tools.reader

Clojure
5
star
72

clr.test.generative

Clojure
5
star
73

clojure-api-doc

Clojure API doc build
Clojure
5
star
74

contrib-api-doc

Clojure contrib API doc build
Clojure
5
star
75

core.typed.annotator.jvm

Clojure
5
star
76

core.typed.checker.jvm

Clojure
4
star
77

core.typed.checker.js

Clojure
4
star
78

io.incubator

Proving ground for proposed new io fns
4
star
79

clr.data.generators

Random data generators for Clojure on the CLR
Clojure
4
star
80

clr.core.async

Port of Clojure core.async to the CLR
Clojure
3
star
81

clr.spec.alpha

spec on the CLR
Clojure
3
star
82

clr.tools.analyzer

Clojure
3
star
83

test.regression

Regression tests for Clojure
Clojure
3
star
84

tools.deps.cli

Deps functions
Clojure
2
star
85

clr.core.specs.alpha

core specs on CLR
HTML
2
star
86

java.internal.invoke

2
star
87

clr.tools.gitlibs

An API for retrieving, caching, and programatically accessing git libraries
HTML
2
star
88

clr.core.logic

Clojure
2
star
89

clr.tools.trace

1
star
90

clr.core.cli

Clojure
1
star
91

clr.data.priority-map

ClojureCLR port of data.priority-map
Clojure
1
star
92

cljs.tools.closure

ClojureScript build of Google Closure
Shell
1
star
93

tools.analyzer.clr

additional clr-specific passes for tools.analyzer
Clojure
1
star
94

clr.test.check

Clojure
1
star
95

clr.core.cache

ClojureCLR port of core.cache
Clojure
1
star
96

clr.tools.logging

1
star
97

build.test

Dummy project for testing contrib build and deploy
Clojure
1
star
98

clr.core.memoize

ClojureCLR port of core.memoize
Clojure
1
star