  • Stars: 372
  • Rank: 111,242 (Top 3%)
  • Language: Julia
  • License: Other
  • Created: over 8 years ago
  • Updated: 11 months ago


Repository Details

High performance nearest neighbor data structures and algorithms for Julia.

NearestNeighbors.jl

NearestNeighbors.jl is a package written in Julia to perform nearest neighbor searches.


Creating a tree

There are currently three types of trees available:

  • BruteTree: Not actually a tree; it linearly searches all points in a brute-force fashion. Works with any Metric.
  • KDTree: In a kd-tree, points are recursively split into groups using hyper-planes. A KDTree therefore only works with axis-aligned metrics: Euclidean, Chebyshev, Minkowski, and Cityblock.
  • BallTree: Points are recursively split into groups bounded by hyper-spheres. Works with any Metric.

These trees are created with the following syntax:

BruteTree(data, metric; leafsize, reorder)
KDTree(data, metric; leafsize, reorder)
BallTree(data, metric; leafsize, reorder)
  • data: The data, i.e., the points to build the tree from. It can either be
    • a matrix of size nd × np with the points to insert in the tree, where nd is the dimensionality of the points and np is the number of points, or
    • a vector of vectors with fixed dimensionality nd, which must be part of the type. Specifically, data should be a Vector{V}, where V is itself a subtype of an AbstractVector such that eltype(V) and length(V) are defined. (For example, with 3D points, V = SVector{3, Float64} works because eltype(V) = Float64 and length(V) = 3 are defined for V.)
  • metric: The metric to use; defaults to Euclidean. This is one of the Metric types defined in the Distances.jl package. It is possible to define your own metric by creating a new type that is a subtype of Metric (see the sketch after this list).
  • leafsize (keyword argument): Determines at what number of points to stop splitting the tree further. There is a trade-off between the cost of traversing the tree and the cost of evaluating the metric for an increasing number of points in a leaf.
  • reorder (keyword argument): If true, points that are close in distance are also stored close together in memory while the tree is built, which improves cache locality during queries. A copy of the original data is made, so the original data is left unmodified. This can have a significant impact on performance and is set to true by default.
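
As a sketch of the custom-metric point above: the ScaledEuclidean type below is hypothetical and assumes a recent version of Distances.jl, where making the type callable is enough to define the distance.

using NearestNeighbors, Distances

# Hypothetical metric: Euclidean distance multiplied by a constant factor.
struct ScaledEuclidean <: Metric
    scale::Float64
end
(d::ScaledEuclidean)(a, b) = d.scale * sqrt(sum(abs2, a .- b))

data = rand(3, 1000)
tree = BruteTree(data, ScaledEuclidean(2.0))  # BruteTree works with any Metric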

All trees in NearestNeighbors.jl are static, which means that points cannot be added to or removed from an already created tree.

Here are a few examples of creating trees:

using NearestNeighbors
data = rand(3, 10^4)

# Create trees
kdtree = KDTree(data; leafsize = 10)
balltree = BallTree(data, Minkowski(3.5); reorder = false)
brutetree = BruteTree(data)
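
The data can also be passed as a vector of fixed-size vectors, as described above. A minimal sketch using StaticArrays.jl:

using NearestNeighbors, StaticArrays

# 10^4 three-dimensional points stored as a Vector{SVector{3, Float64}}
svdata = [@SVector rand(3) for _ in 1:10^4]
kdtree_sv = KDTree(svdata)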

k Nearest Neighbor (kNN) searches

A kNN search finds the k nearest neighbors to given point(s). This is done with the method:

knn(tree, points, k, sortres = false, skip = always_false) -> idxs, dists
  • tree: The tree instance.
  • points: A vector or matrix of points to find the k nearest neighbors to. If points is a vector of numbers, it represents a single point; if points is a matrix, the k nearest neighbors to each point (column) are computed. points can also be a vector of other vectors, where each element of the outer vector is considered a point.
  • k: The number of nearest neighbors to find.
  • sortres (optional): Determines whether the results are sorted before returning. If true, the results are sorted in order of increasing distance to the point.
  • skip (optional): A predicate on the point index that determines whether a given point should be skipped, for example when iterating over points and a point has already been visited (see the sketch after this list).
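
For example, a minimal sketch of using skip to exclude the query point itself when the query point is also stored in the tree (here the predicate receives the candidate index):

using NearestNeighbors
data = rand(3, 10^4)
kdtree = KDTree(data)

# Query with point number 5 from the tree, but skip index 5 so the point
# is not reported as its own nearest neighbor.
idxs, dists = knn(kdtree, data[:, 5], 3, true, i -> i == 5)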

It is generally better for performance to query once with a large number of points than to query multiple times with one point per query.

As a convenience, if you only want the single nearest neighbor, you can call nn instead for a cleaner result:

nn(tree, points, skip = always_false) -> idxs, dists

Some examples:

using NearestNeighbors
data = rand(3, 10^4)
k = 3
point = rand(3)

kdtree = KDTree(data)
idxs, dists = knn(kdtree, point, k, true)

idxs
# 3-element Array{Int64,1}:
#  4683
#  6119
#  3278

dists
# 3-element Array{Float64,1}:
#  0.039032201026256215
#  0.04134193711411951
#  0.042974090446474184

# Multiple points
points = rand(3, 4);

idxs, dists = knn(kdtree, points, k, true);

idxs
# 4-element Array{Array{Int64,1},1}:
#  [3330, 4072, 2696]
#  [1825, 7799, 8358]
#  [3497, 2169, 3737]
#  [1845, 9796, 2908]

dists
# 4-element Array{Array{Float64,1},1}:
#  [0.0298932, 0.0327349, 0.0365979]
#  [0.0348751, 0.0498355, 0.0506802]
#  [0.0318547, 0.037291, 0.0421208]
#  [0.03321, 0.0360935, 0.0411951]

# Static vectors (requires the StaticArrays package)
using StaticArrays
v = @SVector[0.5, 0.3, 0.2];

idxs, dists = knn(kdtree, v, k, true);

idxs
# 3-element Array{Int64,1}:
#   842
#  3075
#  3046

dists
# 3-element Array{Float64,1}:
#  0.04178677766255837
#  0.04556078331418939
#  0.049967238112417205
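
If only the single closest neighbor is needed, nn returns one index and one distance per query point. A minimal sketch reusing kdtree and point from above:

idx, dist = nn(kdtree, point)
# idx is the index of the closest point in data; dist is its distance to point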

Range searches

A range search finds all neighbors within the range r of given point(s). This is done with the method:

inrange(tree, points, r, sortres = false) -> idxs

Note that for performance reasons the distances are not returned. The arguments to inrange are the same as for knn except that sortres just sorts the returned index vector.

An example:

using NearestNeighbors
data = rand(3, 10^4)
r = 0.05
point = rand(3)

balltree = BallTree(data)
idxs = inrange(balltree, point, r, true)

# 4-element Array{Int64,1}:
#  317
#  983
# 4577
# 8675

# If you are only interested in the number of neighbors within the range,
# inrangecount counts them without allocating index vectors:
neighborscount = inrangecount(balltree, point, r)
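
Like knn, inrange also accepts many query points at once; a minimal sketch reusing balltree and r from above:

points = rand(3, 4)
idxs = inrange(balltree, points, r)  # one vector of indices per column of points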

Using on-disk data sets

By default, the trees store a copy of the data provided during construction. This is problematic if you want to work on data sets that are larger than the available memory and therefore want to mmap the data, or if you want to store the data somewhere else, outside the tree.

DataFreeTree can be used to strip a constructed tree of its data field and re-link it with that data at a later stage. An example of using a large on-disk data set looks like this:

using Mmap
ndim = 2; ndata = 10_000_000_000
# datafilename refers to the file backing the memory-mapped array
data = Mmap.mmap(datafilename, Matrix{Float32}, (ndim, ndata))
data[:] = rand(Float32, ndim, ndata)  # create example data
dftree = DataFreeTree(KDTree, data)

dftree now only stores the indexing data structures. It can be passed around, saved and reloaded independently of data.
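
One way to save and reload it is with Julia's Serialization standard library (a minimal sketch; the file name is arbitrary):

using Serialization
serialize("dftree.jls", dftree)     # save the data-free tree to disk
dftree = deserialize("dftree.jls")  # reload it later, e.g. in another session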

To perform look-ups, dftree is re-linked to the underlying data:

tree = injectdata(dftree, data)  # yields a KDTree
knn(tree, data[:,1], 3)  # perform operations as usual

Author

Kristoffer Carlsson - @KristofferC - [email protected]

More Repositories

1. OhMyREPL.jl - Syntax highlighting and other enhancements for the Julia REPL (Julia, 692 stars)
2. TimerOutputs.jl - Formatted output of timed sections in Julia (Julia, 581 stars)
3. PGFPlotsX.jl - Plots in Julia using the PGFPlots LaTeX package (Julia, 296 stars)
4. Crayons.jl - Colored and styled strings for terminals (Julia, 134 stars)
5. LazilyInitializedFields.jl - Are your fields sleeping?... zzzz... (CSS, 34 stars)
6. Phon - Insert cohesive elements between grains in microstructures (Python, 27 stars)
7. QOI.jl - QOI (Quite OK Image) format decoder/encoder (Julia, 18 stars)
8. ContMechTensors.jl - Efficient computations with symmetric and non-symmetric tensors with support for automatic differentiation (Julia, 14 stars)
9. HyperHessians.jl (Julia, 13 stars)
10. MMA.jl - MMA optimization algorithm in Julia (Julia, 12 stars)
11. RegistryCompatTools.jl - Is your compat up to date? (Julia, 11 stars)
12. FeynSimul - Path integral simulation with OpenCL bindings to Python (Python, 10 stars)
13. SIMDVectors.jl (Julia, 8 stars)
14. EnvironmentGraph.jl (Julia, 7 stars)
15. SIMDIntrinsics.jl (Julia, 6 stars)
16. IOIndents.jl - Simple writing of indented and aligned text (Julia, 6 stars)
17. PGFPlotsXExamples (Julia, 6 stars)
18. FEM.jl - Finite element analysis in Julia (Julia, 5 stars)
19. BlockSparseMatrices.jl - Blocked sparse matrices in Julia (Julia, 5 stars)
20. UpdateIndex.jl (Julia, 5 stars)
21. lolFem (Python, 5 stars)
22. Notcurses.jl (Julia, 5 stars)
23. Backporter - Script to make backport branches to Julia (Julia, 4 stars)
24. ConstLab.jl - Constitutive modeling in Julia (Julia, 4 stars)
25. PkgFsck.jl - *Teleports Behind You* Nothing Personal, Kid (Julia, 4 stars)
26. JuliaCon_FEM - Slides for JuliaCon presentation about FEM in Julia (3 stars)
27. TerminalSpinners.jl - You spin my cursor right round right round (Julia, 3 stars)
28. JuAFEM.jl - Finite element toolbox for Julia (Julia, 3 stars)
29. PkgEvalAnalysis (Jupyter Notebook, 3 stars)
30. Expect.jl - std::expected in Julia (Julia, 2 stars)
31. Magyar.jl (Julia, 2 stars)
32. TestPackage.jl (Julia, 1 star)
33. DigiWellSeminarieJulia - Files for a presentation for a DigiWell Norway seminar day (Julia, 1 star)
34. Clang_jll.jl (Julia, 1 star)
35. MainRepo (Julia, 1 star)
36. resnet (Python, 1 star)
37. MKL_jll.jl (Julia, 1 star)
38. RegistratorTest.jl (Julia, 1 star)
39. SubProjectsExample (Julia, 1 star)
40. CcallableRecorder.jl (Julia, 1 star)
41. MyPackage (Julia, 1 star)
42. Kaleidoscope.jl (Julia, 1 star)
43. Currencies.jl (Julia, 1 star)
44. ViscoCrystalPlast - Code for PhD project (Julia, 1 star)
45. LightStructArrays.jl (Julia, 1 star)