NamedArrays
Julia type that implements a drop-in wrapper for AbstractArray
type, providing named indices and dimensions.
Idea
We would want to have the possibility to give each row/column/... in an Array names, as well as the array dimensions themselves. This could be used for pretty-printing, indexing, and perhaps even some sort of dimension-checking in certain matrix computations.
In all other respects, a NamedArray
should behave the same as the underlying AbstractArray
.
A NamedArray
should adhere to the interface definition of an AbstractArray
itself, if there are cases where this is not true, these should be considered bugs in the implementation of NamedArrays
.
Synopsis
julia> using NamedArrays
julia> n = NamedArray(rand(2,4))
2Γ4 Named Matrix{Float64}
A β² B β 1 2 3 4
βββββββΌβββββββββββββββββββββββββββββββββββββββββββ
1 β 0.640719 0.996256 0.534355 0.610259
2 β 0.67784 0.281928 0.0112326 0.672123
julia> setnames!(n, ["one", "two"], 1) # give the names "one" and "two" to the rows (dimension 1)
(OrderedCollections.OrderedDict{Any, Int64}("one" => 1, "two" => 2), OrderedCollections.OrderedDict{Any, Int64}("1" => 1, "2" => 2, "3" => 3, "4" => 4))
julia> n["one", 2:3]
2-element Named Vector{Float64}
B β
ββββΌβββββββββ
2 β 0.996256
3 β 0.534355
julia> n["two", :] = 11:14
11:14
julia> n[Not("two"), :] = 4:7 # all rows but the one called "two"
4:7
julia> n
2Γ4 Named Matrix{Float64}
A β² B β 1 2 3 4
βββββββΌβββββββββββββββββββββββ
one β 4.0 5.0 6.0 7.0
two β 11.0 12.0 13.0 14.0
julia> sum(n, dims=1)
1Γ4 Named Matrix{Float64}
A β² B β 1 2 3 4
ββββββββΌβββββββββββββββββββββββ
sum(A) β 15.0 17.0 19.0 21.0
Construction
Default names for indices and dimensions
julia> n = NamedArray([1 2; 3 4]) ## NamedArray(a::Array)
2Γ2 Named Matrix{Int64}
A β² B β 1 2
βββββββΌβββββ
1 β 1 2
2 β 3 4
n = NamedArray{Int}(2, 2) ## NamedArray{T}(dims...)
2Γ2 Named Matrix{Int64}
A β² B β 1 2
βββββββΌβββββββββββββββββββββββββββββββββββββ
1 β 33454 72058693549555969
2 β 72339073326448640 4318994440
These constructors add default names to the array of type String, "1"
,
"2"
, ... for each dimension, and names the dimensions :A
, :B
,
... (which will be all right for 26 dimensions to start with; 26
dimensions should be enough for anyone:-). The former initializes
the NamedArray with the Array a
, the latter makes an uninitialized
NamedArray of element type T
with the specified dimensions dims...
.
Lower level constructors
The key-lookup for names is implemented by using DataStructures.OrderedDict
s for each dimension. At a lower level, you can construct NamedArrays
this way:
julia> using DataStructures
julia> n = NamedArray([1 3; 2 4], ( OrderedDict("A"=>1, "B"=>2), OrderedDict("C"=>1, "D"=>2) ),
("Rows", "Cols"))
2Γ2 Named Matrix{Int64}
Rows β² Cols β C D
βββββββββββββΌβββββ
A β 1 3
B β 2 4
This is the basic constructor for a NamedArray. The second argument names
must be a tuple of OrderedDict
s whose range (the values) are exactly covering the range 1:size(a,dim)
for each dimension. The keys in the various dictionaries may be of mixed types, but after construction, the type of the names cannot be altered. The third argument dimnames
is a tuple of the names of the dimensions themselves, and these names may be of any type.
Vectors of names
# NamedArray{T,N}(a::AbstractArray{T,N}, names::NTuple{N,Vector}, dimnames::NTuple{N})
julia> n = NamedArray([1 3; 2 4], ( ["a", "b"], ["c", "d"] ), ("Rows", "Cols"))
2Γ2 Named Matrix{Int64}
Rows β² Cols β c d
βββββββββββββΌβββββ
a β 1 3
b β 2 4
# NamedArray{T,N}(a::AbstractArray{T,N}, names::NTuple{N,Vector})
julia> n = NamedArray([1 3; 2 4], ( ["a", "b"], ["c", "d"] ))
2Γ2 Named Matrix{Int64}
A β² B β c d
βββββββΌβββββ
a β 1 3
b β 2 4
julia> n = NamedArray([1, 2], ( ["a", "b"], )) # note the comma after ["a", "b"] to ensure evaluation as tuple
2-element Named Vector{Int64}
A β
ββββΌββ
a β 1
b β 2
# Names can also be set with keyword arguments
julia> n = NamedArray([1 3; 2 4]; names=( ["a", "b"], ["c", "d"] ), dimnames=("Rows", "Cols"))
2Γ2 Named Matrix{Int64}
Rows β² Cols β c d
βββββββββββββΌβββββ
a β 1 3
b β 2 4
This is a more friendly version of the basic constructor, where the range of the dictionaries is automatically assigned the values 1:size(a, dim)
for the names
in order. If dimnames
is not specified, the default values will be used (:A
, :B
, etc.).
In principle, there is no limit imposed to the type of the names
used, but we discourage the use of Real
, AbstractArray
and Range
, because they have a special interpretation in getindex()
and setindex
.
Indexing
Integer
indices
Single and multiple integer indices work as for the underlying array:
julia> n[1, 1]
1
julia> n[1]
1
Because the constructed NamedArray
itself is an AbstractArray
, integer indices always have precedence:
julia> a = rand(2, 4)
2Γ4 Matrix{Float64}:
0.272237 0.904488 0.847206 0.20988
0.533134 0.284041 0.370965 0.421939
julia> dodgy = NamedArray(a, ([2, 1], [10, 20, 30, 40]))
2Γ4 Named Matrix{Float64}
A β² B β 10 20 30 40
βββββββΌβββββββββββββββββββββββββββββββββββββββ
2 β 0.272237 0.904488 0.847206 0.20988
1 β 0.533134 0.284041 0.370965 0.421939
julia> dodgy[1, 1] == a[1, 1]
true
julia> dodgy[1, 10] ## BoundsError
ERROR: BoundsError: attempt to access 2Γ4 Matrix{Float64} at index [1, 10]
In some cases, e.g., with contingency tables, it would be very handy to be able to use named Integer indices. In this case, in order to circumvent the normal AbstractArray
interpretation of the index, you can wrap the indexing argument in the type Name()
julia> dodgy[Name(1), Name(30)] == a[2, 3] ## true
true
Named indices
julia> n = NamedArray([1 2 3; 4 5 6], (["one", "two"], [:a, :b, :c]))
2Γ3 Named Matrix{Int64}
A β² B β a b c
βββββββΌββββββββ
one β 1 2 3
two β 4 5 6
julia> n["one", :a] == 1
true
julia> n[:, :b] == [2, 5]
true
julia> n["two", [1, 3]] == [4, 6]
true
julia> n["one", [:a, :b]] == [1, 2]
true
This is the main use of NamedArrays
. Names (keys) and arrays of names can be specified as an index, and these can be mixed with other forms of indexing.
Slices
The example above just shows how the indexing works for the values, but there is a slight subtlety in how the return type of slices is determined
When a single element is selected by an index expression, a scalar value is returned. When an array slice is selected, an attempt is made to return a NamedArray with the correct names for the dimensions.
julia> n[:, :b] ## this expression drops the singleton dimensions, and hence the names
2-element Named Vector{Int64}
A β
βββββΌββ
one β 2
two β 5
julia> n[["one"], [:a]] ## this expression keeps the names
1Γ1 Named Matrix{Int64}
A β² B β a
βββββββΌββ
one β 1
Negation / complement
There is a special type constructor Not()
, whose function is to specify which elements to exclude from the array. This is similar to negative indices in the language R. The elements in Not(elements...)
select all but the indicated elements from the array.
julia> n[Not(1), :] == n[[2], :] ## note that `n` stays 2-dimensional
true
julia> n[2, Not(:a)] == n[2, [:b, :c]]
true
julia> dodgy[1, Not(Name(30))] == dodgy[1, [1, 2, 4]]
true
Both integers and names can be negated.
Dictionary-style indexing
You can also use a dictionary-style indexing, if you don't want to bother about the order of the dimensions, or make a slice using a specific named dimension:
julia> n[:A => "one"] == [1, 2, 3]
true
julia> n[:B => :c, :A => "two"] == 6
true
julia> n[:A=>:, :B=>:c] == [3, 6]
true
julia> n[:B=>[:a, :b]] == [1 2; 4 5]
true
julia> n[:A=>["one", "two"], :B=>:a] == [1, 4]
true
julia> n[:A=>[1, 2], :B=>:a] == [1, 4]
true
julia> n[:A=>["one"], :B=>1:2] == [1 2]
true
julia> n[:A=>["three"]] # Throws ArgumentError when trying to access non-existent dimension.
ERROR: ArgumentError: Elements for A => ["three"] not found.
Assignment
Most index types can be used for assignment as LHS
julia> n[1, 1] = 0
0
julia> n["one", :b] = 1
1
julia> n[:, :c] = 101:102
101:102
julia> n[:B=>:b, :A=>"two"] = 50
50
julia> n
2Γ3 Named Matrix{Int64}
A β² B β a b c
βββββββΌββββββββββββββ
one β 0 1 101
two β 4 50 102
General functions
Access to the names of the indices and dimensions
julia> names(n::NamedArray) ## get all index names for all dimensions
2-element Vector{Vector}:
["one", "two"]
[:a, :b, :c]
julia> names(n::NamedArray, 1) ## just for dimension `1`
2-element Vector{String}:
"one"
"two"
julia> dimnames(n::NamedArray) ## the names of the dimensions
2-element Vector{Symbol}:
:A
:B
Setting the names after construction
Because the type of the keys are encoded in the type of the NamedArray
, you can only change the names of indices if they have the same type as before.
setnames!(n::NamedArray, names::Vector, dim::Integer)
setnames!(n::NamedArray, name, dim::Int, index:Integer)
setdimnames!(n::NamedArray, name, dim:Integer)
sets all the names of dimension dim
, or only the name at index index
, or the name of the dimension dim
.
Enameration
Similar to the iterator enumerate
this package provides an enamerate
function for iterating simultaneously over both names and values.
enamerate(a::NamedArray)
For example:
julia> n = NamedArray([1 2 3; 4 5 6], (["one", "two"], [:a, :b, :c]))
2Γ3 Named Matrix{Int64}
A β² B β a b c
βββββββΌββββββββ
one β 1 2 3
two β 4 5 6
julia> for (name, val) in enamerate(n)
println("$name == $val")
end
("one", :a) == 1
("two", :a) == 4
("one", :b) == 2
("two", :b) == 5
("one", :c) == 3
("two", :c) == 6
Aggregating functions
Some functions, when operated on a NamedArray, will a name for the singleton index:
julia> sum(n, dims=1)
1Γ3 Named Matrix{Int64}
A β² B β a b c
ββββββββΌββββββββ
sum(A) β 5 7 9
julia> prod(n, dims=2)
2Γ1 Named Matrix{Int64}
A β² B β prod(B)
βββββββΌββββββββ
one β 6
two β 120
Aggregating functions are `sum`, `prod`, `maximum`, `minimum`, `mean`, `std`.
### Convert
```julia
convert(::Type{Array}, a::NamedArray)
converts a NamedArray to an Array by dropping all name information. You can also directly access the underlying array using n.array
, or use the accessor function array(n)
.
Methods with special treatment of names / dimnames
Concatenation
If the names are identical for the relevant dimension, these are retained in the results. Otherwise, the names are reinitialized to the default "1", "2", ...
In the concatenated direction, the names are always re-initialized. This may change is people find we should put more effort to check the concatenated names for uniqueness, and keep original names if that is the case.
julia> hcat(n, n)
2Γ6 Named Matrix{Int64}
A β² hcat β 1 2 3 4 5 6
ββββββββββΌβββββββββββββββββ
one β 1 2 3 1 2 3
two β 4 5 6 4 5 6
julia> vcat(n, n)
4Γ3 Named Matrix{Int64}
vcat β² B β a b c
ββββββββββΌββββββββ
1 β 1 2 3
2 β 4 5 6
3 β 1 2 3
4 β 4 5 6
Transposition
julia> n'
3Γ2 Named LinearAlgebra.Adjoint{Int64, Matrix{Int64}}
B β² A β one two
βββββββΌβββββββββ
a β 1 4
b β 2 5
c β 3 6
julia> circshift(n, (1, 2))
2Γ3 Named Matrix{Int64}
A β² B β b c a
βββββββΌββββββββ
two β 5 6 4
one β 2 3 1
Similar functions: adjoint
, transpose
, permutedims
operate on the dimnames as well.
julia> rotl90(n)
3Γ2 Named Matrix{Int64}
B β² A β one two
βββββββΌβββββββββ
c β 3 6
b β 2 5
a β 1 4
julia> rotr90(n)
3Γ2 Named Matrix{Int64}
B β² A β two one
βββββββΌβββββββββ
a β 4 1
b β 5 2
c β 6 3
Reordering of dimensions in NamedVectors
julia> v = NamedArray([1, 2, 3], ["a", "b", "c"])
3-element Named Vector{Int64}
A β
ββββΌββ
a β 1
b β 2
c β 3
julia> Combinatorics.nthperm(v, 4)
3-element Named Vector{Int64}
A β
ββββΌββ
b β 2
c β 3
a β 1
julia> Random.shuffle(v)
3-element Named Vector{Int64}
A β
ββββΌββ
b β 2
a β 1
c β 3
julia> reverse(v)
3-element Named Vector{Int64}
A β
ββββΌββ
c β 3
b β 2
a β 1
julia> sort(1 ./ v)
3-element Named Vector{Float64}
A β
ββββΌβββββββββ
c β 0.333333
b β 0.5
a β 1.0
operate on the names of the rows as well
Broadcasts
In broadcasting, the names of the first argument are kept
julia> ni = NamedArray(1 ./ n.array)
2Γ3 Named Matrix{Float64}
A β² B β 1 2 3
βββββββΌβββββββββββββββββββββββββββββ
1 β 1.0 0.5 0.333333
2 β 0.25 0.2 0.166667
julia> n .+ ni
β Warning: Using names of left argument
β @ NamedArrays ~/werk/julia/NamedArrays.jl/src/arithmetic.jl:25
2Γ3 Named Matrix{Float64}
A β² B β a b c
βββββββΌββββββββββββββββββββββββββ
one β 2.0 2.5 3.33333
two β 4.25 5.2 6.16667
julia> n .- v'
2Γ3 Named Matrix{Int64}
A β² B β a b c
βββββββΌββββββββ
one β 0 0 0
two β 3 3 3
This is implemented through broadcast
.
Further Development
The current goal is to reduce complexity of the implementation. Where possible, we want to use more of the Base.AbstractArray
implementation.
A longer term goal is to improve type stability, this might have a repercussion to the semantics of some operations.
Related Packages
The Julia ecosystem now has a number of packages implementing the general idea of attaching names to arrays. For some purposes they may be interchangeable. For others, flexibility or speed or support for particular functions may make one preferable.
-
AxisArrays.jl is of comparable age. It attaches a Symbol to each dimension; this is part of the type thus cannot be mutated after creation.
-
DimensionalData.jl, AxisKeys.jl and AxisIndices.jl are, to first approximation, post-Julia 1.0 re-writes of that. DimensionalData.jl similarly builds in dimension names, in AxisKeys.jl they are provided by composition with NamedDims.jl, and AxisIndices.jl does not support them. All allow some form of named indexing but the notation varies.
Packages with some overlap but a different focus include:
-
NamedDims.jl only attaches a name to each dimension, allowing
sum(A, dims = :time)
andA[time=3]
but keeping indices the usual integers. -
LabelledArrays.jl instead attaches names to individual elements, allowing
A.second == A[2]
. -
OffsetArrays.jl shifts the indices of an array, allowing say
A[-3]
toA[3]
to be the first & last elements.