• This repository has been archived on 30/Aug/2023

Repository Details

Julia package for SciDB

SciDB-Julia

##Introduction

The SciDB-Julia package allows users of Julia to interface with SciDB. The API follows the Julia convention and allows for using Julia language constructs in SciDB operations.

##Requirements

Julia v0.3 (http://julialang.org/)
Julia package "Requests.jl" and its dependencies (https://github.com/Keno/Requests.jl)
SciDB 14.7
Paradigm4 shim for REST-ful access to SciDB (https://github.com/Paradigm4/shim)

##Installation and Usage

Please follow the instructions at https://github.com/Paradigm4/shim to get the Paradigm4 shim properly installed and running.

Please follow the instructions at http://julialang.org/ to get Julia properly installed and running.

Once Julia is running, please install the package "Requests.jl" from within Julia via the following:

julia> Pkg.add("Requests")

To download the source code for the 14.7 release, change 'branch:master' to 'tag:v14.7' and then click 'download', or directly download from https://github.com/Paradigm4/SciDB-Julia/archive/14.7.zip

To load the SciDB-Julia package, ensure that the Julia LOAD_PATH variable includes the folder in which the package lives. NOTE: this step is only required if the package lives in a non-standard location.

julia> push!(LOAD_PATH, "/path/to/SciDB-Julia")
3-element Array{Union(UTF8String,ASCIIString),1}:
 "/ssd/julia/usr/local/share/julia/site/v0.3"
 "/ssd/julia/usr/share/julia/site/v0.3"      
 "/path/to/SciDB-Julia"
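
If the push! call is placed in a startup script that may run more than once, the path can be guarded so that it is added only once. The following is a minimal sketch, assuming the same placeholder path as above; pkgdir is an illustrative variable name and not part of the package.

```julia
# Placeholder location of the SciDB-Julia checkout; replace with the real folder.
pkgdir = "/path/to/SciDB-Julia"

# Add the folder to LOAD_PATH only if it is not already listed,
# so running this snippet twice does not create duplicate entries.
if !(pkgdir in LOAD_PATH)
    push!(LOAD_PATH, pkgdir)
end
```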

The Paradigm4 shim must be up-and-running, according to the instructions at the link given above, before proceeding. Once the shim has started, proceed to load the SciDB-Julia package within the Julia environment (there is a small load time as the dependencies themselves load):

julia> using SciDBJulia

###Example 1 - Multiplying Two Dense SciDB Matrices

Create a Julia array and then use it to create a matching SciDB array. The first invocation of scidb takes a small amount of time to create network connections to the SciDB database. Subsequent invocations should take no noticeable amount of time.

julia> a=rand(3,4)
3x4 Array{Float64,2}:
 0.480573  0.684064  0.337876  0.48843 
 0.74014   0.55052   0.344965  0.504495
 0.191254  0.627969  0.341594  0.878666

julia> A=scidb(a)
scidb_array("Julia_15993245957846797056_12",3x4 Array{Float64,2}:
 0.480573  0.684064  0.337876  0.48843 
 0.74014   0.55052   0.344965  0.504495
 0.191254  0.627969  0.341594  0.878666,32,false)

A is a handle to the corresponding SciDB array. It contains a and a pointer to the same array on SciDB. Let's continue by making another array b, and its handle B for the corresponding array on SciDB.

julia> b=rand(4,3)
4x3 Array{Float64,2}:
 0.959354  0.305615   0.375201
 0.600768  0.313162   0.69977 
 0.736307  0.883326   0.90877 
 0.936542  0.0650287  0.958536

julia> B=scidb(b)
scidb_array("Julia_6140101800367326939_12",4x3 Array{Float64,2}:
 0.959354  0.305615   0.375201
 0.600768  0.313162   0.69977 
 0.736307  0.883326   0.90877 
 0.936542  0.0650287  0.958536,32,false)

Let's multiply A and B on SciDB. The result will be stored on SciDB and a handle to that result will be returned to us in Julia. The handle will not itself contain the result data until we explicitly materialize it. This enables us to keep datasets on SciDB only.

julia> C=A*B
scidb_array("Julia_14736187500044874775_9",3x3 sparse matrix with 0 Float64 entries:,32,false)

Now, let's pull the result back from SciDB into Julia:

julia> c=julia(C)
3x3 sparse matrix with 9 Float64 entries:
	[1, 1]  =  1.57822
	[2, 1]  =  1.76727
	[3, 1]  =  1.63517
	[1, 2]  =  0.69131
	[2, 2]  =  0.736123
	[3, 2]  =  0.613984
	[1, 3]  =  1.43423
	[2, 3]  =  1.46001
	[3, 3]  =  1.66386
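
The result pulled back from SciDB can be sanity-checked against a product computed locally in Julia. The sketch below needs neither SciDB nor the shim: it re-enters the printed values of a, b and c from this example (so they are rounded to six significant digits) and compares them with a loose tolerance. The names c_scidb and max_diff are illustrative only.

```julia
# Values copied from the printed a and b above (rounded printouts).
a = [0.480573 0.684064 0.337876 0.48843;
     0.74014  0.55052  0.344965 0.504495;
     0.191254 0.627969 0.341594 0.878666]

b = [0.959354 0.305615  0.375201;
     0.600768 0.313162  0.69977;
     0.736307 0.883326  0.90877;
     0.936542 0.0650287 0.958536]

# Entries of c as printed by julia(C), arranged as a dense 3x3 matrix.
c_scidb = [1.57822 0.69131  1.43423;
           1.76727 0.736123 1.46001;
           1.63517 0.613984 1.66386]

# Largest absolute difference between the local product and the SciDB result.
# Both sides are rounded printouts, so compare with a loose tolerance.
max_diff = maximum([abs(x) for x in (a * b - c_scidb)])
println(max_diff < 1e-4)
```

With the values shown in this example the check prints true. In a live session the same comparison could be made directly between a*b and the matrix returned by julia(C).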

Arrays created in SciDB from Julia are persistent and must be explicitly removed. This is done by the remove function:

julia> remove(A)

julia> remove(B)

julia> remove(C)
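
Because these arrays persist until removed, it can help to wrap a computation so that cleanup happens even when a step fails. The following is a sketch only: remote_product and its keyword arguments upload, pull and cleanup are hypothetical names introduced here, not part of the package. The keywords default to the package's scidb, julia and remove functions, so the defaults assume SciDBJulia is loaded and the shim is running.

```julia
# Hypothetical helper (not part of SciDB-Julia): multiply two Julia matrices
# on SciDB and remove every array that was created along the way.
# The keyword defaults refer to the package functions scidb, julia and remove;
# other functions can be passed in, for example to try the pattern offline.
function remote_product(a, b; upload=scidb, pull=julia, cleanup=remove)
    handles = Any[]                      # handles created so far
    try
        A = upload(a); push!(handles, A)
        B = upload(b); push!(handles, B)
        C = A * B;     push!(handles, C) # product stays on the server
        return pull(C)                   # materialize the result in Julia
    finally
        # Runs on success and on error, so no array is left behind.
        for h in handles
            cleanup(h)
        end
    end
end
```

With the package loaded, remote_product(rand(3,4), rand(4,3)) would go through the same steps as the example above and leave no arrays on SciDB afterwards.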

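###Example 2 - Multiplying Two Sparse Matrices, one from Julia, one from SciDB

Here A is a handle to a sparse array stored on SciDB, while b remains an ordinary Julia sparse matrix. The mixed product A*b again returns a handle, which julia(C) then materializes.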

julia> a=sprand(3,4,0.40)
3x4 sparse matrix with 3 Float64 entries:
	[2, 1]  =  0.748581
	[1, 2]  =  0.735042
	[3, 4]  =  0.488018

julia> A=scidb(a)
scidb_array("Julia_2373004205611115214_12",3x4 sparse matrix with 3 Float64 entries:
	[2, 1]  =  0.748581
	[1, 2]  =  0.735042
	[3, 4]  =  0.488018,32,true)

julia> b=sprand(4,3,0.40)
4x3 sparse matrix with 6 Float64 entries:
	[3, 1]  =  0.577354
	[1, 2]  =  0.610267
	[4, 2]  =  0.586419
	[1, 3]  =  0.760892
	[2, 3]  =  0.499974
	[3, 3]  =  0.385498

julia> C=A*b
scidb_array("Julia_13845036751259206723_9",3x3 sparse matrix with 0 Float64 entries:,32,true)

julia> c=julia(C)
3x3 sparse matrix with 4 Float64 entries:
	[2, 2]  =  0.456835
	[3, 2]  =  0.286183
	[1, 3]  =  0.367502
	[2, 3]  =  0.56959

##API

function scidb(J::AbstractArray, chunkSize=32, densityOverride="none")
  # Create a SciDB array from a Julia array
  # TODO: Support for mxn matrices only at the moment (m,n>=1); same limitation as SciDB-R.

function remove(J::scidb_array)
  # Remove the array from SciDB.

function * (J::scidb_array, H::scidb_array)
function * (J::scidb_array, H::AbstractArray)
function * (J::AbstractArray, H::scidb_array)
  # GEMM and SPGEMM matrix multiplication methods, supporting:
     scidb_array * scidb_array
     scidb_array * AbstractArray
     AbstractArray * scidb_array

function julia(S::scidb_array)
  # Populate a Julia array from a SciDB array which was previously created by a SciDB query from Julia
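
To get a feel for the chunkSize argument, the sketch below counts how many chunks a matrix would occupy. It rests on an assumption that the source does not confirm: that chunkSize is the SciDB chunk length applied to both dimensions. The helper names chunks_along and chunk_grid are hypothetical and not part of the package.

```julia
# Number of chunks along one dimension of length len with chunk length chunk_size.
# This is integer ceiling division, written with div so it needs no extra functions.
chunks_along(len, chunk_size) = div(len + chunk_size - 1, chunk_size)

# ASSUMPTION: the same chunk length is used for rows and columns.
# Returns (chunks along rows, chunks along columns) for an m x n matrix.
chunk_grid(m, n, chunk_size=32) = (chunks_along(m, chunk_size), chunks_along(n, chunk_size))

println(chunk_grid(3, 4))       # the 3x4 matrix from Example 1 fits in a single chunk
println(chunk_grid(1000, 100))  # 32 chunks along the rows, 4 along the columns
```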

##Limitations

1. The plugin must be run on the same machine as the shim server and at port 8080.
2. Only 2-dimensional matrices are supported.
3. Load the dense_linear_algebra and linear_algebra libraries via iquery before running SciDBJulia.
