Cloud-native, service based access to HDF data

HSDS (Highly Scalable Data Service) - REST-based service for HDF5 data

Introduction

HSDS is a web service that implements a REST-based API for HDF5 data stores. Data can be stored either in a POSIX file system or in object-based storage such as AWS S3, Azure Blob Storage, or MinIO. HSDS can be run on a single machine using Docker, or on a cluster using Kubernetes (or AKS on Microsoft Azure).

In addition, HSDS can be run in serverless mode with AWS Lambda or h5pyd local mode.

Quick Start

Make sure you have Python 3, pip, and git installed, then:

  1. Clone this repo: $ git clone https://github.com/HDFGroup/hsds
  2. Go to the hsds directory: $ cd hsds
  3. Run install: $ python setup.py install OR install from PyPI: $ pip install hsds
  4. Set up the password file: $ cp admin/config/passwd.default admin/config/passwd.txt
  5. Create a directory the server will use to store data, then set the ROOT_DIR environment variable to point to it: $ mkdir hsds_data; export ROOT_DIR="${PWD}/hsds_data" (on Windows: C:> set ROOT_DIR=%CD%\hsds_data)
  6. Create the hsds test bucket: $ mkdir hsds_data/hsdstest
  7. Start the server: $ ./runall.sh --no-docker (on Windows: C:> runall.bat)
  8. In a new shell, set the environment variable HSDS_ENDPOINT to the string displayed by the server, for example: $ export HSDS_ENDPOINT=http+unix://%2Ftmp%2Fhs%2Fsn_1.sock (on Windows, this step can be skipped)
  9. Set environment variables for the admin account: $ export ADMIN_USERNAME=admin and $ export ADMIN_PASSWORD=admin (adjust for any changes made to the passwd.txt file; on Windows, use the corresponding set commands)
  10. Run the test suite: $ python testall.py
  11. (Optional) Post install setup (test data, home folders, cli tools, etc): docs/post_install.md
  12. (Optional) Install the h5pyd package for an h5py compatible api and tool suite: https://github.com/HDFGroup/h5pyd
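
For readers who prefer to script the setup, steps 5, 6, 8, and 9 above can be sketched in Python. This is only an illustration, not part of HSDS: the helper names (prepare_root_dir, socket_endpoint, socket_path_from_endpoint, client_env) are hypothetical, and the sketch assumes the http+unix:// endpoint is simply the percent-encoded path of the server's Unix socket, which is what the example string in step 8 suggests.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote, unquote

PREFIX = "http+unix://"


def prepare_root_dir(base: Path, bucket: str = "hsdstest") -> Path:
    """Create <base>/hsds_data/<bucket> and return the hsds_data path (steps 5 and 6)."""
    root_dir = base / "hsds_data"
    (root_dir / bucket).mkdir(parents=True, exist_ok=True)
    return root_dir


def socket_endpoint(socket_path: str) -> str:
    """Percent-encode a Unix socket path into the http+unix:// form shown in step 8."""
    # safe="" forces "/" to be encoded as %2F as well.
    return PREFIX + quote(socket_path, safe="")


def socket_path_from_endpoint(endpoint: str) -> str:
    """Recover the socket path from an http+unix:// endpoint string."""
    if not endpoint.startswith(PREFIX):
        raise ValueError(f"not an http+unix endpoint: {endpoint}")
    return unquote(endpoint[len(PREFIX):])


def client_env(root_dir: Path, endpoint: str,
               username: str = "admin", password: str = "admin") -> dict:
    """Collect the variables from steps 5, 8, and 9 in a dict (to merge into os.environ)."""
    return {
        "ROOT_DIR": str(root_dir),
        "HSDS_ENDPOINT": endpoint,
        "ADMIN_USERNAME": username,  # defaults assume an unmodified passwd.txt
        "ADMIN_PASSWORD": password,
    }


if __name__ == "__main__":
    # A temporary directory stands in for the real working directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = prepare_root_dir(Path(tmp))
        env = client_env(root, socket_endpoint("/tmp/hs/sn_1.sock"))
        for key, value in env.items():
            print(f"{key}={value}")
```

In real use, the endpoint should be copied from the string the server prints at startup rather than constructed by hand; the encoding helper is shown only to explain what that string contains.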

To shut down a server that was started with the --no-docker option, press Ctrl-C.

If using Docker, run: $ ./stopall.sh

Note: passwords can (and for production use, should) be modified by changing values in hsds/admin/config/passwd.txt and rebuilding the Docker image. Alternatively, an external identity provider such as Azure Active Directory or KeyCloak can be used. See docs/azure_ad_setup.md for Azure AD setup instructions or docs/keycloak_setup.md for KeyCloak.

Detailed Install Instructions

On AWS

For complete instructions to install on a single AWS EC2 instance with Docker:

For complete instructions to install on Amazon Elastic Kubernetes Service (EKS):

For complete instructions to install on AWS Lambda:

On Azure

For complete instructions to install on a single Azure VM with Docker:

For complete instructions to install on Azure Kubernetes Service (AKS):

On Prem (POSIX-based storage)

For complete instructions to install on a desktop or local server:

On DCOS (BETA)

For complete instructions to install on DCOS:

General Install Topics

Setting up docker:

Post install setup and testing:

Authorization, ACLs, and Role Based Access Control (RBAC):

Running serverless with h5pyd:

Writing Client Applications

Because HSDS is a REST service, clients can be developed in almost any programming language. The test programs under hsds/test/integ illustrate some of the methods for performing different operations using Python and the HSDS REST API (using the requests package).
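
As a rough illustration of what a REST client does, the sketch below builds an authenticated GET request for a domain's metadata using only the Python standard library. Several things here are assumptions rather than statements from this README: that the domain is passed as a "domain" query parameter, that the server accepts HTTP Basic authentication, and that the server is reachable at a plain HTTP address (http://localhost:5101 and /home/test_user1/example.h5 are placeholders). The function names are hypothetical. The integration tests under hsds/test/integ remain the authoritative examples.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_domain_request(endpoint: str, domain: str,
                         username: str, password: str) -> Request:
    """Build a GET request for a domain's metadata with HTTP Basic authentication."""
    # Assumption: the domain is selected with a "domain" query parameter.
    url = endpoint.rstrip("/") + "/?" + urlencode({"domain": domain})
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Accept": "application/json",
    }
    return Request(url, headers=headers, method="GET")


def fetch_domain_info(endpoint: str, domain: str,
                      username: str, password: str,
                      timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_domain_request(endpoint, domain, username, password)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, a call such as fetch_domain_info("http://localhost:5101", "/home/test_user1/example.h5", "admin", "admin") would return the decoded JSON for that domain, provided the assumptions above hold. Note that urllib cannot talk to the http+unix:// socket endpoint used by the --no-docker quick start, so this sketch applies only to HTTP endpoints.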

The related project h5pyd (https://github.com/HDFGroup/h5pyd) provides a (mostly) h5py-compatible interface to the server for Python clients.
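
A minimal sketch of reading one dataset through h5pyd might look like the following. It assumes h5pyd is installed (pip install h5pyd) and that the HSDS_ENDPOINT, ADMIN_USERNAME, and ADMIN_PASSWORD variables from the Quick Start are set. The function name read_dataset and any domain or dataset path passed to it are hypothetical.

```python
import os


def read_dataset(domain: str, dataset_path: str):
    """Open an HSDS domain read-only through h5pyd and return one dataset's contents."""
    # Imported lazily so this module can be loaded without h5pyd installed.
    import h5pyd  # third-party package: pip install h5pyd

    with h5pyd.File(
        domain,
        "r",
        endpoint=os.environ.get("HSDS_ENDPOINT"),
        username=os.environ.get("ADMIN_USERNAME"),
        password=os.environ.get("ADMIN_PASSWORD"),
    ) as f:
        # Indexing with [...] reads the whole dataset, as in h5py.
        return f[dataset_path][...]
```

Because the interface mirrors h5py, existing h5py code can often be adapted by swapping h5py.File for h5pyd.File and replacing the file path with a domain path, although not every h5py feature is guaranteed to be available.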

For C/C++ clients, the HDF REST VOL is an HDF5 library plugin that enables the HDF5 API to read and write data using HSDS. See: https://github.com/HDFGroup/vol-rest. Note: this requires version 1.12.0 or later of the HDF5 library.

Uninstalling

HSDS only modifies the storage location that it is configured to use, so to uninstall just remove source files, Docker images, and S3 bucket/Azure Container/directory files.

Reporting bugs (and general feedback)

Create new issues at http://github.com/HDFGroup/hsds/issues for any problems you find.

For general questions/feedback, please use the HSDS forum: https://forum.hdfgroup.org/c/hsds.

License

HSDS is licensed under the Apache 2.0 license. See LICENSE in this directory.

Integration with JupyterHub

The HDF Group provides access to an HSDS instance that is integrated with JupyterLab: HDF Lab. HDF Lab is a hosted Jupyter environment with these features:

  • Connection to an HSDS instance
  • Dedicated Xeon core per user
  • 10 GB POSIX disk
  • 200 GB S3 storage for HDF data
  • Sample programs and data files

Sign up for HDF Lab here: https://www.hdfgroup.org/hdfkitalab/.

Azure Marketplace

HSDS is available as a VM offer on Azure Marketplace, which provides an easy way to set up an Azure instance with HSDS. See: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/thehdfgroup1616725197741.hsdsazurevm?tab=Overview for more information.

Websites

Other useful resources

HDF Group Blog Posts

External Blogs and Articles

Slide Decks

Videos

Papers
