• Stars: 1,125
• Rank: 41,356 (top 0.9%)
• Language: Java
• License: GNU Affero General Public License (AGPL-V3)
• Created: over 6 years ago
• Updated: 2 months ago


Repository Details

Hopsworks - Data-Intensive AI platform with a Feature Store


What is Hopsworks?

Hopsworks is a data platform for ML with a Python-centric Feature Store and MLOps capabilities. It is modular: you can use it as a standalone Feature Store, use it to manage, govern, and serve your models, or even use it to develop and operate feature pipelines and training pipelines. Hopsworks brings collaboration to ML teams, providing a secure, governed platform for developing, managing, and sharing ML assets - features, models, training data, batch scoring data, logs, and more.


πŸš€ Quickstart

APP - Serverless (beta)

β†’ Go to app.hopsworks.ai

Hopsworks is available as a serverless app: simply head to app.hopsworks.ai and register with your Google or GitHub account. You will then be able to run a tutorial or access Hopsworks directly and try it for yourself. This is the preferred way to first experience the platform before diving into more advanced uses and installation requirements.
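Once registered, you can connect from Python. The sketch below assumes the `hopsworks` package (`pip install hopsworks`) and the current public client behavior; check the documentation for your SDK version.

```python
def connect_to_serverless():
    """Sketch: connect to the serverless deployment at app.hopsworks.ai.

    The import is kept inside the function so this sketch can be loaded
    even where the SDK is not installed.
    """
    import hopsworks  # assumption: `pip install hopsworks`

    # With no host argument, login() targets app.hopsworks.ai and prompts
    # for (or reads) an API key tied to your account.
    project = hopsworks.login()
    return project
```

Calling `connect_to_serverless()` in a notebook returns a project handle from which the Feature Store and MLOps APIs are reached.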

Azure, AWS & GCP

Managed Hopsworks is our platform for running Hopsworks and the Feature Store in the cloud, and it integrates directly with the customer's AWS/Azure/GCP environment. It also integrates seamlessly with third-party platforms such as Databricks, SageMaker, and Kubeflow.

If you wish to run Hopsworks in your Azure, AWS, or GCP environment, follow one of the guides in our documentation:

Installer - On-premise

β†’ Follow the installation instructions.

The hopsworks-installer.sh script downloads, configures, and installs Hopsworks. It is typically run interactively, prompting the user about details of what is installed and where. It can also be run non-interactively (no user prompts) using the '-ni' switch.

Requirements

You need at least one server or virtual machine on which Hopsworks will be installed with at least the following specification:

  • CentOS/RHEL 7.x or Ubuntu 18.04;
  • at least 32 GB RAM;
  • at least 8 CPUs;
  • 100 GB of free hard-disk space;
  • outbound Internet access (if this server is air-gapped, contact us for support);
  • a UNIX user account with sudo privileges.

πŸŽ“ Documentation and API

Documentation

Hopsworks documentation includes user guides, feature store documentation, and an administration guide. We also document core concepts to help users navigate the abstractions and logic of feature stores and MLOps in general:

APIs

Hopsworks API documentation is divided into three categories: the Hopsworks API covers project-level APIs, the Feature Store API covers feature groups, feature views, and connectors, and the MLOps API covers the Model Registry, model serving, and deployments.
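The three categories map to distinct handles in the Python client. This is a hedged illustration: the method names below follow the public SDK, but verify them against the API documentation for your version.

```python
def get_api_handles(api_key: str):
    """Sketch of the three API categories via the Python client."""
    import hopsworks  # assumption: `pip install hopsworks`

    # Hopsworks API: project-level entry point.
    project = hopsworks.login(api_key_value=api_key)

    # Feature Store API: feature groups, feature views, connectors.
    fs = project.get_feature_store()

    # MLOps API: model registry plus model serving and deployments.
    mr = project.get_model_registry()
    ms = project.get_model_serving()
    return project, fs, mr, ms
```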

Tutorials

Most of the tutorials require you to have an account on app.hopsworks.ai. You can explore the dedicated https://github.com/logicalclocks/hopsworks-tutorials repository containing our tutorials, or jump directly into one of the existing use cases:


πŸ“¦ Main Features

Project-based Multi-Tenancy and Team Collaboration

Hopsworks provides projects as a secure sandbox in which teams can collaborate and share ML assets. Hopsworks' unique multi-tenant project model even enables sensitive data to be stored in a shared cluster, while still providing fine-grained sharing capabilities for ML assets across project boundaries. Projects can be used to structure teams so that they have end-to-end responsibility from raw data to managed features and models. Projects can also be used to create development, staging, and production environments for data teams. All ML assets support versioning, lineage, and provenance, providing all Hopsworks users with a complete view of the MLOps life cycle, from feature engineering through model serving.

Development and Operations

Hopsworks provides development tools for Data Science, including conda environments for Python, Jupyter notebooks, jobs, or even notebooks as jobs. You can build production pipelines with the bundled Airflow, and even run ML training pipelines with GPUs in notebooks on Airflow. You can train models on as many GPUs as are installed in a Hopsworks cluster and easily share them among users. You can also run Spark, Spark Streaming, or Flink programs on Hopsworks, with support for elastic workers in the cloud (add/remove workers dynamically).
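The notebooks-as-jobs workflow can be sketched with the Python client's jobs API. The names below are based on the public SDK but should be treated as assumptions; check the Jobs API documentation for your version.

```python
def run_script_as_job(project, app_path: str, job_name: str = "demo_job"):
    """Sketch: register a Python script in a project as a job and run it."""
    jobs_api = project.get_jobs_api()

    # Fetch a default job configuration ("PYSPARK" for Spark jobs instead).
    config = jobs_api.get_configuration("PYTHON")
    config["appPath"] = app_path  # path to the script inside the project

    # Create the job, then launch an execution and return its handle.
    job = jobs_api.create_job(job_name, config)
    return job.run()
```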

Available on any Platform

Hopsworks is available both as a managed platform in the cloud on AWS, Azure, and GCP, and as software that can be installed on any Linux-based virtual machines (Ubuntu/RedHat compatible), even in air-gapped data centers. Hopsworks is also available as a serverless platform that manages and serves both your features and models.

πŸ§‘β€πŸ€β€πŸ§‘ Community

Contribute

We are building the most complete and modular ML platform available in the market, and we count on your support to continuously improve Hopsworks. Feel free to send us suggestions, report bugs, and contribute features at any time.

Join the community

Open-Source

Hopsworks is available under the AGPL-V3 license. In plain English, this means that you are free to use Hopsworks and even build paid services on it, but if you modify the source code, you must also release your changes and any systems built around it under the AGPL-V3.

More Repositories

1. rondb (C++, 574 stars) - This is RonDB, a distribution of NDB Cluster developed and used by Hopsworks AB. It also contains development branches of RonDB.
2. hopsworks-tutorials (Jupyter Notebook, 236 stars) - Tutorials for the Hopsworks Platform.
3. hops-examples (Jupyter Notebook, 117 stars) - Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops.
4. maggy (Python, 89 stars) - Distribution-transparent machine learning experiments on Apache Spark.
5. feature-store-api (Python, 53 stars) - Python and Java/Scala APIs for the Hopsworks feature store.
6. aml_end_to_end (Jupyter Notebook, 50 stars) - AML end-to-end example.
7. hops-tensorflow (Python, 33 stars) - HopsYARN TensorFlow framework.
8. hops-util-py (Python, 27 stars) - Utility library for Hopsworks. Issues can be posted at https://community.hopsworks.ai.
9. hopsworks-chef (Ruby, 12 stars) - Chef cookbook for Hopsworks.
10. hops-docs (11 stars) - Documentation for Hopsworks and Hops.
11. hopsworks-iot (Scala, 8 stars)
12. hopsworks-api (Python, 8 stars) - Python SDK to interact with the Hopsworks API.
13. machine-learning-api (Python, 8 stars) - Hopsworks Machine Learning API 🚀 Model management with a model registry and model serving.
14. hops-hadoop-chef (Ruby, 7 stars) - Chef cookbook for Hops Hadoop.
15. hops-util (Java, 6 stars) - Utility library for Hopsworks.
16. ndb-chef (Ruby, 6 stars) - Chef cookbook for MySQL Cluster (NDB).
17. karamel-chef (Shell, 6 stars) - Chef cookbook that installs Karamel; used by Vagrant to provision multi-node clusters.
18. flink-chef (Ruby, 5 stars) - Chef cookbook for Apache Flink.
19. zeppelin-chef (HTML, 4 stars) - Cookbook for installing Zeppelin/Spark.
20. terraform-provider-hopsworksai (Go, 4 stars) - Hopsworks.ai Terraform provider.
21. hopslog-chef (Ruby, 4 stars) - Karamelized wrapper cookbook for installing Kibana and Logstash to work with Hopsworks.
22. quartz (TypeScript, 3 stars) - Logical Clocks design system, NPM package.
23. spark-chef (Ruby, 3 stars) - Apache Spark Chef cookbook.
24. ePipe (C++, 3 stars) - A metadata system for HopsFS that provides replicated-metadata-as-a-service.
25. sysbench-0.4.12 (Shell, 3 stars) - Sysbench tree for benchmarking iRoNDB.
26. dr-elephant-chef (HTML, 3 stars) - Chef cookbook to install Dr. Elephant for Hadoop.
27. terraform-hopsworksai-helpers (HCL, 3 stars) - Terraform module that creates the required cloud resources for Hopsworks.ai clusters on AWS and Azure.
28. hops-kafka-authorizer (Java, 3 stars) - Kafka authorization engine for Hopsworks.
29. logicalclocks.github.io (2 stars) - Hopsworks documentation.
30. kzookeeper (Ruby, 2 stars) - Karamelized wrapper Chef cookbook for ZooKeeper.
31. flyingduck-chef (Ruby, 2 stars) - Chef cookbook for the installation of Flying Duck (Arrow Flight server with DuckDB).
32. elasticsearch-chef (Ruby, 2 stars) - Karamelized Chef cookbook that installs Elasticsearch.
33. kube-hops-chef (HTML, 2 stars) - Karamelized cookbook to deploy Kubernetes on the Hops platform.
34. dela-chef (Ruby, 2 stars) - Dela is a P2P service for sharing datasets between Hadoop/Kafka clusters.
35. hopsworks-cloud-sdk (Python, 1 star) - SDK for integrating Hopsworks with different cloud solutions.
36. hopsmonitor-chef (HTML, 1 star) - Wrapper Chef cookbook for Prometheus.
37. cloud-chef (Ruby, 1 star) - Cookbook to set up Hopsworks cloud AMIs.
38. conda-chef (Ruby, 1 star) - Karamelized Chef cookbook for installing the Anaconda Python package manager.
39. git-nullmerge (Shell, 1 star) - Git command to find and merge identical trees.
40. mysqld_exporter (Go, 1 star) - Exporter for MySQL server metrics.
41. tensorflow-chef (Ruby, 1 star) - Karamelized Chef cookbook to install TensorFlow using chef-solo. TensorFlow is a Google framework for deep learning.