We're hiring 🚀
Doc | Website | Community | Blog
Instill VDP
Versatile Data Pipeline (VDP) is a source-available unstructured data ETL tool that streamlines the end-to-end unstructured data processing pipeline:
- Extract unstructured data from pre-built data sources such as cloud/on-prem storage, or IoT devices
- Transform it into analysable or meaningful data representations with AI models
- Load the transformed data into warehouses, applications, or other destinations
Highlights
- 🚀 The fastest way to build end-to-end unstructured data pipelines - building a pipeline is like assembling LEGO blocks
- ⚡️ High-performing backends implemented in Go, with Triton Inference Server unleashing the full power of the NVIDIA GPU architecture (e.g., concurrency, scheduler, batcher) and supporting TensorRT, PyTorch, TensorFlow, ONNX, Python and more
- 🖱️ One-click import & deploy of ML/DL models from GitHub, Hugging Face or cloud storage managed by version control tools like DVC or ArtiVC
- 📦 Standardised AI Task output formats to streamline data integration or analysis
- 🔌 Pre-built ETL data connectors for extensive data access, integrated with Airbyte
- 🪢 Build pipelines for diverse scenarios - SYNC mode for real-time inference and ASYNC mode for on-demand workloads
- 🧁 Scalable API-first microservice design for a great developer experience - seamless integration into the modern data stack at any scale
- 🤠 Built for every AI and Data practitioner - the no-/low-code interface helps you take off your AI Researcher/AI Engineer/Data Engineer/Data Scientist hat and put on the all-rounder hat to deliver more with VDP
Demo playground
An online demo VDP instance is available at https://demo.instill.tech, where you can try out the basic features directly in its Console.
Want to showcase your ML/DL models? We offer fully managed VDP on Instill Cloud. Please sign up via the form and we will reach out to you.
Prerequisites
- macOS or Linux - VDP works on macOS or Linux, but does not support Windows yet.
- Docker and Docker Compose - VDP uses Docker Compose (specifically, Compose V2 and the Compose specification) to run all services locally. Please install the latest stable Docker and Docker Compose before using VDP.
- yq > v4.x - please follow the installation guide.
- (Optional) NVIDIA Container Toolkit - To enable GPU support in VDP, please refer to the NVIDIA Cloud Native Documentation to install the NVIDIA Container Toolkit. If you'd like to allot specific GPUs to VDP, you can set the environment variable NVIDIA_VISIBLE_DEVICES. For example, NVIDIA_VISIBLE_DEVICES=0,1 will make the triton-server consume GPU devices 0 and 1 specifically. By default, NVIDIA_VISIBLE_DEVICES is set to all to use all available GPUs on the machine (see the sketch after this list).
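For instance, a minimal sketch of allotting only the first two GPUs to VDP, assuming the variable is exported in the same shell that later launches the services:
$ export NVIDIA_VISIBLE_DEVICES=0,1   # triton-server will only see GPU devices 0 and 1
$ make all                            # launch the services with the restricted GPU set (see Quick start below)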
Quick start
Execute the following commands to start pre-built images with all the dependencies:
$ git clone https://github.com/instill-ai/vdp.git && cd vdp
# Launch all services
$ make all
Jump right into VDP 101: Create your first pipeline on VDP and explore other VDP tutorials.
Note
The images for model-backend (~2GB) and Triton Inference Server (~23GB) can take a while to pull, but this is a one-time effort at first setup.
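Once the pull completes and the services start, one way to confirm that everything is up is to list the running containers; this is a minimal check, and the exact service names may vary by release:
$ docker ps   # the VDP services should appear as running containers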
Shut down VDP
To shut down all running services:
$ make down
Guidance philosophy
VDP is built with an open heart, and we expect it to connect with more and more MLOps integrations. It is implemented with microservice and API-first design principles. Instead of building all components from scratch, we've adopted well-established open-source tools:
- Triton Inference Server for high-performance model serving
- Temporal for a reliable, durable and scalable workflow engine
- Airbyte for abundant destination connectors
We hope VDP can also give back to the open-source community by bringing more practical use cases to unstructured data processing.
Documentation
Check out the documentation & tutorials to learn VDP!
The gRPC protocols in protobufs provide the single source of truth for the VDP APIs. The authoritative protobuf documentation can be found in our Buf Schema Registry (BSR).
For the OpenAPI documentation, access http://localhost:3001 after running make all, or simply run make doc.
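For example, a quick way to check that the OpenAPI documentation is being served, assuming the default port 3001 has not been changed:
$ curl -I http://localhost:3001   # expect a successful HTTP response once the docs server is up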
Model Hub
We curate a list of ready-to-use models for VDP. These models are from different sources and have been tested by our team. Want to contribute a new model? Please create an issue, we are happy to test and add it to the list 👐.
| Model | Task | Sources | Framework | CPU | GPU |
|---|---|---|---|---|---|
| MobileNet v2 | Image Classification | GitHub-DVC | ONNX | | |
| Vision Transformer (ViT) | Image Classification | Hugging Face | ONNX | ✅ | |
| YOLOv4 | Object Detection | GitHub-DVC | ONNX | | |
| YOLOv7 | Object Detection | GitHub-DVC | ONNX | ✅ | |
| YOLOv7 W6 Pose | Keypoint Detection | GitHub-DVC | ONNX | ✅ | |
| PSNet + EasyOCR | Optical Character Recognition (OCR) | GitHub-DVC | ONNX | | |
| Mask RCNN | Instance Segmentation | GitHub-DVC | PyTorch | ✅ | |
| Lite R-ASPP based on MobileNetV3 | Semantic Segmentation | GitHub-DVC | ONNX | | |
| Stable Diffusion | Text to Image | GitHub-DVC, Local-CPU, Local-GPU | ONNX | | |
| Megatron GPT2 | Text Generation | GitHub-DVC | FasterTransformer | | |
Note: The GitHub-DVC source in the table means importing a model into VDP from a GitHub repository that uses DVC to manage large files.
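For illustration only, this is roughly what a DVC-managed model repository looks like from the command line; the repository URL is a placeholder, and VDP performs this import for you when you add the model via the Console:
$ git clone https://github.com/<org>/<model-repo>.git && cd <model-repo>
$ dvc pull   # fetch the DVC-tracked model weights from the repository's configured remote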
Community support
For general help using VDP, you can use one of these channels:
- GitHub - bug reports, feature requests, project discussions and contributions
- Discord - live discussion with the community and our team
- Newsletter & Twitter - get the latest updates
If you are interested in a hosted VDP service, we've started signing up users for our private alpha. Get early access and we'll contact you when we're ready.
Contributing
We love contributions to VDP in all forms:
- Please refer to the guideline for local development.
- Please open a topic in the repository Discussions for any feature requests.
- Please open issues for bug reports in the repository:
  - vdp for general issues;
  - pipeline-backend, connector-backend, model-backend, console, etc., for specific issues.
- Please refer to the VDP project board to track progress.
Note
Code in the main branch tracks under-development progress towards the next release and may not work as expected. If you are looking for a stable alpha version, please use the latest release.
License
See the LICENSE file for licensing information.
We're hiring 🚀
Interested in building VDP with us? Join our remote team and build the future of unstructured data ETL. Check out our open roles.