Text-to-3D generation on Apple Vision Pro, built with the visionOS SDK: 3D Scribblenauts in AR, made for the Scale Generative AI Hackathon. Winner of the Scale AI Prize.

Dream with Vision Pro

Welcome to Dream with Vision Pro, a lucid text-to-3D tool built with the Apple visionOS SDK. Powered by Scale AI's Spellbook, OpenAI's GPT-4 and Shap-E, Modal, Replicate, and the Meta Quest 2, it lets you turn your imagination into immersive 3D experiences.

Enter Your Vision

Type a text description of the object you envision, anything from an elephant to a sword. Unleash your imagination: once you've described it, your object appears before you.

Demo

Scale AI's Spellbook infers the size of each object so that it is rendered at an accurate scale.

How it Works

Here's a step-by-step breakdown of what Dream with Vision Pro does:

First, the user describes the object they want to visualize. This prompt triggers the Shap-E model via Modal and Replicate, which produces a .obj file, a standard 3D model format.
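Because a .obj file is plain text, its vertex positions can be read without a 3D library. The Swift sketch below is illustrative only and is not taken from this project: `Vertex` and `parseOBJVertices` are hypothetical names, and the parser reads only `v x y z` lines, skipping faces, normals, and texture coordinates.

```swift
// Hypothetical sketch: collect vertex positions from Wavefront OBJ text.
// Only "v x y z" lines are read; faces ("f"), normals ("vn"), and
// texture coordinates ("vt") are skipped.
struct Vertex {
    var x: Double
    var y: Double
    var z: Double
}

func parseOBJVertices(_ text: String) -> [Vertex] {
    var vertices: [Vertex] = []
    for line in text.split(whereSeparator: { $0.isNewline }) {
        // Tokenize on spaces and tabs; split(whereSeparator:) drops empty tokens.
        let parts = line.split(whereSeparator: { $0 == " " || $0 == "\t" })
        // Keep only geometric vertex lines with three parseable coordinates.
        guard parts.count >= 4, parts[0] == "v",
              let x = Double(parts[1]),
              let y = Double(parts[2]),
              let z = Double(parts[3]) else { continue }
        vertices.append(Vertex(x: x, y: y, z: z))
    }
    return vertices
}

let sample = """
v 0 0 0
v 1 0 0
v 0 1 0
vn 0 0 1
f 1 2 3
"""
print(parseOBJVertices(sample).count) // 3
```

The vertical coordinates (for example `parseOBJVertices(text).map { $0.y }`) are what a height-based scaling step needs.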

Next, we use Spellbook and GPT-4 to estimate the object's real-world height in meters, so that the 3D model is rendered at an accurate scale.
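Once the model returns a height, scaling comes down to one ratio: the estimated height divided by the mesh's current vertical extent. A minimal sketch, assuming the mesh is y-up (RealityKit's convention) and that one uniform factor is applied to all three axes; `uniformScale` is a hypothetical helper, not part of the repository.

```swift
// Hypothetical helper: the uniform scale factor that maps a mesh's vertical
// extent to the estimated real-world height in meters.
// Assumes the mesh is y-up (RealityKit's convention).
func uniformScale(vertexYs: [Double], targetHeightMeters: Double) -> Double? {
    guard let minY = vertexYs.min(), let maxY = vertexYs.max(),
          targetHeightMeters > 0 else { return nil }
    let currentHeight = maxY - minY
    // A flat (zero-height) mesh has nothing to scale against.
    guard currentHeight > 0 else { return nil }
    return targetHeightMeters / currentHeight
}

// A mesh one unit tall, scaled to the 3.000 m elephant estimate from the prompt examples.
let factor = uniformScale(vertexYs: [-0.5, 0.1, 0.5], targetHeightMeters: 3.0)
print(factor ?? 0) // 3.0
```

In RealityKit, a factor like this could be applied through an entity's `scale` property, e.g. `entity.scale = SIMD3<Float>(repeating: Float(f))`. A uniform factor keeps the model's proportions intact; only its overall size changes.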

In the final phase, 3D Viewer converts the .obj file into a realistic 3D model that you can interact with. The model is accessible from Apple's visionOS, which we stream to a Meta Quest 2 for a fully immersive view of your original concept.

Spellbook Prompts

System:

As an AI system, you are extremely skilled at extracting objects and estimating their realistic height in meters from a given text prompt. Your task is to identify the object(s) mentioned in the prompt and their estimated height in meters. Once identified, the information must be formatted according to the provided format for a text-to-3D model application.

User:

Could you extract the object and realistic object height in meters from the following text prompts?

Begin:

Input: a red apple
Output: 0.075

Input: a large elephant
Output: 3.000


Input: {{ input }}
Output:
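The `{{ input }}` slot is where the user's description goes, and the model's reply is expected to be a bare number of meters. A hypothetical Swift sketch of both sides of that exchange: `buildPrompt` and `parseHeight` are illustrative names, and the actual Spellbook request is not shown. Rejecting non-numeric or non-positive replies keeps a bad completion from producing a zero-sized or inverted model.

```swift
import Foundation

// Hypothetical helper: fills the {{ input }} slot of the user prompt above.
func buildPrompt(for userText: String) -> String {
    let template = """
    Could you extract the object and realistic object height in meters from the following text prompts?

    Begin:

    Input: a red apple
    Output: 0.075

    Input: a large elephant
    Output: 3.000

    Input: {{ input }}
    Output:
    """
    return template.replacingOccurrences(of: "{{ input }}", with: userText)
}

// Hypothetical helper: reads the first whitespace-separated token of the
// reply (e.g. "3.000" from "3.000 meters") and accepts only a positive, finite height.
func parseHeight(_ reply: String) -> Double? {
    guard let token = reply.split(whereSeparator: { $0.isWhitespace }).first,
          let meters = Double(token),
          meters.isFinite, meters > 0 else { return nil }
    return meters
}

print(buildPrompt(for: "a wooden sword"))
print(parseHeight("3.000 meters") ?? -1) // 3.0
```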

Next Steps

We've started integrating OpenAI's Whisper model to go beyond typed text-to-3D. Users will be able to interact with their 3D creations more intuitively, using their voice.

Once we have the .obj file, we plan to use USDZ Tools to convert it to the .usdz format that visionOS requires. After this conversion, we can render the objects directly.

Acknowledgements

We thank the Scale AI Spellbook team for the credits and the platform's ease of use, Ben Firshman of Replicate for the dedicated A100 GPU we run Shap-E on, Erik Bernhardsson of Modal for the dedicated Whisper and hosted endpoints, and especially Mehran Jalali for lending us the Meta Quest 2 for testing.