AI-powered search & chat for Naval Ravikant's Twitter thread "How To Get Rich."
(adding more content soon)
Everything is 100% open source.
The dataset consists of 2 CSV files containing all text & embeddings used.
Download clips data here.
Download passages data here.
I recommend getting familiar with fetching, cleaning, and storing data as outlined in the scraping and embedding scripts below, but feel free to skip those steps and just use the dataset.
Naval GPT provides 3 things:
- Search
- Chat
- Audio
Search was created with OpenAI Embeddings (text-embedding-ada-002).
First, we loop over the passages from Naval's formatted blog post and generate embeddings for each chunk of text.
We work from the blog post because saving its HTML lets us render the beautifully formatted text in our app.
In the app, we take the user's search query, generate an embedding, and use the result to find the most similar passages.
The comparison is done using cosine similarity across our database of vectors.
Our database is a Postgres database with the pgvector extension hosted on Supabase.
Results are ranked by similarity score and returned to the user.
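As a rough illustration of the ranking step, here is a minimal in-memory sketch of cosine similarity search. In the app the comparison runs inside Postgres via pgvector, so this is not the repo's actual code; the Passage shape and the rankPassages name are assumptions made for the example.

```typescript
// Hypothetical shape of a stored passage; the real columns are defined in schema.sql.
interface Passage {
  content: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every passage against the query embedding and keep the topK best matches.
function rankPassages(
  queryEmbedding: number[],
  passages: Passage[],
  topK: number
): { content: string; similarity: number }[] {
  return passages
    .map((p) => ({
      content: p.content,
      similarity: cosineSimilarity(queryEmbedding, p.embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, topK);
}
```

The query embedding would come from the same embedding model as the stored passages, since similarity scores are only meaningful between vectors from the same model.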
Chat builds on top of search. It uses search results to create a prompt that is fed into GPT-3.5-turbo.
This allows for a chat-like experience where the user can ask questions about the topic and get answers.
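To make the idea concrete, here is a minimal sketch of turning search results into a prompt. The wording of the instruction, the buildPrompt name, and the character budget are all assumptions for illustration; the prompt the app actually sends may differ.

```typescript
// Build a single prompt string from the user's question and the passages
// returned by search. maxChars is a hypothetical budget that keeps the
// prompt from growing without bound.
function buildPrompt(
  question: string,
  passages: string[],
  maxChars = 4000
): string {
  const context: string[] = [];
  let used = 0;
  for (const passage of passages) {
    // Passages are assumed to arrive ranked, so stop at the first one that
    // would exceed the budget instead of skipping ahead to weaker matches.
    if (used + passage.length > maxChars) break;
    context.push(passage.trim());
    used += passage.length;
  }
  return [
    "Use the following passages to answer the question.",
    "",
    ...context.map((p, i) => `Passage ${i + 1}: ${p}`),
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

The resulting string would be sent as the user message in a chat completion request.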
The podcast player is a simple audio player that plays the podcast for this thread.
We use Python and OpenAI Whisper to transcribe the podcast in 1min chunks, then generate an embedding for each chunk's transcript.
We then use the same method as search to find the most similar clip.
During audio processing we save a timestamp for each clip, so the app can jump the podcast player straight to the matching clip.
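The jump itself can be sketched as follows. The Clip fields and the jumpToBestClip name are hypothetical; the only real API relied on is the currentTime property (in seconds) that browser audio elements expose.

```typescript
// Hypothetical clip record: transcript text, start time in seconds,
// and the similarity score computed during search.
interface Clip {
  content: string;
  startSeconds: number;
  similarity: number;
}

// Anything with a settable currentTime (in seconds), e.g. an HTMLAudioElement.
interface Seekable {
  currentTime: number;
}

// Pick the highest-scoring clip and move the player to where it starts.
// Returns the chosen clip, or undefined when there are no clips.
function jumpToBestClip(player: Seekable, clips: Clip[]): Clip | undefined {
  if (clips.length === 0) return undefined;
  const best = clips.reduce((top, clip) =>
    clip.similarity > top.similarity ? clip : top
  );
  player.currentTime = best.startSeconds;
  return best;
}
```

In a React component the player would typically be the audio element held in a ref.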
Here's a quick overview of how to run it locally.
- Set up OpenAI
You'll need an OpenAI API key to generate embeddings.
- Set up Supabase and create a database
Note: You don't have to use Supabase. Use whatever method you prefer to store your data. But I like Supabase and think it's easy to use.
There is a schema.sql file in the root of the repo that you can use to set up the database.
Run that in the SQL editor in Supabase as directed.
I recommend turning on Row Level Security and setting up a service role to use with the app.
- Clone repo
git clone https://github.com/mckaywrigley/naval-gpt.git
- Install dependencies
npm i
- Set up environment variables
Create a .env.local file in the root of the repo with the following variables:
OPENAI_API_KEY=
NEXT_PUBLIC_SUPABASE_URL=
SUPABASE_SERVICE_ROLE_KEY=
You'll also need to save your OpenAI API key as an environment variable in your OS.
export OPENAI_API_KEY=
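If a script fails with an unclear error, a missing variable is a common cause. Here is a small optional sketch for checking them up front. The missingEnvVars helper is hypothetical and not part of the repo; only the three variable names come from the list above.

```typescript
// Variable names taken from the .env.local list above.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "NEXT_PUBLIC_SUPABASE_URL",
  "SUPABASE_SERVICE_ROLE_KEY",
];

// Return the names of required variables that are unset or blank.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[] = REQUIRED_ENV_VARS
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

Calling missingEnvVars(process.env) at the top of a script and stopping when the result is non-empty gives a clearer failure than a rejected API request.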
- Run text scraping script
npm run scrape
This scrapes the content from Naval's website and saves it to a json file.
- Run text embedding script
npm run embed-text
This reads the json file, generates embeddings for each passage, and saves the results to your database.
There is a 200ms delay between each request to avoid rate limiting.
This process will take 10-15 minutes.
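The delay pattern used by the embedding scripts can be sketched like this. The processWithDelay helper and its handler parameter are assumptions for illustration, not the repo's code; the handler stands in for whatever call generates and saves one embedding.

```typescript
// Resolve after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Run the handler on each item one at a time, pausing between requests
// so a burst of calls does not trip the API's rate limit.
async function processWithDelay<T, R>(
  items: T[],
  handler: (item: T) => Promise<R>,
  delayMs = 200
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i++) {
    results.push(await handler(items[i]));
    // No need to wait after the final item.
    if (i < items.length - 1) await sleep(delayMs);
  }
  return results;
}
```

Processing items sequentially is slower than firing requests in parallel, which is why the script takes several minutes, but it keeps the request rate predictable.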
- Download podcast
Download the podcast and add it as "podcast.mp3" to the public directory.
- Run the audio processing script
Note: You'll need to have Python installed on your machine.
cd scripts
python3 main.py
This splits the podcast into 1min chunks and transcribes each chunk with Whisper.
The results are saved to a json file.
There is a 1.2s delay between each request to avoid rate limiting.
It will take 20-30 minutes to run.
- Run audio embedding script
npm run embed-audio
This reads the json file, generates embeddings for each clip, and saves the results to your database.
There is a 200ms delay between each request to avoid rate limiting.
This process will take about 5 minutes.
- Run app
npm run dev
Thanks to Naval Ravikant for publicizing his thoughts - they've proven to be an invaluable source of wisdom for all of us.
If you have any questions, feel free to reach out to me on Twitter!
I sacrificed composability for simplicity in the app.
You can split up a lot of the stuff in index.tsx into separate components.