  • Stars: 349
  • Rank 121,528 (Top 3 %)
  • Language: Python
  • License: MIT License
  • Created almost 2 years ago
  • Updated almost 2 years ago

Repository Details

Youtube GPT: OpenAI Whisper + Embedding + Davinci

YoutubeGPT 🤖

Read the article to learn how it works: Medium Article

With Youtube GPT you can extract the information from a YouTube video just by pasting its link. You get the transcription and an embedding for each segment, and you can ask questions about the video through a chat.

All code was written with the help of Code GPT

Features

  • Video transcription with OpenAI Whisper
  • Embedding Transcript Segments with the OpenAI API (text-embedding-ada-002)
  • Chat with the video using streamlit-chat and OpenAI API (text-davinci-003)

Example

For this example we are going to use this video from The PyCoach: https://youtu.be/lKO3qDLCAnk

Add the video URL and then click Start Analysis.

Pytube and OpenAI Whisper

The video's audio is downloaded with pytube, and OpenAI Whisper then transcribes it and splits it into segments.

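import whisper
from pytube import YouTube

# Get the video (youtube_link is the URL pasted by the user)
youtube_video = YouTube(youtube_link)
# Keep only the audio streams and pick the first one
streams = youtube_video.streams.filter(only_audio=True)
stream = streams.first()
# download() returns the path of the saved file
mp4_video = stream.download(filename='youtube_video.mp4')
audio_file = open(mp4_video, 'rb')

# Load the Whisper base model
model = whisper.load_model('base')

# Whisper transcription; the result holds the full text and its segments
output = model.transcribe(mp4_video)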
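The later steps rely on the `start`, `end` and `text` fields of each entry in `output['segments']`. As a minimal sketch of working with that structure, the snippet below prints segments with readable timestamps. The helper names `format_timestamp` and `describe_segments` and the sample data are hypothetical, made up for illustration, and are not part of the repository.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a number of seconds into an MM:SS string."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def describe_segments(segments: list) -> list:
    """Build one '[start - end] text' line per segment."""
    lines = []
    for segment in segments:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines


# Hypothetical sample shaped like output['segments']
sample_segments = [
    {"start": 0.0, "end": 4.5, "text": " Hello and welcome."},
    {"start": 64.2, "end": 71.0, "text": " Open NumPy."},
]

for line in describe_segments(sample_segments):
    print(line)
```

Keeping the timestamps next to the text is what later lets the chat answer questions about when something happened in the video.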

Embedding with "text-embedding-ada-002"

We obtain a vector for each segment returned by Whisper using text-embedding-ada-002.

import openai
import pandas as pd

# Embeddings (this call uses the legacy openai<1.0 SDK interface)
openai.api_key = user_secret  # user_secret holds the OpenAI API key
data = []
segments = output['segments']
for segment in segments:
    text = segment["text"].strip()
    response = openai.Embedding.create(
        input=text,
        model="text-embedding-ada-002"
    )
    embeddings = response['data'][0]['embedding']
    # Keep the text, its timestamps and its vector together
    meta = {
        "text": text,
        "start": segment['start'],
        "end": segment['end'],
        "embedding": embeddings
    }
    data.append(meta)
pd.DataFrame(data).to_csv('word_embeddings.csv')
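One detail worth knowing when reading `word_embeddings.csv` back: a Python list stored in a CSV cell is written as text, so the embedding column has to be parsed before it can be used as numbers. The sketch below shows one way to do that with the standard library only. The `load_segments` helper and the three-dimensional sample rows are hypothetical, and the in-memory CSV merely assumes the column layout written by the code above.

```python
import ast
import csv
import io

# Hypothetical CSV content in the assumed layout: an unnamed index column,
# then text, start, end and the embedding serialized as a string.
csv_text = (
    ',text,start,end,embedding\n'
    '0,Hello and welcome.,0.0,4.5,"[0.1, 0.2, 0.3]"\n'
    '1,Open NumPy.,64.2,71.0,"[0.0, 0.5, 0.5]"\n'
)


def load_segments(handle) -> list:
    """Read segment rows and turn the embedding column back into floats."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "text": row["text"],
            "start": float(row["start"]),
            "end": float(row["end"]),
            # The list was serialized as text, so parse it safely
            "embedding": ast.literal_eval(row["embedding"]),
        })
    return rows


loaded = load_segments(io.StringIO(csv_text))
print(loaded[0]["embedding"])
```

To read the real file, the same function could be given `open('word_embeddings.csv', newline='')` instead of the in-memory sample.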

OpenAI GPT-3

We ask a question against the vectorized text: the question is used to search for the most relevant context, and the prompt, together with that context, is sent to the "text-davinci-003" model.
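The context search can be sketched as a nearest-neighbour lookup over the segment embeddings. The example below ranks segments by cosine similarity to the question's embedding and assembles a prompt from the best matches. This is an illustrative assumption about how such a search might work, not the repository's actual code: the function names, the prompt wording and the toy three-dimensional vectors are all hypothetical.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_segments(question_embedding: list, segments: list, k: int = 2) -> list:
    """Return the k segments whose embeddings are closest to the question."""
    ranked = sorted(
        segments,
        key=lambda seg: cosine_similarity(question_embedding, seg["embedding"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context_segments: list) -> str:
    """Assemble a completion prompt from the retrieved context."""
    context = "\n".join(seg["text"] for seg in context_segments)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy 3-dimensional vectors stand in for real text-embedding-ada-002 output
segments = [
    {"text": "The NumPy exercise starts now.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Thanks for watching.", "embedding": [0.0, 0.2, 0.9]},
    {"text": "We create an array with NumPy.", "embedding": [0.8, 0.3, 0.1]},
]
question_embedding = [1.0, 0.2, 0.0]  # would come from embedding the question

best = top_segments(question_embedding, segments, k=2)
prompt = build_prompt("What library is used?", best)
print(prompt)
```

In the real flow the question would be embedded with the same model as the segments, and the resulting prompt would be sent to the completion model.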

We can even ask direct questions about what happened in the video. For example, here we ask how long the NumPy exercise that The PyCoach did in the video took.

Running Locally

  1. Clone the repository
git clone https://github.com/davila7/youtube-gpt
cd youtube-gpt
  2. Install dependencies

The required dependencies are listed in the requirements.txt file:

pip install -r requirements.txt
  3. Run the Streamlit server
streamlit run app.py

Upcoming Features 🚀

  • Semantic search with embedding
  • Chart with emotional analysis
  • Connect with Pinecone
