  • Stars: 153
  • Rank: 243,368 (Top 5%)
  • Language: JavaScript
  • License: MIT License
  • Created: over 1 year ago
  • Updated: over 1 year ago

Repository Details

A browser extension that lets you chat with YouTube videos using Llama2-7b. Built using 🤗 Inference Endpoints and Vercel's AI SDK.

Chat with YouTube

Introducing Chat with YouTube, a browser extension that lets you chat with YouTube videos! This is a small project that demonstrates how easy it is to build conversational browser extensions using Hugging Face Inference Endpoints and the Vercel AI SDK.

Demo Application

Since the license is MIT, feel free to fork the project, make improvements, and release it yourself on the Chrome/Firefox Web Store!

Running Locally

We recommend opening up two terminal windows side-by-side, one for the server and one for the extension.

  1. Clone the repository

    git clone https://github.com/xenova/chat-with-youtube.git
  2. Set up the server

    1. Switch to the server directory:

      cd server
    2. Install the necessary dependencies:

      npm install
    3. Create a file .env.local with your Hugging Face Access Token and Inference Endpoint URL. See .env.local.example for an example. If you haven't got these yet, this guide will help you get started.

      HUGGINGFACE_API_KEY=hf_xxx
      HUGGINGFACE_INFERENCE_ENDPOINT_URL=https://YOUR_ENDPOINT.endpoints.huggingface.cloud
    4. Start the server:

      npm run dev
  3. Set up the extension

    1. Switch to the extension folder:

      cd extension
    2. Install the necessary dependencies:

      npm install 
    3. Build the project:

      npm run build 
    4. Add the extension to your browser. To do this, go to chrome://extensions/, enable Developer mode (top right), and click "Load unpacked". In the dialog that appears, select the build directory and click "Select Folder".

    5. That's it! You should now be able to open the extension's popup and use the model in your browser!
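
Curious what actually happens when you send a message? Below is a minimal, hypothetical sketch of the popup side: it forwards the video's transcript and metadata to the local server and streams the reply back. This is for illustration only, not the extension's actual code; the /api/chat route, port, and field names are assumptions.

    // Hypothetical popup-side helper (not the extension's actual code).
    // The /api/chat route, port, and field names are assumptions.
    async function askVideo({ title, author, transcript, question }, onToken) {
      const res = await fetch('http://localhost:3000/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ title, author, transcript, question }),
      });

      // The server answers with a streamed text body; surface tokens as they arrive.
      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        onToken(decoder.decode(value, { stream: true }));
      }
    }

    // Example usage: print the answer as it streams in.
    // askVideo({ title, author, transcript, question: 'What is this video about?' }, console.log);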

How does it work?

Well, it's quite simple actually: just add the transcript and video metadata to the prompt for additional context. For this demo, we use Llama-2-7b-hf, which has a context length of 4096 tokens and can easily handle most videos. Of course, for longer videos it would be best to implement segmentation and retrieval-augmented generation, but that's beyond the scope of this project.
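
As a rough illustration, the server side of this could look something like the sketch below: it packs the transcript and metadata into a Llama-2 chat prompt and streams the completion from the Inference Endpoint, using @huggingface/inference together with the AI SDK's HuggingFaceStream helper. This is only a sketch, not the repository's actual code; the route, field names, and prompt wording are made up for the example.

    // Hypothetical API route (Next.js edge style); not the repository's actual code.
    import { HfInference } from '@huggingface/inference';
    import { HuggingFaceStream, StreamingTextResponse } from 'ai';

    export const config = { runtime: 'edge' };

    const hf = new HfInference(process.env.HUGGINGFACE_API_KEY);
    const endpoint = hf.endpoint(process.env.HUGGINGFACE_INFERENCE_ENDPOINT_URL);

    // Pack the video context and the user's question into a single Llama-2 chat prompt.
    function buildPrompt({ title, author, transcript, question }) {
      const context = `Video title: ${title}\nChannel: ${author}\nTranscript:\n${transcript}`;
      return `[INST] <<SYS>>\nYou answer questions about the following YouTube video.\n${context}\n<</SYS>>\n\n${question} [/INST]`;
    }

    export default async function handler(req) {
      const { title, author, transcript, question } = await req.json();

      // Stream tokens from the dedicated Inference Endpoint...
      const response = endpoint.textGenerationStream({
        inputs: buildPrompt({ title, author, transcript, question }),
        parameters: { max_new_tokens: 512, temperature: 0.7, return_full_text: false },
      });

      // ...and forward them to the extension as a streaming text response.
      const stream = HuggingFaceStream(response);
      return new StreamingTextResponse(stream);
    }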