  • Stars: 1,625
  • Rank: 28,787 (Top 0.6%)
  • Language: Python
  • License: Apache License 2.0
  • Created about 1 year ago
  • Updated 4 months ago


Repository Details

WhisperPlus: Faster, Smarter, and More Capable 🚀

WhisperPlus: Advancing Speech2Text and Text2Speech Features 🚀


🛠️ Installation

pip install whisperplus

🤗 Model Hub

You can find the models on the Hugging Face Model Hub.

🎙️ Usage

To use the whisperplus library, follow the steps below for different tasks:

🎵 YouTube URL to Audio

from whisperplus import SpeechToTextPipeline, download_and_convert_to_mp3

url = "https://www.youtube.com/watch?v=di3rHkEZuUw"

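# Download the video's audio track and convert it to an MP3 file
audio_path = download_and_convert_to_mp3(url)
# Load the Whisper model, then transcribe the audio as English
pipeline = SpeechToTextPipeline(model_id="openai/whisper-large-v3")
transcript = pipeline(audio_path, "openai/whisper-large-v3", "english")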

print(transcript)

📰 Summarization

from whisperplus import TextSummarizationPipeline

summarizer = TextSummarizationPipeline(model_id="facebook/bart-large-cnn")
summary = summarizer.summarize(transcript)
print(summary[0]["summary_text"])

📰 Summarization with Long Text Support

from whisperplus import LongTextSummarizationPipeline

summarizer = LongTextSummarizationPipeline(model_id="facebook/bart-large-cnn")
summary_text = summarizer.summarize(transcript)
print(summary_text)
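
Summarization models such as facebook/bart-large-cnn accept only a limited input length, so long transcripts are commonly split into chunks that are summarized one by one. The sketch below illustrates that general idea only; it is not confirmed to be how LongTextSummarizationPipeline works internally. The names chunk_words and summarize_long, and the 400-word default, are hypothetical choices made for this example.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long(
    text: str,
    summarize_fn: Callable[[str], str],
    max_words: int = 400,
) -> str:
    """Summarize each chunk with summarize_fn and join the partial summaries."""
    partial_summaries = [summarize_fn(chunk) for chunk in chunk_words(text, max_words)]
    return " ".join(partial_summaries)


if __name__ == "__main__":
    # Stand-in summarizer (keeps the first three words) so the sketch runs
    # without downloading a model; swap in a real summarizer in practice.
    def first_three_words(chunk: str) -> str:
        return " ".join(chunk.split()[:3])

    print(summarize_long("one two three four five six seven", first_three_words, max_words=4))
```

Any callable that maps a string to a string can be passed as summarize_fn, which keeps the chunking logic independent of the model used.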

💬 Speaker Diarization

from whisperplus import (
    ASRDiarizationPipeline,
    download_and_convert_to_mp3,
    format_speech_to_dialogue,
)

audio_path = download_and_convert_to_mp3("https://www.youtube.com/watch?v=mRB14sFHw2E")

device = "cuda"  # or "cpu" / "mps"
pipeline = ASRDiarizationPipeline.from_pretrained(
    asr_model="openai/whisper-large-v3",
    diarizer_model="pyannote/speaker-diarization",
    use_auth_token=False,
    chunk_length_s=30,
    device=device,
)

output_text = pipeline(audio_path, num_speakers=2, min_speaker=1, max_speaker=2)
dialogue = format_speech_to_dialogue(output_text)
print(dialogue)
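
The device value above is hard-coded to "cuda", with "cpu" and "mps" as the alternatives. A minimal sketch of choosing among the three at runtime follows, assuming PyTorch is installed; pick_device is a hypothetical helper and not part of whisperplus. It falls back to "cpu" when PyTorch cannot be imported.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what is available."""
    try:
        import torch  # assumed to be installed alongside whisperplus
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # The MPS backend (Apple Silicon) only exists in newer PyTorch builds,
    # so look it up defensively before calling it.
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

The returned string can then be passed as the device argument in place of the literal.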

⭐ RAG - Chat with Video (LanceDB)

from whisperplus.pipelines.chatbot import ChatWithVideo

chat = ChatWithVideo(
    input_file="transcript.txt",
    llm_model_name="TheBloke/Mistral-7B-v0.1-GGUF",
    llm_model_file="mistral-7b-v0.1.Q4_K_M.gguf",
    llm_model_type="mistral",
    embedding_model_name="sentence-transformers/all-MiniLM-L6-v2",
)

query = "What is this video about?"
response = chat.run_query(query)
print(response)
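
ChatWithVideo reads its input from a file. Assuming that file is a plain-text transcript (the README does not state the expected format), the sketch below shows one way to write a transcript string, such as the one produced in the YouTube example, to disk first. save_transcript is a hypothetical helper, not part of whisperplus.

```python
import tempfile
from pathlib import Path


def save_transcript(transcript: str, path: str = "transcript.txt") -> Path:
    """Write the transcript as UTF-8 text and return the file path."""
    out_path = Path(path)
    # Strip surrounding whitespace and end the file with a single newline.
    out_path.write_text(transcript.strip() + "\n", encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # Write to a temporary directory so the example leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp_dir:
        saved = save_transcript("  Hello from the video.  ", str(Path(tmp_dir) / "transcript.txt"))
        print(saved.read_text(encoding="utf-8"))
```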

🌠 RAG - Chat with Video (AutoLLM)

from whisperplus import AutoLLMChatWithVideo

# service_context_params
system_prompt = """
You are a friendly AI assistant that helps users find the most relevant and accurate answers
to their questions based on the documents you have access to.
When answering the questions, mostly rely on the info in documents.
"""
query_wrapper_prompt = """
The document information is below.
---------------------
{context_str}
---------------------
Using the document information and mostly relying on it,
answer the query.
Query: {query_str}
Answer:
"""

chat = AutoLLMChatWithVideo(
    input_file="input_dir",  # path of mp3 file
    openai_key="YOUR_OPENAI_KEY",  # optional
    huggingface_key="YOUR_HUGGINGFACE_KEY",  # optional
    llm_model="gpt-3.5-turbo",
    llm_max_tokens="256",
    llm_temperature="0.1",
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    embed_model="huggingface/BAAI/bge-large-zh",  # "text-embedding-ada-002"
)

query = "What is this video about?"
response = chat.run_query(query)
print(response)
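
The query wrapper prompt above contains two placeholders, {context_str} and {query_str}. The sketch below shows what the filled-in prompt looks like, using plain Python string formatting. How AutoLLMChatWithVideo substitutes these values internally is an assumption here; fill_prompt and the sample context are hypothetical and serve only as illustration.

```python
QUERY_WRAPPER_PROMPT = """
The document information is below.
---------------------
{context_str}
---------------------
Using the document information and mostly relying on it,
answer the query.
Query: {query_str}
Answer:
"""


def fill_prompt(template: str, context_str: str, query_str: str) -> str:
    """Substitute the retrieved context and the user query into the template."""
    return template.format(context_str=context_str, query_str=query_str)


if __name__ == "__main__":
    # Placeholder context standing in for text retrieved from the transcript.
    filled = fill_prompt(
        QUERY_WRAPPER_PROMPT,
        context_str="<retrieved transcript passages go here>",
        query_str="What is this video about?",
    )
    print(filled)
```

Printing the filled prompt is a quick way to check that a custom template still contains both placeholders before passing it to the chat class.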

🎙️ Text to Speech

from whisperplus import TextToSpeechPipeline

tts = TextToSpeechPipeline(model_id="suno/bark")
audio = tts(text="Hello World", voice_preset="v2/en_speaker_6")

🎥 AutoCaption

from whisperplus import WhisperAutoCaptionPipeline

caption = WhisperAutoCaptionPipeline(model_id="openai/whisper-large-v3")
caption(video_path="test.mp4", output_path="output.mp4", language="turkish")

😍 Contributing

pip install -r dev-requirements.txt
pre-commit install
pre-commit run --all-files

📜 License

This project is licensed under the terms of the Apache License 2.0.

🤗 Citation

@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
