  • Stars
    6,050
  • Rank 6,339 (Top 0.2 %)
  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created 7 months ago
  • Updated about 2 months ago

Repository Details

Insanely Fast Whisper

An opinionated CLI to transcribe Audio files w/ Whisper on-device! Powered by 🤗 Transformers, Optimum & flash-attn

TL;DR - Transcribe 150 minutes (2.5 hours) of audio in less than 98 seconds - with OpenAI's Whisper Large v3. Blazingly fast transcription is now a reality! ⚡️

Not convinced? Here are some benchmarks we ran on an Nvidia A100 - 80GB 👇

Optimisation type                                                            Time to transcribe 150 min of audio
---------------------------------------------------------------------------  -----------------------------------
large-v3 (Transformers) (fp32)                                               ~31 min (31 min 1 sec)
large-v3 (Transformers) (fp16 + batching [24] + bettertransformer)           ~5 min (5 min 2 sec)
large-v3 (Transformers) (fp16 + batching [24] + Flash Attention 2)           ~2 min (1 min 38 sec)
distil-large-v2 (Transformers) (fp16 + batching [24] + bettertransformer)    ~3 min (3 min 16 sec)
distil-large-v2 (Transformers) (fp16 + batching [24] + Flash Attention 2)    ~1 min (1 min 18 sec)
large-v2 (Faster Whisper) (fp16 + beam_size [1])                             ~9 min (9 min 23 sec)
large-v2 (Faster Whisper) (8-bit + beam_size [1])                            ~8 min (8 min 15 sec)
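
To put the table in perspective, the timings can be converted into a real-time factor (seconds of audio transcribed per second of wall-clock time) and a speed-up over the fp32 baseline. The sketch below only re-uses the numbers from the table; the dictionary keys and the helper name realtime_factor are illustrative, not part of the project.

```python
# Wall-clock times from the benchmark table, converted to seconds.
AUDIO_SECONDS = 150 * 60  # 150 minutes of audio

benchmarks = {
    "large-v3 fp32": 31 * 60 + 1,
    "large-v3 fp16 + batching + bettertransformer": 5 * 60 + 2,
    "large-v3 fp16 + batching + Flash Attention 2": 1 * 60 + 38,
    "distil-large-v2 fp16 + batching + bettertransformer": 3 * 60 + 16,
    "distil-large-v2 fp16 + batching + Flash Attention 2": 1 * 60 + 18,
    "large-v2 Faster Whisper fp16": 9 * 60 + 23,
    "large-v2 Faster Whisper 8-bit": 8 * 60 + 15,
}


def realtime_factor(elapsed_seconds: int) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    return AUDIO_SECONDS / elapsed_seconds


baseline = benchmarks["large-v3 fp32"]
for name, elapsed in benchmarks.items():
    print(
        f"{name}: {realtime_factor(elapsed):.1f}x real time, "
        f"{baseline / elapsed:.1f}x faster than fp32"
    )
```

For the large-v3 + Flash Attention 2 row this works out to roughly 92 seconds of audio per second of compute.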

P.S. We also ran the benchmarks on a Google Colab T4 GPU instance!

P.P.S. This project originally started as a way to showcase benchmarks for Transformers, but has since evolved into a lightweight CLI for people to use. This is purely community driven. We add whatever the community seems to have a strong demand for!

🆕 Blazingly fast transcriptions via your terminal! ⚡️

We've added a CLI to enable fast transcriptions. Here's how you can use it:

Install insanely-fast-whisper with pipx (pip install pipx or brew install pipx):

pipx install insanely-fast-whisper

Note: Due to a dependency on onnxruntime, Python 3.12 is currently not supported. You can force a Python version (e.g. 3.11) by adding --python python3.11 to the command.

⚠️ If you have Python 3.11.XX installed, pipx may parse the version incorrectly and silently install a very old version of insanely-fast-whisper (version 0.0.8, which no longer works with the current BetterTransformer). In that case, you can install the latest version by passing --ignore-requires-python to pip:

pipx install insanely-fast-whisper --force --pip-args="--ignore-requires-python"

If you're installing with pip, you can pass the argument directly: pip install insanely-fast-whisper --ignore-requires-python.

Run inference from any path on your computer:

insanely-fast-whisper --file-name <filename or URL>

Note: if you are running on macOS, you also need to add the --device-id mps flag.

🔥 You can run Whisper-large-v3 w/ Flash Attention 2 from this CLI too:

insanely-fast-whisper --file-name <filename or URL> --flash True 

🌟 You can run distil-whisper directly from this CLI too:

insanely-fast-whisper --model-name distil-whisper/large-v2 --file-name <filename or URL> 

Don't want to install insanely-fast-whisper? Just use pipx run:

pipx run insanely-fast-whisper --file-name <filename or URL>

Note

The CLI is highly opinionated and only works on NVIDIA GPUs & Mac. Make sure to check out the defaults and the list of options you can play around with to maximise your transcription throughput. Run insanely-fast-whisper --help or pipx run insanely-fast-whisper --help to get all the CLI arguments along with their defaults.

CLI Options

The insanely-fast-whisper repo provides all-round support for running Whisper in various settings. Note that as of 26th Nov, insanely-fast-whisper works on both CUDA and mps (Mac) enabled devices.

  -h, --help            show this help message and exit
  --file-name FILE_NAME
                        Path or URL to the audio file to be transcribed.
  --device-id DEVICE_ID
                        Device ID for your GPU. Just pass the device number when using CUDA, or "mps" for Macs with Apple Silicon. (default: "0")
  --transcript-path TRANSCRIPT_PATH
                        Path to save the transcription output. (default: output.json)
  --model-name MODEL_NAME
                        Name of the pretrained model/ checkpoint to perform ASR. (default: openai/whisper-large-v3)
  --task {transcribe,translate}
                        Task to perform: transcribe or translate to another language. (default: transcribe)
  --language LANGUAGE   
                        Language of the input audio. (default: "None" (Whisper auto-detects the language))
  --batch-size BATCH_SIZE
                        Number of parallel batches you want to compute. Reduce if you face OOMs. (default: 24)
  --flash FLASH         
                        Use Flash Attention 2. Read the FAQs to see how to install FA2 correctly. (default: False)
  --timestamp {chunk,word}
                        Whisper supports both chunked as well as word level timestamps. (default: chunk)
  --hf_token
                        Provide a hf.co/settings/token for Pyannote.audio to diarise the audio clips
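
If you want to drive the CLI from a script, the options above can be assembled into an argument list and passed to subprocess. This is a minimal sketch, not part of the project: build_command and transcribe are hypothetical helper names, only flags listed in the help text are used, and the defaults mirror the documented ones.

```python
import shlex
import subprocess
from typing import List, Optional


def build_command(
    file_name: str,
    device_id: str = "0",
    batch_size: int = 24,
    model_name: Optional[str] = None,
    flash: bool = False,
    transcript_path: str = "output.json",
) -> List[str]:
    """Assemble an insanely-fast-whisper invocation from the documented flags."""
    cmd = [
        "insanely-fast-whisper",
        "--file-name", file_name,
        "--device-id", device_id,
        "--batch-size", str(batch_size),
        "--transcript-path", transcript_path,
    ]
    if model_name is not None:
        cmd += ["--model-name", model_name]
    if flash:
        cmd += ["--flash", "True"]
    return cmd


def transcribe(file_name: str, **options) -> None:
    """Run the CLI; requires insanely-fast-whisper to be installed and on PATH."""
    subprocess.run(build_command(file_name, **options), check=True)


if __name__ == "__main__":
    # Example for a Mac: mps device with a smaller batch size.
    print(shlex.join(build_command("audio.mp3", device_id="mps", batch_size=4)))
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems with file names that contain spaces.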

Frequently Asked Questions

How to correctly install flash-attn to make it work with insanely-fast-whisper?

Make sure to install it via pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation. Massive kudos to @li-yifei for helping with this.

How to solve an AssertionError: Torch not compiled with CUDA enabled error on Windows?

The root cause of this problem is still unknown. However, you can resolve it by manually installing torch in the virtualenv: python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121. Thanks to @pto2k for debugging this.

How to avoid Out-Of-Memory (OOM) exceptions on Mac?

The mps backend isn't as optimised as CUDA and is therefore far more memory hungry. Typically you can run with --batch-size 4 without any issues (this should use roughly 12GB of GPU VRAM). Don't forget to set --device-id mps.
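
The "reduce the batch size if you hit OOMs" advice can also be automated. The sketch below is one possible approach, not something the CLI does: run_with_backoff is a hypothetical helper that retries a transcription callable with a halved batch size each time an out-of-memory error is raised. Which exception types count as OOM depends on the backend, so they are a parameter here; the default of MemoryError is an assumption for illustration (with PyTorch you would likely pass RuntimeError instead).

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_backoff(
    transcribe: Callable[[int], T],
    batch_size: int = 24,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> Tuple[T, int]:
    """Call transcribe(batch_size), halving the batch size after each OOM error.

    Returns the result together with the batch size that finally worked.
    """
    while True:
        try:
            return transcribe(batch_size), batch_size
        except oom_errors:
            if batch_size == 1:
                raise  # nothing smaller left to try
            batch_size = max(1, batch_size // 2)


def fake_transcribe(batch_size: int) -> str:
    """Stand-in for a real transcription call: pretends anything above 4 runs out of memory."""
    if batch_size > 4:
        raise MemoryError(f"batch size {batch_size} does not fit")
    return f"transcribed with batch size {batch_size}"


result, used = run_with_backoff(fake_transcribe, batch_size=24)
print(result, used)
```

In real use, transcribe would wrap either the CLI call or the Transformers pipeline, taking the batch size as its only argument.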

How to use Whisper without a CLI?

Install the dependencies, then run the Python snippet below:

pip install --upgrade transformers optimum accelerate

import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available

# Use Flash Attention 2 when it is installed; otherwise fall back to PyTorch SDPA.
attn_implementation = "flash_attention_2" if is_flash_attn_2_available() else "sdpa"

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # select checkpoint from https://huggingface.co/openai/whisper-large-v3#model-details
    torch_dtype=torch.float16,
    device="cuda:0",  # or "mps" for Mac devices
    model_kwargs={"attn_implementation": attn_implementation},
)

outputs = pipe(
    "<FILE_NAME>",  # path to the audio file
    chunk_length_s=30,  # split long audio into 30-second chunks
    batch_size=24,  # reduce if you run out of memory
    return_timestamps=True,
)

print(outputs)
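
With return_timestamps=True, the pipeline result is a dictionary holding the full "text" plus a list of "chunks", each with a (start, end) "timestamp" pair and its own "text". A minimal sketch for turning that into readable, timestamped lines follows; the helper names fmt and format_chunks and the sample data are illustrative only.

```python
from typing import Optional


def fmt(seconds: Optional[float]) -> str:
    """Format seconds as HH:MM:SS; a missing timestamp is shown as a placeholder."""
    if seconds is None:
        return "--:--:--"
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_chunks(outputs: dict) -> str:
    """Render each chunk as '[start -> end] text', one chunk per line."""
    lines = []
    for chunk in outputs.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"[{fmt(start)} -> {fmt(end)}] {chunk['text'].strip()}")
    return "\n".join(lines)


# Hand-written sample in the same shape as the pipeline output (not real model output).
sample = {
    "text": " Hello there. How are you?",
    "chunks": [
        {"timestamp": (0.0, 2.5), "text": " Hello there."},
        {"timestamp": (2.5, None), "text": " How are you?"},
    ],
}

print(format_chunks(sample))
```

The None handling is there because the end timestamp of a final chunk can be missing when the audio is cut off mid-segment.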

Acknowledgements

  1. OpenAI Whisper team for open sourcing such a brilliant checkpoint.
  2. Hugging Face Transformers team, specifically Arthur, Patrick, Sanchit & Yoach (alphabetical order) for continuing to maintain Whisper in Transformers.
  3. Hugging Face Optimum team for making the BetterTransformer API so easily accessible.
  4. Patrick Arminio for helping me tremendously to put together this CLI.

Community showcase

  1. @ochen1 created a brilliant MVP for a CLI here: https://github.com/ochen1/insanely-fast-whisper-cli (Try it out now!)
  2. @arihanv created an app (Shush) using NextJS (Frontend) & Modal (Backend): https://github.com/arihanv/Shush (Check it outtt!)
  3. @kadirnar created a python package on top of the transformers with optimisations: https://github.com/kadirnar/whisper-plus (Go go go!!!)
