Whisper notes
GitHub page: https://github.com/openai/whisper
Available models and languages
| Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|--------|------------|--------------------|--------------------|---------------|----------------|
| tiny   | 39 M       | tiny.en            | tiny               | ~1 GB         | ~32x           |
| base   | 74 M       | base.en            | base               | ~1 GB         | ~16x           |
| small  | 244 M      | small.en           | small              | ~2 GB         | ~6x            |
| medium | 769 M      | medium.en          | medium             | ~5 GB         | ~2x            |
| large  | 1550 M     | N/A                | large              | ~10 GB        | 1x             |
Command-line usage
The following command will transcribe speech in audio files, using the medium model:
```shell
whisper audio.flac audio.mp3 audio.wav --model medium
```
The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:
```shell
whisper japanese.wav --language Japanese
```
Adding --task translate will translate the speech into English:
```shell
whisper japanese.wav --language Japanese --task translate
```
Run the following to view all available options:
```shell
whisper --help
```
See tokenizer.py for the list of all available languages.
Python usage
Transcription can also be performed within Python:
```python
import whisper
```
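The snippet above is cut off after the import. In the upstream README the example goes on to load a model and call its transcribe() method; the sketch below follows that pattern (load_model and transcribe are real whisper APIs; the wrapper function and the audio.mp3 file name are illustrative):

```python
def transcribe_file(path: str, model_name: str = "medium") -> str:
    """Transcribe an audio file with Whisper.

    Requires `pip install openai-whisper`; the import is kept inside the
    function so the sketch can be defined without the package installed.
    """
    import whisper

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(path)         # dict with "text", "segments", ...
    return result["text"]

# usage (assumes an audio.mp3 file on disk):
# print(transcribe_file("audio.mp3"))
```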
Internally, the transcribe() method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
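The windowing itself is simple to picture: Whisper resamples input audio to 16 kHz, so each 30-second window covers 480,000 samples, and the final window is shorter (the model pads it). A toy sketch, independent of the whisper package:

```python
SAMPLE_RATE = 16_000      # Whisper resamples all input audio to 16 kHz
CHUNK = 30 * SAMPLE_RATE  # one 30-second window = 480,000 samples

def window_bounds(n_samples: int):
    """Yield (start, end) sample indices for consecutive 30-second windows."""
    for start in range(0, n_samples, CHUNK):
        yield start, min(start + CHUNK, n_samples)

# a 75-second clip falls into windows of 30 s, 30 s, and 15 s:
bounds = list(window_bounds(75 * SAMPLE_RATE))
# → [(0, 480000), (480000, 960000), (960000, 1200000)]
```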
Below is an example usage of whisper.detect_language() and whisper.decode(), which provide lower-level access to the model.
```python
import whisper
```
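As with the transcription example, this snippet is truncated after the import. The lower-level flow in the upstream README loads the audio, pads or trims it to 30 seconds, builds a log-Mel spectrogram, detects the language, and then decodes one window; the sketch below follows that flow (load_audio, pad_or_trim, log_mel_spectrogram, detect_language, DecodingOptions, and decode are real whisper APIs; the wrapper function is illustrative):

```python
def detect_and_decode(path: str, model_name: str = "base") -> str:
    """Detect the spoken language in `path`, then decode one 30-second window."""
    import whisper  # requires `pip install openai-whisper`

    model = whisper.load_model(model_name)

    # load audio and pad/trim it to exactly 30 seconds
    audio = whisper.load_audio(path)
    audio = whisper.pad_or_trim(audio)

    # make a log-Mel spectrogram and move it to the model's device
    mel = whisper.log_mel_spectrogram(audio).to(model.device)

    # detect the spoken language from the spectrogram
    _, probs = model.detect_language(mel)
    print(f"Detected language: {max(probs, key=probs.get)}")

    # decode the audio window
    options = whisper.DecodingOptions()
    result = whisper.decode(model, mel, options)
    return result.text

# usage (assumes an audio.mp3 file on disk):
# print(detect_and_decode("audio.mp3"))
```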