Speech to Text¶
New in version 21.04.0.
Warning
Speech to text does not work with version 21.04.2 due to Vosk API issues. Use version 21.04.1, or version 21.04.3 or later.
Install Python¶
Python 3 needs to be installed on your computer, as well as the vosk and srt Python modules:
Linux¶
On most Linux distributions Python is installed by default. You can check whether that is the case on your system by running python3 -V in a terminal. If Python is missing, install it with your distribution's package manager; installation instructions are widely available online.
Windows¶
Download Python from https://www.python.org/downloads/ and install it on your computer.
Speech Engines¶
VOSK¶
Linux
To install VOSK and srt, open a terminal and run: pip3 install vosk srt
Windows
Download this batch file (Install_vosk_srt.zip). After downloading, double-click it to start the installation.
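After installing, it can help to confirm that the Python interpreter actually sees both modules. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for this illustration and is not part of Kdenlive, vosk, or srt.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show which Python version is running this check.
    print("Python", sys.version.split()[0])
    missing = missing_modules(["vosk", "srt"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("vosk and srt are installed")
```

Run it with the same interpreter you installed the modules for (for example python3 on Linux). If a module is reported as missing, repeat the installation step above.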
Install a Language¶
Open Configure Speech to Text and select the speech engine VOSK. Click on the link to get a language model.

Drag & drop the language you want from the vosk-model download page to the model window, and it will download and extract it for you.

If you have problems, or want to check for updates, click on the Check configuration button.
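To test a downloaded model outside of the editor, the vosk Python module can be called directly. The sketch below is an illustration only and is not necessarily how the application invokes VOSK; it assumes a 16-bit mono PCM WAV file and an extracted model folder, and the function names `recognize_wav` and `words_from_result` are hypothetical helpers.

```python
import json
import wave


def words_from_result(result_json):
    """Extract the per-word entries from one Vosk result (a JSON string)."""
    return json.loads(result_json).get("result", [])


def recognize_wav(model_dir, wav_path):
    """Run Vosk over a 16-bit mono PCM WAV file and return word timings."""
    # Imported lazily so the helper above works without vosk installed.
    from vosk import Model, KaldiRecognizer

    words = []
    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        recognizer.SetWords(True)  # request start/end times for each word
        while True:
            data = wav.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if recognizer.AcceptWaveform(data):
                words.extend(words_from_result(recognizer.Result()))
        words.extend(words_from_result(recognizer.FinalResult()))
    return words
```

A call such as `recognize_wav("path/to/model", "speech.wav")` (both paths are placeholders) returns a list of dictionaries with the keys word, start, and end.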
Whisper¶
New in version 23.04.
OpenAI-Whisper is a speech recognition model for general use. It is trained on a large dataset of diverse audio and is capable of performing speech translation and language identification.
Whisper is slower than VOSK on CPU, but it is more accurate. Whisper creates sentences with punctuation marks, even in Base mode.

The first time you switch to Whisper, you have to install the missing dependencies (about 2 GB to download).

When everything is configured correctly, you get this screen.
Model: Select the model (default: Base). More details are on the Whisper source code page.
Language: Select the language if Autodetect is not accurate (default: Autodetect).
Device: For compatibility reasons, only CPU is available.
Translate text to English: Translates non-English text to English during recognition.
You can check for updates by clicking on Check configuration.
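For readers who want to try the same settings from a script, the options above correspond roughly to arguments of the openai-whisper Python package. This is a sketch under that assumption, not the application's own code; `whisper_options` and `transcribe` are hypothetical helper names, while `whisper.load_model` and the `language`, `task`, and `fp16` options belong to the Whisper package.

```python
def whisper_options(language="Autodetect", translate=False):
    """Map the dialog settings to keyword arguments for Whisper's transcribe()."""
    return {
        # None lets Whisper detect the spoken language itself.
        "language": None if language == "Autodetect" else language,
        # "translate" outputs English text, "transcribe" keeps the spoken language.
        "task": "translate" if translate else "transcribe",
        # Half precision is not supported on CPU, so use 32-bit floats.
        "fp16": False,
    }


def transcribe(audio_path, model_name="base", **settings):
    """Transcribe one audio file on the CPU and return Whisper's result dict."""
    # Imported lazily so whisper_options() works without the package installed.
    import whisper

    model = whisper.load_model(model_name, device="cpu")
    return model.transcribe(audio_path, **whisper_options(**settings))
```

For example, `transcribe("interview.wav", language="de", translate=True)` (the file name is a placeholder) would return a dictionary whose text entry holds the English translation.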
Speech recognition¶
Select the speech engine¶
New in version 23.04.
Enable
menu item.
Click on the Hamburger Menu and select Configure Speech Recognition. This opens Configure Speech to Text, where you select the engine and click OK.
Translate to English is only available with the Whisper speech engine. It translates non-English text to English during recognition.
Creating subtitles by speech recognition¶

Shown with the VOSK engine¶
1. Mark the timeline zone you want to recognize (adjust the blue line).
2. Click on the Speech recognition icon.
3. Choose the language.
4. Choose how the selected zone should be applied.
5. Press the Process button.
The subtitle gets created and inserted automatically.
Note
For now, automatic subtitles are implemented for the timeline zone only.
Remark on step 4: The default is to analyze only the Timeline zone (all tracks), which is the blue bar in the timeline ruler. Set the zone in the timeline to what you want to analyze (use I and O to set in and out points). The Selected clips option analyzes the selected clip only.
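To illustrate what ends up in a subtitle file, the sketch below turns a list of recognized words with timings into SRT text using only the standard library. It is an assumption-laden illustration, not the application's implementation: the input shape (dictionaries with word, start, and end in seconds) mirrors what VOSK reports per word, and the fixed `words_per_line` grouping is a simplification chosen for this example.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, words_per_line=7):
    """Group word timings into numbered SRT cues of at most `words_per_line` words."""
    cues = []
    for i in range(0, len(words), words_per_line):
        chunk = words[i:i + words_per_line]
        text = " ".join(w["word"] for w in chunk)
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        # Each cue is: index, time range, text, followed by a blank line.
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Each cue spans from the start of its first word to the end of its last word, which is why the zone you select determines the time range the subtitles cover.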
Creating clips by speech recognition¶
This is useful for interviews and other speech-related footage. Enable the
menu item.
Shown with the VOSK engine¶
Select a clip in the project bin.
If needed, set in and out points in the clip monitor and enable the Selected zone only checkbox. Only the text inside the zone is then recognized.
Choose the correct language.
Press the Start Recognition button.
Select the text you want, then do one of the following.
Put it into the timeline.
Save the edited text as a new playlist.
Add a bookmark. You can jump to these bookmarks in the timeline with the Alt + arrow shortcut, or edit a bookmark by double-clicking it.
Delete the selection.
Here you can search in the text.
And navigate up or down in the text.
Silence detection¶
This works with the VOSK engine only.
Open the clip in the clip monitor and open the speech editor window. Select your language, or see Speech Engines above to download the model for it.
Then click the Start Recognition button.
Once this is done, click on the timecode where no speech is indicated and press the Delete key. Repeat this for every part you want to remove, including passages of speech you do not want in your final edit.
Once finished, make sure Selected zone only is disabled and click on the Save button in the lower left part of the speech editor window. After a few seconds, a new playlist without the silence and without the unwanted text is added to the Project Bin.
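The idea behind removing silence can be sketched as interval arithmetic on word timings: any stretch with no recognized words is treated as silence, and the rest is kept. The code below is a hypothetical illustration and not the application's actual algorithm; the `min_gap` threshold and both function names are assumptions made for this example.

```python
def find_silences(words, clip_duration, min_gap=1.0):
    """Return (start, end) ranges of at least `min_gap` seconds without words."""
    silences = []
    cursor = 0.0  # end time of the speech seen so far
    for word in sorted(words, key=lambda w: w["start"]):
        if word["start"] - cursor >= min_gap:
            silences.append((cursor, word["start"]))
        cursor = max(cursor, word["end"])
    # Trailing silence between the last word and the end of the clip.
    if clip_duration - cursor >= min_gap:
        silences.append((cursor, clip_duration))
    return silences


def keep_ranges(silences, clip_duration):
    """Return the (start, end) ranges that remain after removing the silences."""
    kept = []
    cursor = 0.0
    for start, end in silences:
        if start > cursor:
            kept.append((cursor, start))
        cursor = end
    if cursor < clip_duration:
        kept.append((cursor, clip_duration))
    return kept
```

The ranges returned by `keep_ranges` correspond to the pieces that would make up the new playlist once the silent parts are deleted.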