SAMMI Extensions
Speech to Text
Turn your speech into text effortlessly with this Speech to Text extension!
Supported Engines
Google Cloud
Google Cloud's free tier allows you to transcribe 60 minutes of audio completely free each month.
Pricing Info / Supported Languages
OpenAI
OpenAI provides high-quality speech-to-text capabilities. Currently, OpenAI does not provide a free tier.
Pricing Info (under Audio models - Whisper) / Supported Languages
Microsoft Azure
Azure's free tier allows you to transcribe 5 hours of audio completely free each month.
Pricing Info / Supported Languages
Features
Language Selection
Easily select the language you want to transcribe in, for better transcription accuracy.
Profanity Filter
Some engines offer additional features like a profanity filter for cleaner transcriptions.
Auto Stop
Configure the extension to automatically stop transcribing when silence is detected.
Usage Logging
Keep track of your usage statistics with the built-in logging feature.
Know Before You Use
- The extension is not intended to be used for live captioning, but rather for one-time Speech to Text requests, similar to how 'Ok Google' or 'Hey Alexa' works.
- The extension works best with Bridge running as a dock in OBS. Performance outside of OBS cannot be guaranteed.
- You'll need a credit card to use any of these services.
Setup
Install the extension. You can follow the Extension Install Guide.
Add the --use-fake-ui-for-media-stream flag to your OBS executable (if Bridge is running as a dock in OBS):
- Navigate to where your OBS shortcut or obs64.exe is located.
- If you're using obs64.exe, right-click it and choose "Create Shortcut".
- Modify Properties: Right-click the shortcut and select "Properties".
- Add Flag: In the Target field, add --use-fake-ui-for-media-stream after the path, separated by a space (for example, "C:\Program Files\obs-studio\bin\64bit\obs64.exe" --use-fake-ui-for-media-stream; your path may differ).
- Save the changes. Launch OBS from this shortcut going forward.
Navigate to the premade deck and click the Settings button:
- General Settings
- Default Engine: The engine to use for all queries by default.
- Silence Length: Time (in seconds) of silence before stopping transcription automatically, if Auto Stop is enabled.
- Silence Threshold: Defines what level of noise is considered silence. Increase the value in noisier environments.
- Log Usage: Track usage with the Get and Reset Usage commands. It's strongly advised to set up billing alerts to avoid unexpected charges.
- Enable Debug: Logs the average sound level to the Bridge console while STT is active and 'Auto Stop' is enabled. Helpful for adjusting the Silence Threshold.
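The Silence Length and Silence Threshold settings can be pictured as a simple rule: a frame of audio counts as silence when its average level is at or below the threshold, any louder frame restarts the count, and recording stops once consecutive silence reaches Silence Length. The sketch below is a minimal illustration of that rule, not the extension's code; it assumes the level is the mean absolute amplitude of each frame on a 0 to 1 scale and that frames have a fixed duration, and the names `average_level` and `should_auto_stop` are hypothetical.

```python
def average_level(samples: list[float]) -> float:
    """Mean absolute amplitude of one audio frame (samples assumed in -1.0..1.0)."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


def should_auto_stop(
    frame_levels: list[float],
    frame_seconds: float,
    silence_threshold: float,
    silence_length: float,
) -> bool:
    """Return True if the levels contain a run of quiet frames lasting silence_length seconds.

    A frame counts as silence when its level is at or below silence_threshold,
    so raising the threshold (for a noisier room) treats more frames as silence.
    """
    quiet_seconds = 0.0
    for level in frame_levels:
        if level <= silence_threshold:
            quiet_seconds += frame_seconds
        else:
            quiet_seconds = 0.0  # any louder frame restarts the silence timer
        if quiet_seconds >= silence_length:
            return True  # recording would stop at this point
    return False


# Example: 0.5-second frames, threshold 0.05, stop after 1.5 s of silence.
levels = [0.20, 0.25, 0.01, 0.02, 0.01, 0.30]
print(should_auto_stop(levels, 0.5, 0.05, 1.5))  # True: frames 3 to 5 are quiet
```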
Ensure your recording device is set correctly (only available when Bridge is running as an OBS dock).
- Go to the Bridge - STT by K tab and select a recording device, if necessary.
Continue setting up the desired engine inside the Settings button.
Microphone Access
If you're running Bridge outside of OBS (not recommended), go to the Bridge - STT by K tab and click 'Allow Recording' to grant microphone access every time you refresh Bridge or relaunch SAMMI.
Available Engines
Google Cloud
- Free 60 minutes/month. Monitor usage under Google Cloud - Billing - Overview.
- Strongly advised: Configure alerts at Google Cloud - Billing - Budgets & Alerts.
Settings (accessed via Settings button):
- Google Cloud API Key - Your Google Cloud API key with the Cloud Speech-to-Text API enabled
- Language - Transcription language
- Profanity Filter - Attempts to filter out profanities, replacing all but the first character of each filtered word with asterisks (for example, f***)
- Enable Punctuation - Adds punctuation to results (only in select languages)
- Enable Emoji - Converts spoken emojis to Unicode symbols in the text
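For orientation, the four settings above map naturally onto fields of `RecognitionConfig` in Google's public Speech-to-Text v1 REST API, whose synchronous `speech:recognize` method handles audio up to about one minute (matching the Google time limit under Transcribing). The sketch below only builds the request; it assumes the v1 API, WebM/Opus audio recorded at 48 kHz, and a hypothetical helper name, and it is not necessarily what the extension sends.

```python
import base64
import json
from urllib.parse import urlencode

# Public Speech-to-Text v1 synchronous endpoint (accepts up to about 1 minute of audio).
GOOGLE_RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_google_request(
    api_key: str,
    audio_bytes: bytes,
    language: str = "en-US",
    profanity_filter: bool = False,
    punctuation: bool = True,
    emoji: bool = False,
) -> tuple[str, str]:
    """Hypothetical helper: return (url, json_body) for one recognize call."""
    body = {
        "config": {
            # Assumption: browser-recorded WebM/Opus audio at 48 kHz.
            "encoding": "WEBM_OPUS",
            "sampleRateHertz": 48000,
            "languageCode": language,                   # Language setting
            "profanityFilter": profanity_filter,        # Profanity Filter setting
            "enableAutomaticPunctuation": punctuation,  # Enable Punctuation setting
            "enableSpokenEmojis": emoji,                # Enable Emoji setting
        },
        # Inline audio is sent base64-encoded.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }
    url = f"{GOOGLE_RECOGNIZE_URL}?{urlencode({'key': api_key})}"
    return url, json.dumps(body)
```

The body would be sent with an HTTP POST and `Content-Type: application/json`; a successful response lists the transcript under `results[].alternatives[].transcript`.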
How to create a Google Cloud account and API key:
Log in or sign up at Google Cloud
Watch the setup video below. At 0:50, enable the Cloud Speech-to-Text API instead. Ignore the steps after 1:50.
OpenAI
- No free tier. Monitor usage at OpenAI Dashboard.
- Strongly advised: Set usage limits.
Settings (accessed via Settings button):
- OpenAI API Key - Find yours at OpenAI platform
- Language - Transcription language
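OpenAI's transcription endpoint takes the audio as a multipart form upload together with a `model` field and an optional `language` field in ISO-639-1 form (for example `en`). The sketch below assembles everything except the audio part; the helper names are hypothetical, and the assumption that the Language setting holds a locale such as `en-US` (hence the conversion) is not confirmed by this page.

```python
# OpenAI's audio transcription endpoint; audio goes in a multipart/form-data upload.
OPENAI_TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions"


def to_iso639_1(language: str) -> str:
    """Reduce a locale such as 'en-US' or 'pt_BR' to a two-letter code ('en', 'pt')."""
    return language.replace("_", "-").split("-")[0].lower()


def build_openai_request(
    api_key: str, language: str
) -> tuple[str, dict[str, str], dict[str, str]]:
    """Hypothetical helper: return (url, headers, form_fields) for one transcription.

    The recorded audio itself is attached separately as the multipart `file` part.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    fields = {
        "model": "whisper-1",               # OpenAI's hosted Whisper model
        "language": to_iso639_1(language),  # optional; ISO-639-1 code
    }
    return OPENAI_TRANSCRIPTIONS_URL, headers, fields
```

With the third-party `requests` library, the upload would look roughly like `requests.post(url, headers=headers, data=fields, files={"file": ("speech.webm", audio_bytes, "audio/webm")})`; the JSON response carries the transcript in its `text` field.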
How to create an OpenAI account and API key:
- Log in or sign up at OpenAI
- Watch the setup video below.
- Don't forget to set up a payment method under Billing - Payment methods.
Microsoft Azure
- Free 5 hours/month. Monitor usage at Azure Portal.
- Strongly advised: Set up a budget at Azure Cost Management.
Settings (accessed via Settings button):
- Azure API Key - Azure API key for the Resource that's configured for SpeechServices in your Azure Portal
- Azure region - Azure region for the Resource that's configured for SpeechServices
- Language - Transcription language
- Profanity Filter - Specify how to handle profanity in transcriptions:
  - masked - replaces profanity with asterisks
  - removed - removes all profanity from the result
  - raw - includes profanity in the result
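The Azure API Key, Azure region, Language, and Profanity Filter settings line up with Azure's speech-to-text REST API for short audio, which takes up to 60 seconds of audio (consistent with the Azure time limit under Transcribing) and accepts `profanity=masked|removed|raw` as a query parameter. The sketch below builds only the URL and headers; the helper name and the WAV audio format are assumptions, and the extension may call Azure differently (for example through the Speech SDK).

```python
from urllib.parse import urlencode

# Values accepted by the Profanity Filter setting.
PROFANITY_MODES = ("masked", "removed", "raw")


def build_azure_request(
    api_key: str, region: str, language: str, profanity: str = "masked"
) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: return (url, headers) for the REST API for short audio.

    `region` is the region identifier of the Speech resource, e.g. "westeurope".
    """
    if profanity not in PROFANITY_MODES:
        raise ValueError(f"profanity must be one of {PROFANITY_MODES}, got {profanity!r}")
    query = urlencode({"language": language, "profanity": profanity})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        # Assumption: 16 kHz mono PCM WAV; the API also accepts Ogg/Opus.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
    }
    return url, headers
```

With the default simple output format, the JSON response reports `RecognitionStatus` and puts the transcript in `DisplayText`.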
How to create an Azure account and API key:
Log in or sign up at Azure Portal
Set up your billing account at Cost Management + Billing
Watch the video below:
Azure Note
When creating the new resource as shown in the video, create a new Resource Group or use an existing one, and select the region closest to your location.
Transcribing
To record and transcribe speech using your microphone, use the STT by K Transcribe command. You can start, stop, or cancel the recording as needed. The transcription will be saved in the variable name you specify in the Start action.
Time limits:
- Google Cloud: Up to 1 minute per transcription
- OpenAI: Up to 2 minutes per transcription
- Azure: Up to 1 minute per transcription
Box Name | Description |
---|---|
Action | Start - begin recording your voice to transcribe <br/> Stop - end recording and send the audio to be transcribed <br/> Cancel - stop recording without saving or transcribing the audio |
Engine | Use the default engine from Settings, or select a specific one |
Stop Automatically | Stops recording automatically when silence is detected. You can change the Silence Threshold and Silence Length in Settings. |
Save Variable As (status) | Variable name to save the current status into. The status can be one of the following values: <br/> listening - actively listening to you speaking <br/> processing - processing the recorded speech, no longer listening <br/> ok - speech processed and saved in the Save Variable As (result) variable <br/> error - something went wrong |
Save Variable As (result) | Variable name to save the transcription result into. This is only used for the 'Start' action. Will be saved as an empty string if there's an error. |
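Taken together, the Action box and the two Save Variable As boxes behave like a small state machine: Start sets the status to listening, Stop moves it to processing and then to ok (with the transcript saved) or error (with an empty result), and Cancel throws the audio away. The following is a hypothetical model of that flow in plain Python; the class, the dictionary standing in for SAMMI variables, and the injected `transcribe` function are illustrative only.

```python
from typing import Callable


class TranscribeCommand:
    """Hypothetical model of the Start / Stop / Cancel actions and their variables."""

    def __init__(self, variables: dict[str, str]) -> None:
        self.variables = variables  # stands in for SAMMI's variable storage
        self.recording = False
        self.audio = b""
        self.status_var = ""
        self.result_var = ""

    def start(self, status_var: str, result_var: str) -> None:
        # Variable names are given on Start, as in the Save Variable As boxes.
        self.status_var, self.result_var = status_var, result_var
        self.recording, self.audio = True, b""
        self.variables[status_var] = "listening"

    def feed(self, chunk: bytes) -> None:
        """Append recorded audio while listening."""
        if self.recording:
            self.audio += chunk

    def stop(self, transcribe: Callable[[bytes], str]) -> None:
        if not self.recording:
            return
        self.recording = False
        self.variables[self.status_var] = "processing"
        try:
            text = transcribe(self.audio)
        except Exception:
            self.variables[self.result_var] = ""  # empty string on error
            self.variables[self.status_var] = "error"
        else:
            self.variables[self.result_var] = text
            self.variables[self.status_var] = "ok"

    def cancel(self) -> None:
        # Discard the audio without transcribing. The page does not say what the
        # status variable shows after a cancel, so this sketch leaves it alone.
        self.recording, self.audio = False, b""
```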
Automatic Stop Warning
If 'Stop Automatically' isn't enabled and you don't use the 'Stop' action, the recording will be cancelled once it exceeds the engine's time limit.
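In other words, a recording that nobody stops is capped by the per-engine limits listed under Time limits. A minimal sketch of that check, assuming elapsed time is measured in seconds and that exceeding means going strictly past the limit; the engine keys and function names are hypothetical.

```python
# Per-transcription limits listed under Time limits, in seconds.
TIME_LIMITS = {"google": 60, "openai": 120, "azure": 60}


def should_cancel(engine: str, elapsed_seconds: float) -> bool:
    """True once a recording has run past its engine's limit without being stopped."""
    return elapsed_seconds > TIME_LIMITS[engine]


def remaining_seconds(engine: str, elapsed_seconds: float) -> float:
    """Seconds left before the limit would cancel the recording."""
    return max(0.0, TIME_LIMITS[engine] - elapsed_seconds)
```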
Getting and Resetting Usage
Use the STT by K usage command to get or reset usage statistics for all engines. Resetting is useful at the end of a billing month.
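A usage log of this kind can be as simple as a per-engine counter of transcribed seconds that is compared against the free allowances quoted earlier (60 minutes per month for Google Cloud, 5 hours for Azure, none for OpenAI). The class below is a hypothetical sketch, not the extension's storage format; providers may round billed durations differently, so a local tally is only an estimate and billing alerts remain the reliable safeguard.

```python
# Free monthly allowances quoted on this page, in seconds (OpenAI has no free tier).
FREE_TIER_SECONDS = {"google": 60 * 60, "azure": 5 * 60 * 60, "openai": 0}


class UsageLog:
    """Hypothetical per-engine counter with Get and Reset operations."""

    def __init__(self) -> None:
        self.seconds = {engine: 0.0 for engine in FREE_TIER_SECONDS}

    def record(self, engine: str, audio_seconds: float) -> None:
        """Add the length of one transcribed recording."""
        self.seconds[engine] += audio_seconds

    def get(self) -> dict[str, float]:
        return dict(self.seconds)

    def reset(self) -> None:
        """Zero every engine, e.g. at the end of a billing month."""
        for engine in self.seconds:
            self.seconds[engine] = 0.0

    def free_seconds_left(self, engine: str) -> float:
        """Estimated free-tier seconds remaining this month (0 when exhausted)."""
        return max(0.0, FREE_TIER_SECONDS[engine] - self.seconds[engine])


log = UsageLog()
log.record("google", 45)
log.record("google", 30)
print(log.free_seconds_left("google"))  # 3525.0
```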
Usage Logging
To enable usage logging, turn on Log Usage in General Settings inside the Settings button.
Debugging 'Auto Stop' Feature
If you're having issues with the 'Auto Stop' feature, such as the transcription stopping too early, too late or not at all, you can enable the 'Enable Debug' setting in the General Settings. This will log the average sound level in the console, which can help you adjust the silence threshold.
How to see the logs:
- Open Bridge in a regular browser.
- Press F12 to open the Developer Tools.
- Go to the Console tab.
- Press the Transcribe Start button with 'Auto Stop' enabled and start speaking.
- Observe the average sound level in the console. Adjust the Silence Threshold in the General Settings accordingly.
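One way to turn the logged readings into a Silence Threshold is to note the levels printed while you stay quiet and while you talk, then pick a value between the two groups. The helper below is a hypothetical heuristic, not a documented procedure: it takes the midpoint between the loudest quiet reading and the quietest speaking reading, in whatever units the console prints, and refuses to guess when the two groups overlap.

```python
def suggest_threshold(quiet_levels: list[float], speaking_levels: list[float]) -> float:
    """Hypothetical heuristic: midpoint between the loudest quiet and quietest speaking reading."""
    loudest_quiet = max(quiet_levels)
    quietest_speech = min(speaking_levels)
    if quietest_speech <= loudest_quiet:
        # Background noise reaches speech level, so no single threshold separates them.
        raise ValueError("quiet and speaking readings overlap")
    return (loudest_quiet + quietest_speech) / 2


# Illustrative readings, not real measurements.
print(suggest_threshold([2, 3, 4], [20, 30, 25]))  # 12.0
```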
Get Help
Please see Troubleshooting for common extension issues.