Speech to Text - SAMMI Extension | Christina Kral's Docs

Turn your speech into text effortlessly with this Speech to Text extension!

Supported Engines

Google Cloud

Google Cloud's free tier allows you to transcribe 60 minutes of audio completely free each month.

Pricing Info / Supported Languages

OpenAI

OpenAI provides high-quality speech-to-text capabilities. Currently, OpenAI does not provide a free tier.

Pricing Info (under Audio models - Whisper) / Supported Languages

Microsoft Azure

Azure's free tier allows you to transcribe 5 hours of audio completely free each month.

Pricing Info / Supported Languages

Features

Language Selection

Select the language you speak in for more accurate transcription.

Profanity Filter

Some engines offer additional features like a profanity filter for cleaner transcriptions.

Auto Stop

Configure the extension to stop recording automatically when it detects silence.

Usage Logging

Keep track of your usage statistics with the built-in logging feature.

Know Before You Use

The extension is not intended for live captioning. It is designed for one-time Speech to Text requests, similar to how 'Ok Google' or 'Hey Alexa' work.
The extension works best with Bridge running inside an OBS dock. Performance outside of OBS cannot be guaranteed.
You'll need a credit card to use any of these services.

Setup

Install the extension. You can follow the Extension Install Guide.
Add the --use-fake-ui-for-media-stream flag to your OBS launch shortcut (if Bridge is running as a dock in OBS). This flag lets the dock use your microphone without showing a permission prompt:
1. Navigate to where your OBS shortcut or obs64.exe is located.
2. If you're using obs64.exe, right-click and choose "Create Shortcut".
3. Modify Properties: Right-click on the shortcut, select "Properties".
4. Add Flag: In the Target field, add --use-fake-ui-for-media-stream after the path, separated by a space (outside any closing quotation mark).
5. Save the changes. Launch OBS from this shortcut going forward.
Go to the premade deck and press the Settings button:
- General Settings
  - Default Engine: The engine to use for all queries by default.
  - Silence Length: Time (in seconds) of silence before stopping transcription automatically, if Auto Stop is enabled.
  - Silence Threshold: The noise level that counts as silence. Increase the value in noisier environments.
  - Log Usage: Tracks usage so you can read and reset it with the Get and Reset Usage commands. It's strongly advised to set up billing alerts to avoid unexpected charges.
  - Enable Debug: Logs the average sound level to the Bridge console while STT is active and 'Auto Stop' is enabled. Helpful for adjusting the Silence Threshold.
Make sure your recording device is set correctly (this option is only available when Bridge is running inside an OBS dock).
- Go to the Bridge - STT by K tab and select a recording device, if necessary.
Continue setting up the desired engine inside the Settings button.
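When Auto Stop is enabled, Silence Length and Silence Threshold work together. The Python sketch below illustrates one plausible way they combine. It assumes the recorder reports an average sound level for each fixed-length chunk of audio; the function name and the level scale are hypothetical and not taken from the extension.

```python
def auto_stop_index(levels, chunk_seconds, silence_threshold, silence_length):
    """Return the index of the chunk where Auto Stop would end the recording,
    or None if the silence never lasts long enough.

    Hypothetical model: `levels` holds the average sound level of each
    consecutive chunk of `chunk_seconds` seconds (the scale is an assumption).
    """
    silent_chunks = 0
    for i, level in enumerate(levels):
        if level <= silence_threshold:
            silent_chunks += 1
            # Multiply instead of summing floats to avoid rounding drift.
            if silent_chunks * chunk_seconds >= silence_length:
                return i
        else:
            silent_chunks = 0  # any louder chunk restarts the silence timer
    return None


# Half-second chunks, threshold 10, stop after 1.5 s of silence.
print(auto_stop_index([50, 40, 5, 5, 30, 5, 5, 5, 5], 0.5, 10, 1.5))  # → 7
```

The short pause at indexes 2 and 3 lasts only 1 second, so it does not end the recording. Raising Silence Length tolerates longer pauses between sentences; raising Silence Threshold lets more background noise count as silence.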

Microphone Access

If you're running Bridge outside of OBS (not recommended), go to the Bridge - STT by K tab and click 'Allow Recording' to grant microphone access every time you refresh Bridge or relaunch SAMMI.

Available Engines

Google Cloud

Free 60 minutes/month. Monitor usage under Google Cloud - Billing - Overview.
Strongly advised: Configure alerts at Google Cloud - Billing - Budgets & Alerts.

Settings (accessed via Settings button):

Google Cloud API Key - Your Google Cloud API Key with the Cloud Speech-to-Text API enabled
Language - Transcription language
Profanity Filter - Attempts to filter out profanities, replacing all but the initial character of each filtered word with asterisks (for example, f***)
Enable Punctuation - Adds punctuation to results (only in select languages)
Enable Emoji - Converts spoken emojis to Unicode symbols in the text
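For reference, a direct call to Google's Speech-to-Text v1 `speech:recognize` REST method carries the Google Cloud settings above as `RecognitionConfig` fields. The sketch only builds the request and sends nothing. `build_google_request` is a hypothetical helper, and it assumes a WAV or FLAC recording, for which the API can read the encoding from the file header. The extension's own request may differ.

```python
import base64
import json
from urllib.parse import urlencode


def build_google_request(api_key, audio_bytes, language_code,
                         profanity_filter=False, enable_punctuation=False,
                         enable_emoji=False):
    """Return (url, json_body) for a Speech-to-Text v1 speech:recognize call.

    Hypothetical helper. Assumes WAV or FLAC audio, so `encoding` is omitted
    and read from the file header by the API.
    """
    url = ("https://speech.googleapis.com/v1/speech:recognize?"
           + urlencode({"key": api_key}))
    body = {
        "config": {
            "languageCode": language_code,                 # Language
            "profanityFilter": profanity_filter,           # Profanity Filter
            "enableAutomaticPunctuation": enable_punctuation,  # Enable Punctuation
            "enableSpokenEmojis": enable_emoji,            # Enable Emoji
        },
        # Audio is sent inline as base64 text.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }
    return url, json.dumps(body)
```

Send the body as an HTTP POST with `Content-Type: application/json`. The synchronous `recognize` method accepts up to about one minute of audio, which matches the Google Cloud time limit listed under Transcribing.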

How to create a Google Cloud account and API key:

Log in or sign up at Google Cloud
Watch the setup video below. At 0:50, enable Cloud Speech-to-Text API instead. Ignore steps after 1:50.

OpenAI

No free tier. Monitor usage at OpenAI Dashboard.
Strongly advised: Set usage limits.

Settings (accessed via Settings button):

OpenAI API Key - Find yours on the OpenAI platform
Language - Transcription language
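OpenAI's transcription endpoint takes a multipart form upload. The sketch below builds such a request with the Python standard library only and sends nothing. The helper name, the `whisper-1` model choice (matching the Whisper pricing note above) and the generic file content type are assumptions; the extension may build its request differently.

```python
import uuid


def build_openai_request(api_key, audio_bytes, filename, language):
    """Return (url, headers, body) for a multipart transcription upload.

    Hypothetical helper; the model and file content type are assumptions.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields: the model and the ISO-639-1 language code.
    for name, value in (("model", "whisper-1"), ("language", language)):
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The audio file itself; OpenAI uses the filename extension to detect the format.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return "https://api.openai.com/v1/audio/transcriptions", headers, b"".join(parts)
```

POST the body to the returned URL with the returned headers; the JSON response contains the transcription in its `text` field. `language` takes an ISO-639-1 code such as `en`.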

How to create an OpenAI account and API key:

Log in or sign up at OpenAI
Watch the setup video below.
Don't forget to set up a payment method under Billing - Payment methods.

Microsoft Azure

Free 5 hours/month. Monitor usage at Azure Portal.
Strongly advised: Set up a budget at Azure Cost Management.

Settings (accessed via Settings button):

Azure API Key - Azure API key for the Resource that's configured for SpeechServices in your Azure Portal
Azure region - Azure region for the Resource that's configured for SpeechServices
Language - Transcription language
Profanity Filter - Specify how to handle profanity in transcriptions:
  - masked - replaces profanity with asterisks
  - removed - removes all profanity from the result
  - raw - includes profanity in the result
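Azure's REST API for short audio takes the region, language and profanity mode from the settings above as part of the request URL. The sketch below only builds the URL and headers; the helper name is hypothetical, and it assumes 16 kHz, 16-bit mono PCM WAV audio.

```python
from urllib.parse import urlencode

# The three modes accepted by the Profanity Filter setting.
PROFANITY_OPTIONS = {"masked", "removed", "raw"}


def build_azure_request(api_key, region, language, profanity="masked"):
    """Return (url, headers) for Azure's speech-to-text REST API for short audio.

    Hypothetical helper; assumes 16 kHz 16-bit mono PCM WAV audio.
    """
    if profanity not in PROFANITY_OPTIONS:
        raise ValueError(f"profanity must be one of {sorted(PROFANITY_OPTIONS)}")
    query = urlencode({"language": language, "profanity": profanity})
    url = (f"https://{region}.stt.speech.microsoft.com/speech/recognition/"
           f"conversation/cognitiveservices/v1?{query}")
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,  # Azure API Key from the Resource
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return url, headers
```

POST the WAV bytes to the URL with these headers. The short-audio API accepts up to 60 seconds of audio, in line with the Azure time limit listed under Transcribing.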

How to create an Azure account and API key:

Log in or sign up at Azure Portal
Set up your billing account at Cost Management + Billing
Watch the video below:
Azure Note
When creating the new resource as shown in the video, create a new Resource Group or use an existing one, and select the region closest to your location.

Transcribing

To record and transcribe speech using your microphone, use the STT by K Transcribe command. You can start, stop, or cancel the recording as needed. The transcription is saved to the variable you specify in the Start action.

Time limits:

Google Cloud: Up to 1 minute per transcription
OpenAI: Up to 2 minutes per transcription
Azure: Up to 1 minute per transcription

The STT by K Transcribe command has the following boxes:
- Action: what the command should do:
  - Start - begin recording your voice to transcribe
  - Stop - end the recording and send the audio to be transcribed
  - Cancel - stop recording without saving or transcribing the audio
- Engine: Use the default engine from Settings, or select a specific one.
- Stop Automatically: Stops recording automatically when no sound is detected. You can change the silence level and the number of seconds in Settings.
- Save Variable As (status): The current status, which is one of the following values:
  - `listening` - actively listening to you speak
  - `processing` - processing the recorded speech, no longer listening
  - `ok` - speech processed and saved in the Save Variable As (result) variable
  - `error` - something went wrong
- Save Variable As (result): The variable name to save the transcription result into. Only used by the 'Start' action. Saved as an empty string if there's an error.
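Because the result variable is only meaningful once the status variable reaches `ok`, anything that reads it should check the status first. The Python sketch below models that check; the enum and function names are hypothetical, and in SAMMI the same check would be part of your own button logic.

```python
from enum import Enum


class SttStatus(Enum):
    """The four documented values of the status variable."""
    LISTENING = "listening"
    PROCESSING = "processing"
    OK = "ok"
    ERROR = "error"


def read_result(status_value, result_value):
    """Interpret the status/result variable pair (hypothetical helper).

    Returns the transcription when done, "" on error, or None while the
    engine is still listening or processing. Unknown statuses raise ValueError.
    """
    status = SttStatus(status_value)
    if status is SttStatus.OK:
        return result_value
    if status is SttStatus.ERROR:
        return ""  # the result variable is an empty string on error
    return None  # not finished yet; check the status again later
```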

Automatic Stop Warning

If 'Stop Automatically' isn't enabled and you don't use the 'Stop' action, the recording is cancelled once it exceeds the engine's time limit.
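The time limits and the warning above combine into a simple rule, sketched here in Python. The engine keys and the `recording_outcome` name are hypothetical; `stop_at_seconds` stands for the moment a 'Stop' action or 'Stop Automatically' ends the recording, or None if neither does.

```python
# Per-transcription limits from the Time limits list (keys are hypothetical).
TIME_LIMITS_SECONDS = {"google": 60, "openai": 120, "azure": 60}


def recording_outcome(engine, stop_at_seconds):
    """Return "transcribed" or "cancelled" for a single recording."""
    limit = TIME_LIMITS_SECONDS[engine]
    # No stop at all, or a stop that comes after the limit: the limit wins.
    if stop_at_seconds is None or stop_at_seconds > limit:
        return "cancelled"
    return "transcribed"
```

For example, a 90-second recording would be transcribed by OpenAI but cancelled by Google Cloud or Azure.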

Getting and Resetting Usage

Use the STT by K usage command to get or reset usage statistics for all engines. Resetting is useful at the end of a billing month.
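Comparing logged usage with each engine's free allowance (60 minutes per month for Google Cloud, 5 hours for Azure, none for OpenAI) shows roughly how much of a month's usage is billable. The sketch assumes usage is expressed in minutes, and the helper name is hypothetical. Providers may round each request's duration up when billing, so treat the result as an estimate and keep your billing alerts in place.

```python
# Monthly free allowance per engine, in minutes (keys are hypothetical).
FREE_MINUTES_PER_MONTH = {"google": 60, "azure": 5 * 60, "openai": 0}


def minutes_beyond_free_tier(engine, minutes_used):
    """Estimate billable minutes for the month (never negative)."""
    return max(0.0, minutes_used - FREE_MINUTES_PER_MONTH[engine])
```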

Usage Logging

To enable usage logging, turn on Log Usage under General Settings in the Settings button.

Debugging 'Auto Stop' Feature

If you're having issues with the 'Auto Stop' feature, such as the transcription stopping too early, too late or not at all, you can enable the 'Enable Debug' setting in the General Settings. This will log the average sound level in the console, which can help you adjust the silence threshold.

How to see the logs:

Open Bridge in a regular browser.
Press F12 to open the Developer Tools.
Go to the Console tab.
Press the Transcribe START button with 'Auto Stop' enabled, and start speaking.
Observe the average sound level in the console. Adjust the Silence Threshold in the General Settings accordingly.
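One way to turn the console readings into a Silence Threshold is to note a few levels in a quiet room and a few while speaking steadily, then choose a value between the two groups. This midpoint rule is a simple heuristic assumed here for illustration, not the extension's own logic, and the function name is hypothetical.

```python
def suggest_silence_threshold(quiet_levels, speech_levels):
    """Heuristic (assumption): halfway between the loudest background reading
    and the quietest reading taken while speaking."""
    loudest_quiet = max(quiet_levels)
    quietest_speech = min(speech_levels)
    if quietest_speech <= loudest_quiet:
        raise ValueError("background noise overlaps speech; "
                         "reduce noise or move closer to the mic")
    return (loudest_quiet + quietest_speech) / 2


print(suggest_silence_threshold([2, 3, 4], [20, 25, 30]))  # → 12.0
```

Take the speech readings mid-sentence rather than during pauses, since pauses pull the minimum down toward the background level.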

Get Help

Please see Troubleshooting for common extension issues.