

Symbl offers state-of-the-art Speech-to-Text capability (also called transcription). You can convert audio and video conversations into text in real-time or after the conversation has ended.

Key Features

  • Real-time transcription: Transcribe content as it is spoken or from stored files.

  • Domain specific: Symbl's Speech-to-Text models are built for mobile calls and video calls to deliver state-of-the-art accuracy.

  • Multi-language Support: We support 20+ languages, including English, Russian, French, Italian, Hindi, Japanese, and Spanish. We also support models for different accents. For example, American and British English are spoken differently, so we have Speech Recognition Models that are fine-tuned for different accents.
    Languages Supported

  • Custom Vocabulary: We support Custom Vocabulary, which helps Speech-to-Text recognize specific words or phrases that are used more frequently within a given context. For example, suppose that your audio data often includes the word "sell". When Speech-to-Text encounters that word, you want it to be transcribed as "sell" more often than as "cell". In this case, you can use Custom Vocabulary to bias Speech-to-Text toward recognizing "sell".

  • Accurate Punctuation: Speech-to-Text accurately punctuates transcriptions (e.g., commas, question marks, and periods).

  • Speaker Diarization: Know who said what by receiving automatic predictions about which speaker in a conversation spoke each utterance. This is called Speaker Diarization. The process is fairly accurate, but not 100% accurate. If you want near-100% accuracy for who said what, use audio streams and pass the audio files in channels.

  • Paragraph generation

  • Support for formats like Markdown (.md) and SubRip Text (.srt)

  • Action Phrases within the transcription
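To make the Custom Vocabulary idea above more concrete, here is a toy sketch of what "biasing" toward a word can mean. It is not Symbl's implementation. The function name, the candidate scores, and the `boost` value are all hypothetical and chosen only for illustration.

```python
def pick_transcription(candidates, boosted_phrases, boost=0.2):
    """Return the highest-scoring candidate after boosting favored phrases.

    candidates: dict mapping candidate text to a (made-up) recognizer score.
    boosted_phrases: set of words that a custom vocabulary would favor.
    boost: hypothetical amount added to the score of a favored word.
    """
    def adjusted(item):
        text, score = item
        return score + (boost if text in boosted_phrases else 0.0)

    return max(candidates.items(), key=adjusted)[0]


# Illustrative scores only: without a bias, "cell" narrowly wins.
candidates = {"cell": 0.55, "sell": 0.45}
print(pick_transcription(candidates, set()))     # cell
print(pick_transcription(candidates, {"sell"}))  # sell
```

In this sketch the boost changes the ranking only when the two candidates are close, which matches the intent of transcribing "sell" more often rather than always.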

Speech-to-Text API


Each continuous sentence spoken by a speaker in a conversation is referred to as a Message, which is why our Speech-to-Text API is named the Messages API. The Messages API returns a list of the messages in a conversation.
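As a rough illustration of working with a list of messages, the sketch below joins them into a readable transcript. The payload shape is an assumption made for this example (a top-level `messages` list whose items carry `text` and an optional `from.name`). Check the actual Messages API response before relying on these field names.

```python
def messages_to_transcript(payload):
    """Join a list of messages into 'Speaker: text' lines.

    Assumed (not confirmed here) payload shape:
    {"messages": [{"text": "...", "from": {"name": "..."}}, ...]}
    """
    lines = []
    for message in payload.get("messages", []):
        # Fall back to "Unknown" when no speaker information is present.
        speaker = (message.get("from") or {}).get("name") or "Unknown"
        lines.append(f"{speaker}: {message.get('text', '')}")
    return "\n".join(lines)


# Hand-written sample data, not real API output.
sample = {
    "messages": [
        {"text": "Hi, thanks for joining.", "from": {"name": "Alice"}},
        {"text": "Happy to be here."},
    ]
}
print(messages_to_transcript(sample))
```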

To see the Messages API in action, you first need to process a conversation using Symbl. After you process a conversation, you'll receive a Conversation ID. A Conversation ID is the key to receiving conversational insights from any conversation. As an example, here's a simple API call that grabs the speech-to-text transcription from the conversation.

Using the Conversation API, you can also get a pre-formatted transcript in Markdown, or in a standard transcription or closed-captioning format such as SRT. See the Formatted Transcript section for more.
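For readers unfamiliar with SRT, the sketch below shows what that closed-captioning format looks like by building cues from plain `(start, end, text)` tuples. It only illustrates the file format. It is not how the API produces its pre-formatted transcript, and the helper names are made up for this example.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from an iterable of (start_sec, end_sec, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each cue is: sequence number, time range, then the caption text.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        )
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


print(to_srt([(0, 2.5, "Hi, thanks for joining."), (2.5, 4, "Happy to be here.")]))
```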

👉 Messages API

Grab speech-to-text transcription

Remember to replace the conversationId in the API call with the Conversation ID you get from the previous API call.

curl "{conversationId}/messages" \
-H "Authorization: Bearer $AUTH_TOKEN"
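The same request can be sketched in Python using only the standard library. The base URL is deliberately left as a parameter because it does not appear in the call above. The `example.invalid` address below is a placeholder, not the real endpoint, and the function names are hypothetical.

```python
import json
import urllib.request


def build_messages_request(base_url, conversation_id, auth_token):
    """Build a GET request for '<base_url>/<conversation_id>/messages'.

    base_url is an assumption supplied by the caller; take the real value
    from the Messages API reference.
    """
    url = f"{base_url.rstrip('/')}/{conversation_id}/messages"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {auth_token}"}
    )


def fetch_messages(request):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder values for illustration only; no request is sent here.
request = build_messages_request(
    "https://api.example.invalid/conversations", "YOUR_CONVERSATION_ID", "YOUR_TOKEN"
)
print(request.full_url)
```

Keeping request construction separate from the network call makes it easy to check the URL and header before anything is sent.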
