Automatic Speech Recognition

Use Symbl.ai Async Automatic Speech Recognition (ASR) to quickly and accurately convert audio and video conversations to text asynchronously or in batch. You can use Async ASR to transcribe conversations in multiple languages and customize it to recognize domain-specific keywords using custom vocabulary. Async ASR latency is approximately 5% of the length of the call; for example, a 60-minute call takes about 3 minutes to transcribe.

For more information on how to process conversations with Symbl.ai, see Process a conversation.

Use cases

A few of the many use cases for Symbl.ai ASR:

Closed Captioning – Add closed captions to recordings and live audio and video streams. Increase accessibility to products and comply with regulatory standards that require closed captions and record keeping.
Searchable Media Library – Enable text-based search for audio and video content using transcripts.
Search Engine Optimization (SEO) – Improve SEO by using transcripts to tag audio and video content.
Transcription – Save time and money by automatically transcribing conversations. Apply ASR transcription to situations in which it is costly to hire human transcribers.

Features

Symbl.ai ASR includes out-of-the-box support for the following features:

Formatting:
- Redact PII, PCI, and PHI data – Choose to redact one or all categories of PII, PCI, and PHI data from conversations.
- Redact profanity – Automatically redact profane phrases from conversations.
- Remove filler words – Remove small meaningless words such as um, ah, and hmm from conversations.
- Punctuation – Improve readability of transcripts by adding all discernible punctuation, including questions, pauses, and full stops.
- Inverse Text Normalization – Convert spoken numbers such as dates and times, addresses, and currency amounts from words to numerical values. For example, “one thousand five hundred” becomes 1,500 and “one hundred and twenty-five dollars” becomes $125.00. Also known as numerical formatting.
- Transcription output format – The default output format is JSON. You can convert this to Markdown or SubRip Subtitle (SRT) format.
Multi-language support – Get transcription for any of the supported languages.
Speaker separation:
- Automatic speaker diarization – Automatically detect speakers and assign each message in the transcript to the right speaker.
- Multiple channels – Get transcription for each channel separately when you process a recorded conversation file with multiple channels.
Custom vocabulary (Keyword Boosting) – Bias the ASR to recognize particular domain-specific terms that the general model would otherwise not detect. For example, you can provide people's names and brand names that are specific to your business.
Speaker analytics:
- Pace – The speed at which the person spoke, in words per minute (WPM). Also called speaker speech speed.
- Silence time – The time during which none of the speakers said anything.
- Speaker overlap – Whether a speaker spoke over another speaker, reported as a percentage of the total conversation and as overlap time in seconds.
- Speaker ratio – The total ratio of talk time for one speaker compared to others in the same conversation.
- Talk time – The amount of time each person spoke during the conversation.
Bookmarks – Highlight and summarize key moments from conversations. Quickly get to key moments of a conversation and share those moments with others.
Confidence score – An estimate of the reliability of a detected word and message.
Word level timestamps – The start and end time of each word in Coordinated Universal Time (UTC) format.
Sentence level timestamps – The start and end time of each sentence in Coordinated Universal Time (UTC) format.
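
As an illustration of how the speaker analytics above relate to timestamped transcript data, here is a minimal Python sketch that derives talk time, pace, and silence time from speaker-labeled segments. The `Segment` shape, the use of second offsets instead of UTC timestamps, and the function names are assumptions made for this example; they are not the Symbl.ai response schema, and Symbl.ai may compute these metrics differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment; not the Symbl.ai message schema."""
    speaker: str
    start: float  # seconds from the start of the conversation (assumed)
    end: float    # seconds from the start of the conversation (assumed)
    words: int    # number of words spoken in this segment


def talk_time(segments):
    """Total seconds each speaker spoke."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


def pace_wpm(segments):
    """Words per minute for each speaker, based on that speaker's talk time."""
    words = defaultdict(int)
    for seg in segments:
        words[seg.speaker] += seg.words
    times = talk_time(segments)
    return {sp: words[sp] * 60 / t for sp, t in times.items() if t > 0}


def silence_time(segments, total_duration):
    """Seconds during which no speaker was talking."""
    spoken = 0.0
    cur_start = cur_end = None
    # Merge overlapping segments so overlapped speech is counted once.
    for seg in sorted(segments, key=lambda s: s.start):
        if cur_end is None or seg.start > cur_end:
            if cur_end is not None:
                spoken += cur_end - cur_start
            cur_start, cur_end = seg.start, seg.end
        else:
            cur_end = max(cur_end, seg.end)
    if cur_end is not None:
        spoken += cur_end - cur_start
    return total_duration - spoken


segments = [
    Segment("A", 0, 30, 75),
    Segment("B", 28, 40, 24),   # overlaps speaker A for 2 seconds
    Segment("A", 50, 60, 20),
]
print(talk_time(segments))
print(pace_wpm(segments))
print(silence_time(segments, total_duration=60))
```

Dividing one speaker's talk time by the sum of all talk times gives that speaker's share of the conversation, which is one possible reading of the speaker ratio described above.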

Benchmarks

The following table compares the Word Error Rate (WER) of several ASR providers across different datasets. Lower values indicate more accurate transcription.

Dataset	Symbl.ai Async ASR	Azure	Assembly	Rev	Deepgram
Mixed Domain - Internal	13.65	13.81	14.35	14.24	14.96
Kincaid	14.43	16.13	13.49	14.48	18.93
Poor Audio CC	19.35	19.91	28.27	28.36	34.35
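
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of words in the reference. The Python sketch below shows that standard calculation. It returns a percentage on the assumption that the table values are percentages, and its text normalization (lowercasing and whitespace splitting) is an assumption for illustration; the normalization used to produce the benchmark figures is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage of the reference word count."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick brown dog"))
```

Because insertions count as errors, WER can exceed 100 when the output contains many more words than the reference.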

Supported languages for speech to text transcription

Multi-language support for recorded (async) conversations applies to speech-to-text transcription.

Supported Languages	Code
English (United States)	`en-US`
English (United Kingdom)	`en-GB`
English (Australia)	`en-AU`
English (Ireland)	`en-IE`
English (India)	`en-IN`
English (South Africa)	`en-ZA`
English (New Zealand)	`en-NZ`
Russian (Russian Federation)	`ru-RU`
French (Canada)	`fr-CA`
French (France)	`fr-FR`
French (Luxembourg)	`fr-LU`
French (Switzerland)	`fr-CH`
German (Germany)	`de-DE`
German (Austria)	`de-AT`
German (Belgium)	`de-BE`
German (Luxembourg)	`de-LU`
German (Switzerland)	`de-CH`
Italian (Italy)	`it-IT`
Italian (Switzerland)	`it-CH`
Dutch (Netherlands)	`nl-NL`
Japanese (Japan)	`ja-JP`
Spanish (United States)	`es-US`
Spanish (Spain)	`es-ES`
Arabic (Saudi Arabia)	`ar-SA`
Hindi (India)	`hi-IN`
Portuguese (Brazil)	`pt-BR`
Portuguese (Portugal)	`pt-PT`
Persian (Iran)	`fa-IR`
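
If your application accepts a language code from user input or configuration, it can help to check it against the table above before submitting a conversation. The following Python sketch does that; the function name and the normalization rules (accepting underscores and mixed case) are assumptions for illustration and are not part of any Symbl.ai SDK.

```python
# Language codes copied from the table of supported languages above.
SUPPORTED_LANGUAGE_CODES = frozenset({
    "en-US", "en-GB", "en-AU", "en-IE", "en-IN", "en-ZA", "en-NZ",
    "ru-RU",
    "fr-CA", "fr-FR", "fr-LU", "fr-CH",
    "de-DE", "de-AT", "de-BE", "de-LU", "de-CH",
    "it-IT", "it-CH",
    "nl-NL", "ja-JP",
    "es-US", "es-ES",
    "ar-SA", "hi-IN",
    "pt-BR", "pt-PT",
    "fa-IR",
})


def normalize_language_code(code: str) -> str:
    """Return the code in `xx-YY` form, or raise ValueError if unsupported.

    Hypothetical helper: accepts inputs such as "en_us" or "EN-us".
    """
    parts = code.strip().replace("_", "-").split("-")
    if len(parts) != 2:
        raise ValueError(f"expected a code like 'en-US', got {code!r}")
    normalized = f"{parts[0].lower()}-{parts[1].upper()}"
    if normalized not in SUPPORTED_LANGUAGE_CODES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized


print(normalize_language_code("en_us"))
```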

Conversation intelligence

In addition to ASR-generated transcripts, Symbl.ai provides conversation intelligence that extracts actionable insights from your conversations. After processing a conversation, you can use the Conversations API to get a wide range of conversation intelligence.

Next steps

Get started – Quickly find and start using the Symbl.ai tools and technologies that meet your needs.
Process a conversation – Submit an async, streaming, or telephony conversation to receive a conversation ID.
Get messages – With a conversation ID, you can generate a transcript including your choice of insights.
Conversation intelligence – Use your completed transcript to access all conversation intelligence features.