Speaker separation

ASR speaker separation is a method of distinguishing between different speakers in an audio stream. It lets you see who said what in a transcript and perform speaker-level analysis when using conversation intelligence.

Symbl.ai provides two options for speaker separation: automatic diarization and multi-channel conversation processing.

Use Cases

  • If you have single-channel recordings with no speaker timestamps, use automatic speaker diarization to get speaker-separated transcripts. For example, call centers can use automatic speaker diarization to distinguish between the agent and the customer, which yields accurate transcripts and conversation insights.

  • If you have recorded conversations with two or more separate channels, use multi-channel speaker separation. This is the most efficient and accurate method of async speaker separation. You only need to identify the source file as channel-separated and provide speaker information for each channel.

Feature availability

Speaker separation is currently available for recorded (async) conversations:

Feature                          Async   Streaming   Telephony
Automatic speaker diarization    Yes
Multi-channel support            Yes     N/A         N/A

Automatic speaker diarization

Automatic speaker diarization is a method of speaker separation that automatically detects speakers and assigns each message in the transcript to the correct speaker.

Use automatic speaker diarization if your recorded file has no separate channels and no timeline of speaker events. When you Submit audio file or Submit video file without speaker-separated channels, set the parameter enableSpeakerDiarization to true and specify the number of participants in the call with diarizationSpeakerCount.

After processing a conversation, you can also update speaker metadata using the Update speakers operation.

Note: Automatic speaker diarization is supported only for the en-US language code.
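
As a minimal sketch of the diarization request described above, the following Python builds the two parameters and encodes them as a query string. Only the parameter names enableSpeakerDiarization and diarizationSpeakerCount come from this page. The helper name build_diarization_params, the placeholder endpoint, and the choice to send the parameters as a query string are assumptions for illustration, so check the Async API reference for the actual URL and request format.

```python
from urllib.parse import urlencode


def build_diarization_params(speaker_count: int) -> dict:
    """Return the two diarization parameters named on this page.

    Hypothetical helper; it only assembles parameters and sends nothing.
    """
    if speaker_count < 1:
        raise ValueError("speaker_count must be at least 1")
    return {
        "enableSpeakerDiarization": "true",
        "diarizationSpeakerCount": speaker_count,
    }


# Placeholder endpoint (assumption), not the real API URL.
PLACEHOLDER_ENDPOINT = "https://example.invalid/process/audio"

# Two participants, for example an agent and a customer.
query = urlencode(build_diarization_params(2))
print(f"{PLACEHOLDER_ENDPOINT}?{query}")
```

Passing the participant count explicitly mirrors the text above: diarization is enabled with one parameter and told how many speakers to expect with the other.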

Multi-channel speaker separation

Achieve speaker separation by generating a transcript for each channel separately when you process a recorded conversation file with multiple channels. This method of speaker separation is the most accurate async option, but it requires a recorded file that already uses a separate channel per speaker.

For a recorded file that is already in separate channels, use the Async Audio API as described in process audio or process video. When you Submit audio file or Submit video file with speaker-separated channels, set the parameter enableSeparateRecognitionPerChannel to true and provide channelMetadata with the speaker details for each channel.
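
The multi-channel parameters described above might be assembled as in the following Python sketch. The names enableSeparateRecognitionPerChannel and channelMetadata come from this page. The layout of each channelMetadata entry (a channel number starting at 1 plus a speaker object with name and email) and the helper name build_channel_params are assumptions for illustration, so verify the exact schema in the API reference before relying on it.

```python
import json


def build_channel_params(speakers: list[dict]) -> dict:
    """Pair each speaker with a channel number, in the order given.

    Hypothetical helper. The entry layout and 1-based channel
    numbering are assumptions, not taken from this page.
    """
    channel_metadata = [
        {"channel": number, "speaker": speaker}
        for number, speaker in enumerate(speakers, start=1)
    ]
    return {
        "enableSeparateRecognitionPerChannel": True,
        "channelMetadata": channel_metadata,
    }


# Example: a two-channel call recording with one speaker per channel.
params = build_channel_params([
    {"name": "Agent", "email": "agent@example.com"},
    {"name": "Customer", "email": "customer@example.com"},
])
print(json.dumps(params, indent=2))
```

Because each channel is transcribed separately, the order of the speakers list must match the channel order in the recording.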

Note that if you have both features enabled, automatic speaker diarization takes precedence over multi-channel speaker separation.

Next steps