Enable Speaker Diarization (Speaker Separation) for the Async Audio or Async Video APIs to get speaker-separated transcripts and insights.
Enabling Speaker Separation in the Async Audio/Video API is as simple as adding the
enableSpeakerDiarization=true and diarizationSpeakerCount=<NUMBER_OF_UNIQUE_SPEAKERS> query parameters below:
The above snippet shows a cURL command for consuming the Async Video URL-based API, which takes the URL of a publicly available Video File.
<X-API-KEY> needs to be replaced with the token generated by the token:generate endpoint.
<WEBHOOK_URL> can be replaced with a WebHook URL for receiving the status of the Job created after calling the API.
For accuracy,
<NUMBER_OF_UNIQUE_SPEAKERS> should match the number of unique speakers in the Audio/Video data.
The above URL has two query-parameters:
enableSpeakerDiarization=true, which enables speaker separation for the Audio/Video data under consideration.
diarizationSpeakerCount=2, which sets the number of unique speakers in the Audio/Video data under consideration.
The above example uses the Async Video URL API to consume the Diarization capability, but Diarization can be enabled with the other Async Audio/Video APIs in the same way.
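As a rough illustration of the request described above, the following Python sketch assembles the URL, headers, and body without sending anything. The base endpoint, the x-api-key header name, and the url and webhookUrl body fields are assumptions made for this sketch and should be checked against the API reference; build_diarization_request is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Async Video URL API; verify against the API reference.
ASYNC_VIDEO_URL_ENDPOINT = "https://api.symbl.ai/v1/process/video/url"


def build_diarization_request(video_url, speaker_count, api_token, webhook_url=None):
    """Return (url, headers, body) for an Async Video URL request with diarization on."""
    if speaker_count < 1:
        raise ValueError("speaker_count must be a positive integer")

    # The two query parameters discussed in the text above.
    query = urlencode({
        "enableSpeakerDiarization": "true",
        "diarizationSpeakerCount": speaker_count,
    })

    # Header and body field names are assumptions for this sketch.
    headers = {"x-api-key": api_token, "Content-Type": "application/json"}
    body = {"url": video_url}
    if webhook_url is not None:
        body["webhookUrl"] = webhook_url

    return f"{ASYNC_VIDEO_URL_ENDPOINT}?{query}", headers, body


if __name__ == "__main__":
    url, headers, body = build_diarization_request(
        "https://example.com/meeting.mp4", 2, "<X-API-KEY>"
    )
    print(url)
```

The returned tuple can be handed to whichever HTTP client you already use; keeping the request construction separate from the network call makes the query parameters easy to inspect before a job is created.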
The members call in the Conversation API returns the uniquely identified speakers for this conversation when Speaker Diarization is enabled. View a sample output below:
The name assigned to a uniquely identified speaker/member from a Diarized Audio/Video follows the format
Speaker <number>, where
<number> is arbitrary and does not necessarily reflect the order in which someone spoke.
The id can be used to identify a speaker/member within that specific conversation and to update the details of that member, as demonstrated below in the Updating Detected Members section.
The messages call in the Conversation API returns the speaker-separated results. View a snippet for the above URL below:
The above snippet shows the speaker in the
from object with a unique-id. These are the uniquely identified
members of this conversation.
Reminder: The speaker number in the above snippet is arbitrary and the number doesn’t necessarily reflect the order in which someone spoke.
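To make the role of the from object concrete, here is a small Python sketch that groups message text by speaker. The sample payload is hypothetical: only the from object with its id and name comes from the description above, while the messages wrapper, the text field, and the example ids are assumptions made for illustration.

```python
from collections import defaultdict


def group_text_by_speaker(messages_response):
    """Map (speaker id, speaker name) to the list of texts that speaker said."""
    grouped = defaultdict(list)
    for message in messages_response.get("messages", []):
        speaker = message.get("from", {})
        key = (speaker.get("id"), speaker.get("name"))
        grouped[key].append(message.get("text", ""))
    return dict(grouped)


# Hypothetical response shape, for illustration only.
sample_response = {
    "messages": [
        {"text": "Hello there.", "from": {"id": "id-a", "name": "Speaker 2"}},
        {"text": "Hi, how are you?", "from": {"id": "id-b", "name": "Speaker 1"}},
        {"text": "Doing well.", "from": {"id": "id-a", "name": "Speaker 2"}},
    ]
}

if __name__ == "__main__":
    for (speaker_id, name), texts in group_text_by_speaker(sample_response).items():
        print(name, speaker_id, texts)
```

Grouping on the id rather than on the name keeps the result stable after a member is renamed, since the name is only a placeholder such as Speaker 1 until it is updated.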
Similarly, invoking the
insights call in the Conversation API would also reflect the identified speakers in the detected insights. The response below demonstrates this:
The detected members (unique speakers) have names like
Speaker 1 because the ASR has no context as to who this speaker is (name or other details of the speaker). Therefore, it is important to update the details of the detected speakers after the
members call in the Conversation API returns the uniquely identified speakers, as shown in the Getting the uniquely identified speakers (Members) section above, when Diarization is enabled.
Let’s consider the same set of members, which can be retrieved with the GET members call in the Conversation API.
We can now use the
PUT members call to update the details of a specific member as shown below. This call would update the
Speaker 2 as shown in the above section with the values in the cURL’s
<CONVERSATION_ID> needs to be replaced with the actual
<X-API-KEY> needs to be replaced with the token generated with the
The URL in line 1 above has the
id of the
member we want to append to
/members with the request body containing the updated
name of this
There is also the option to include the
After the above call is successful, we will receive the following response:
The message is self-explanatory and tells us that all the references to the
member with the
2f69f1c8-bf0a-48ef-b47f-95ae5a4de325 in the conversation should now reflect the new values we updated this
member with. That includes
messages and the conversation’s
members as well.
So if we call the
members API now, we would see the following result:
And similarly, with the
messages API call, we would see the updates reflected below as well:
Curious about the
insights API? It would reflect these updates as well!
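The member update described in this section can be sketched in Python as follows. The sketch only builds the request and does not send it. The base URL and the x-api-key header name are assumptions, build_member_update is a hypothetical helper, and the name "Jane Doe" is a made-up value; the member id is the one used in the text above.

```python
import json

# Assumed base URL for the Conversation API; verify against the API reference.
CONVERSATION_API_BASE = "https://api.symbl.ai/v1/conversations"


def build_member_update(conversation_id, member_id, name, api_token):
    """Return (method, url, headers, body) for a PUT members request."""
    # The member id is appended to /members, as described in the text above.
    url = f"{CONVERSATION_API_BASE}/{conversation_id}/members/{member_id}"
    headers = {"x-api-key": api_token, "Content-Type": "application/json"}  # assumed header name
    body = json.dumps({"name": name})  # request body carrying the updated name
    return "PUT", url, headers, body


if __name__ == "__main__":
    method, url, headers, body = build_member_update(
        "<CONVERSATION_ID>",
        "2f69f1c8-bf0a-48ef-b47f-95ae5a4de325",
        "Jane Doe",  # hypothetical new name
        "<X-API-KEY>",
    )
    print(method, url)
    print(body)
```

After a successful update, fetching members, messages, or insights again should show the new name wherever that member id appears.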
Because conversations don’t neatly end at once and may resume later, our Async API allows you to update/append an existing conversation. You can read more about this capability here.
To enable Diarization with the append capability, the request structure is the same as shown above for creating a new Conversation. You would need to pass in
However, there is one caveat in how the ASR works with Diarization. Consider the following scenario:
We send a recorded conversation to the Async API with 2 speakers and
enableSpeakerDiarization=true. The diarization identifies them as
Speaker 1 and
Speaker 2 respectively. We then update the above speakers with their
Now we use the append call to append another conversation with 2 speakers and
enableSpeakerDiarization=true. Let’s assume that the diarization now identifies these as
Speaker 1 and
Speaker 2 respectively. As discussed before, these numbers are arbitrary and have nothing to do with the order in which the speakers spoke in the conversation.
After this job is complete we will have 4 members in this conversation:
Speaker 1(Which is
Speaker 2(Which is
Speaker 1 refer to the same speaker but are labeled as different speakers, their
member references would be different for all
insights that they are a part of.
This is where the
PUT members call can uniquely identify and merge a
member with the same
If we were to execute a
PUT members call with the below body where
74001a1d-4e9e-456a-84ed-81bbd363333a refers to the
Speaker 1 from the above scenario, this would eliminate this
member and would update all the references with member represented by
2f69f1c8-bf0a-48ef-b47f-95ae5a4de325 which we know is
This is possible because the
diarizationSpeakerCount should equal the number of unique speakers present in the conversation for best results, because the Diarization model uses this number to probabilistically determine the speakers. If this number differs from the actual number of speakers, it might introduce false positives for some parts of the transcription.
For the best experience, the Sample Rate of the data should be greater than or equal to