Append to an existing conversation with speaker separation
Because conversations don’t always end on schedule and may resume later, our Async API enables you to append a new file to an existing conversation. You can read more about this capability in Process Audio > Append an audio file and Process Video > Append a video file.
To enable Speaker Separation when appending, use the same request structure as in Process Audio > Append an audio file and Process Video > Append a video file, and pass both enableSpeakerDiarization=true and diarizationSpeakerCount=<NUMBER_OF_UNIQUE_SPEAKERS> as request parameters.
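As a rough illustration, the sketch below assembles the URL for an append request with Speaker Separation enabled. The endpoint path (/v1/process/audio/{conversationId}) and the helper name build_append_url are assumptions made for this example; confirm the path against Process Audio > Append an audio file before relying on it. Only the two query parameters come from the text above.

```python
from urllib.parse import urlencode

# Assumed URL for the append-audio endpoint; verify it against the API reference.
APPEND_AUDIO_URL = "https://api.symbl.ai/v1/process/audio/{conversation_id}"


def build_append_url(conversation_id: str, speaker_count: int) -> str:
    """Return the append URL with Speaker Separation enabled (hypothetical helper)."""
    if speaker_count < 1:
        raise ValueError("speaker_count must be at least 1")
    params = {
        "enableSpeakerDiarization": "true",
        "diarizationSpeakerCount": speaker_count,
    }
    base = APPEND_AUDIO_URL.format(conversation_id=conversation_id)
    return base + "?" + urlencode(params)


print(build_append_url("your_conversation_id", 2))
```

Building the query string with urlencode keeps both parameters together, so it is harder to send one without the other.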
However, there is one caveat about how Automatic Speech Recognition works with Speaker Separation and appended files. Consider the following example.
Example scenario
You send a recorded conversation with 2 speakers, John and Alice, to the Async API with enableSpeakerDiarization=true and diarizationSpeakerCount=2. The diarization identifies them as Speaker 1 and Speaker 2 respectively. You then update the speakers with their email values, [email protected] and [email protected].
You then append another recording to the conversation, this time with 2 speakers, John and May, again using enableSpeakerDiarization=true and diarizationSpeakerCount=2. The diarization identifies them as Speaker 1 and Speaker 2 respectively. As previously mentioned, these numbers are arbitrary and have nothing to do with the order in which the speakers spoke in the conversation.
After this job is complete, you have 4 members in this conversation:
- John
- Alice
- Speaker 1 (who is John again)
- Speaker 2 (who is May)
Since John and Speaker 1 refer to the same speaker but are labeled as different speakers, their member references are different in all the messages and insights that they are a part of.
Merging speakers
This is where the email identifier comes in. You can use the Update members operation to merge a member into an existing member that has the same email. This replaces any duplicate references with a single reference across the entire conversation, applying to all the references in the members, messages and insights.
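To make the effect of a merge concrete, here is a small local simulation. It is not the API's implementation: the data shapes (a flat member_id field on each message), the placeholder ids for Alice and Speaker 2, the example.com addresses, and the helper merge_member are all hypothetical and exist only for illustration.

```python
JOHN_ID = "2f69f1c8-bf0a-48ef-b47f-95ae5a4de325"       # John, from the first file
SPEAKER_1_ID = "74001a1d-4e9e-456a-84ed-81bbd363333a"  # Speaker 1, from the appended file


def merge_member(members, messages, duplicate_id, email):
    """Fold `duplicate_id` into the existing member that already has `email`."""
    target = next(
        (m for m in members if m.get("email") == email and m["id"] != duplicate_id),
        None,
    )
    if target is None:
        raise ValueError(f"no existing member with email {email!r}")
    # Remove the duplicate member record.
    remaining = [m for m in members if m["id"] != duplicate_id]
    # Repoint every message that referenced the duplicate.
    repointed = [
        {**msg, "member_id": target["id"]} if msg["member_id"] == duplicate_id else msg
        for msg in messages
    ]
    return remaining, repointed


# Simplified, made-up data mirroring the example scenario.
members = [
    {"id": JOHN_ID, "name": "John", "email": "john@example.com"},
    {"id": "alice-id", "name": "Alice", "email": "alice@example.com"},
    {"id": SPEAKER_1_ID, "name": "Speaker 1"},
    {"id": "speaker-2-id", "name": "Speaker 2"},
]
messages = [
    {"text": "Hi Alice.", "member_id": JOHN_ID},
    {"text": "Hi May.", "member_id": SPEAKER_1_ID},
]

members, messages = merge_member(members, messages, SPEAKER_1_ID, "john@example.com")
print([m["name"] for m in members])
print(all(msg["member_id"] == JOHN_ID for msg in messages))
```

After the merge, three members remain and both messages point at John's id, which mirrors the single-reference outcome described above.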
If you send the following Update members request, where 74001a1d-4e9e-456a-84ed-81bbd363333a is the id of Speaker 1 from the previous scenario, the extra member is eliminated and all of its references are updated to the member represented by 2f69f1c8-bf0a-48ef-b47f-95ae5a4de325, whom we know as John Doe.
$ curl --location --request PUT "https://api.symbl.ai/v1/conversations/$CONVERSATION_ID/members/74001a1d-4e9e-456a-84ed-81bbd363333a" \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $AUTH_TOKEN" \
  --data-raw '{
    "id": "74001a1d-4e9e-456a-84ed-81bbd363333a",
    "email": "[email protected]",
    "name": "John Doe"
  }'
const authToken = 'your_access_token'; // Set your access token here
const conversationId = 'your_conversation_id'; // Conversation ID returned by the Async API
const memberId = '74001a1d-4e9e-456a-84ed-81bbd363333a'; // ID of the duplicate member (Speaker 1)
const url = `https://api.symbl.ai/v1/conversations/${conversationId}/members/${memberId}`;

const payload = {
  id: memberId, // Should be a valid UUID e.g. f170371e-d9db-4d55-9d49-a111a89cf078
  email: '[email protected]', // Should be a valid emailId e.g. [email protected]
  name: 'John Doe', // Should be a valid string e.g. John
};

// Human-readable messages for the expected error status codes
const responses = {
  400: 'Bad Request! Please refer docs for correct input fields.',
  401: 'Unauthorized. Please generate a new access token.',
  404: 'The conversation and/or its metadata you asked for could not be found. Please check the input provided.',
  429: 'Maximum number of concurrent jobs reached. Please wait for some requests to complete.',
  500: 'Something went wrong! Please contact [email protected]',
};

const fetchData = {
  method: 'PUT',
  headers: {
    'Authorization': `Bearer ${authToken}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(payload),
};

fetch(url, fetchData)
  .then(response => {
    if (response.ok) {
      return response.json();
    }
    throw new Error(responses[response.status]);
  })
  .then(response => {
    console.log('response', response);
  })
  .catch(error => {
    console.error(error);
  });
import json
import requests

baseUrl = "https://api.symbl.ai/v1/conversations/{conversationId}/members/{memberId}"
conversationId = 'your_conversation_id'  # Conversation ID returned by the Async API
memberId = '74001a1d-4e9e-456a-84ed-81bbd363333a'  # ID of the duplicate member (Speaker 1)
url = baseUrl.format(conversationId=conversationId, memberId=memberId)

# set your access token here. See https://docs.symbl.ai/docs/developer-tools/authentication
access_token = 'your_access_token'

headers = {
    'Authorization': 'Bearer ' + access_token,
    'Content-Type': 'application/json'
}

payload = {
    'id': memberId,  # Should be a valid UUID e.g. f170371e-d9db-4d55-9d49-a111a89cf078
    'email': "[email protected]",  # Should be a valid emailId e.g. [email protected]
    'name': "John Doe"  # Should be a valid string e.g. John
}

# Human-readable messages for the expected error status codes
responses = {
    401: 'Unauthorized. Please generate a new access token.',
    404: 'The conversation and/or its metadata you asked for could not be found. Please check the input provided.',
    500: 'Something went wrong! Please contact [email protected]'
}

response = requests.put(url, headers=headers, data=json.dumps(payload))

if response.status_code == 200:
    # Successful API execution
    print(response.json()['message'])  # message containing status of response
elif response.status_code in responses:
    print(responses[response.status_code])  # Expected error occurred
else:
    print("Unexpected error occurred. Please contact [email protected]" + ", Debug Message => " + str(response.text))
This update works because the email uniquely identifies a single member.
Best practices
- For best results, make sure diarizationSpeakerCount is equal to the number of unique speakers present in the conversation. The diarization model uses this number when processing the conversation. If it differs from the actual number of speakers, it might introduce false positives in some parts of the transcription.
- For the best experience, the sample rate of the audio data should be greater than or equal to 16000 Hz.
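The sample-rate recommendation can be checked locally before a file is submitted. The following is a minimal sketch that uses Python's standard wave module, so it only handles uncompressed WAV input; the helper name meets_sample_rate is hypothetical, and other formats would need a different library.

```python
import wave

MIN_SAMPLE_RATE_HZ = 16000  # Threshold taken from the best practice above


def meets_sample_rate(wav_file) -> bool:
    """Return True if the WAV file's sample rate is at least 16000 Hz.

    `wav_file` may be a filesystem path or a binary file object.
    """
    with wave.open(wav_file, "rb") as wav:
        return wav.getframerate() >= MIN_SAMPLE_RATE_HZ
```

For example, meets_sample_rate("call.wav") returns False for an 8000 Hz telephone recording, which suggests resampling or re-recording before sending it for Speaker Separation.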