Append to an existing conversation with speaker separation

Because conversations don’t always end on schedule and may resume later, our Async API enables you to append a new file to an existing conversation. You can read more about this capability in Process Audio > Append an audio file and Process Video > Append a video file.

To enable Speaker Separation with the append capability, use the same request structure shown in the previous links for creating a new Conversation, and pass enableSpeakerDiarization=true and diarizationSpeakerCount=<NUMBER_OF_UNIQUE_SPEAKERS> as query parameters.
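As a minimal sketch of how these query parameters attach to an append request, the Python snippet below only builds the URL and sends nothing. The path /v1/process/audio/{conversation_id} and the helper name build_append_url are assumptions made for illustration; confirm the exact path on the Process Audio > Append an audio file page.

```python
from urllib.parse import urlencode

# Assumed path for appending audio; verify it against the Async API reference.
APPEND_AUDIO_URL = "https://api.symbl.ai/v1/process/audio/{conversation_id}"


def build_append_url(conversation_id: str, speaker_count: int) -> str:
    """Return an append URL with Speaker Separation enabled (hypothetical helper)."""
    if speaker_count < 1:
        raise ValueError("speaker_count must be at least 1")
    query = urlencode({
        "enableSpeakerDiarization": "true",
        "diarizationSpeakerCount": speaker_count,
    })
    return APPEND_AUDIO_URL.format(conversation_id=conversation_id) + "?" + query


if __name__ == "__main__":
    print(build_append_url("your_conversation_id", 2))
```

The same two parameters would apply to the video append request, with the path changed accordingly.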

However, there is one caveat about how Automatic Speech Recognition works with Speaker Separation and appended files. Consider the following example.

Example scenario

You send a recorded conversation with two speakers, John and Alice, to the Async API with enableSpeakerDiarization=true. Diarization labels them Speaker 1 and Speaker 2 respectively. You then update both members with their email values, [email protected] and [email protected].

You then append a second recording to this conversation, this time with two speakers, John and May, again using enableSpeakerDiarization=true. Diarization again labels them Speaker 1 and Speaker 2 respectively. As previously mentioned, these numbers are arbitrary and have nothing to do with the order in which the speakers spoke in the conversation.

After this job is complete, you have 4 members in this conversation:

  1. John

  2. Alice

  3. Speaker 1 (who is John again)

  4. Speaker 2 (who is May)

Because John and Speaker 1 are the same person but are stored as separate members, the messages and insights from the two recordings reference him under two different member IDs.

Merging speakers

This is where the email identifier comes in. You can use the Update members operation to merge a member into the existing member that has the same email. The merge replaces the duplicate references with a single reference across the entire conversation, including the members, messages, and insights.

For example, send the following Update members request, where 74001a1d-4e9e-456a-84ed-81bbd363333a is the id of Speaker 1 from the previous scenario. The request eliminates the extra member and updates all of its references to point to the member 2f69f1c8-bf0a-48ef-b47f-95ae5a4de325, which we know as John Doe.

$ curl --location --request PUT "https://api.symbl.ai/v1/conversations/$CONVERSATION_ID/members/74001a1d-4e9e-456a-84ed-81bbd363333a" \
       --header 'Content-Type: application/json' \
       --header "Authorization: Bearer $AUTH_TOKEN" \
       --data-raw '{
            "id": "74001a1d-4e9e-456a-84ed-81bbd363333a",
            "email": "[email protected]",
            "name": "John Doe"
        }'

const authToken = 'your_access_token';  // Set your access token here
const conversationId = 'your_conversation_id';  // ID of the conversation you appended to
const memberId = '74001a1d-4e9e-456a-84ed-81bbd363333a';  // ID of the duplicate member, fetched using the members API
const url = `https://api.symbl.ai/v1/conversations/${conversationId}/members/${memberId}`;

const payload = {
    'id': "74001a1d-4e9e-456a-84ed-81bbd363333a",  // Should be a valid UUID e.g. f170371e-d9db-4d55-9d49-a111a89cf078
    'email': "[email protected]",  // Should be a valid emailId e.g. [email protected]
    'name': "John Doe"  // Should be a valid string e.g. John
}

const responses = {
  400: 'Bad Request! Please refer docs for correct input fields.',
  401: 'Unauthorized. Please generate a new access token.',
  404: 'The conversation and/or its metadata you asked for could not be found, please check the input provided',
  429: 'Maximum number of concurrent jobs reached. Please wait for some requests to complete.',
  500: 'Something went wrong! Please contact [email protected]'
}

const fetchData = {
  method: "PUT",
  headers: {
    'Authorization': `Bearer ${authToken}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(payload),
}

fetch(url, fetchData).then(response => {
  if (response.ok) {
    return response.json();
  } else {
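    // Fall back to a generic message for status codes missing from the map above
    throw new Error(responses[response.status] || `Unexpected status code: ${response.status}`);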
  }
}).then(response => {
  console.log('response', response);
}).catch(error => {
  console.error(error);
});

import json
import requests

baseUrl = "https://api.symbl.ai/v1/conversations/{conversationId}/members/{memberId}"
conversationId = 'your_conversation_id'  # ID of the conversation you appended to
memberId = '74001a1d-4e9e-456a-84ed-81bbd363333a'  # ID of the duplicate member, fetched using the members API

url = baseUrl.format(conversationId=conversationId, memberId=memberId)

# set your access token here. See https://docs.symbl.ai/docs/developer-tools/authentication
access_token = 'your_access_token'

headers = {
    'Authorization': 'Bearer ' + access_token,
    'Content-Type': 'application/json'
}

payload = {
    'id': "74001a1d-4e9e-456a-84ed-81bbd363333a",  # Should be a valid UUID e.g. f170371e-d9db-4d55-9d49-a111a89cf078
    'email': "[email protected]",  # Should be a valid emailId e.g. [email protected]
    'name': "John Doe"  # Should be a valid string e.g. John
}

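responses = {
    400: 'Bad Request! Please refer docs for correct input fields.',
    401: 'Unauthorized. Please generate a new access token.',
    404: 'The conversation and/or its metadata you asked for could not be found, please check the input provided',
    429: 'Maximum number of concurrent jobs reached. Please wait for some requests to complete.',
    500: 'Something went wrong! Please contact [email protected]'
}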

response = requests.request("PUT", url, headers=headers, data=json.dumps(payload))

if response.status_code == 200:
    # Successful API execution
    print(response.json()['message'])  # message containing status of response
elif response.status_code in responses.keys():
    print(responses[response.status_code])  # Expected error occurred
else:
    print("Unexpected error occurred. Please contact [email protected]" + ", Debug Message => " + str(response.text))

exit()

The merge is possible because the email uniquely identifies a single member, so the duplicate can be folded into that member without ambiguity.
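To make the effect of the merge concrete, here is a minimal local sketch in Python. It is not how the service implements the operation: the merge_member helper, the dictionary layout (members, messages, memberId), the short IDs for Alice and May, and the example.com addresses are all assumptions chosen for illustration, and insights are left out for brevity.

```python
JOHN_ID = "2f69f1c8-bf0a-48ef-b47f-95ae5a4de325"
SPEAKER_1_ID = "74001a1d-4e9e-456a-84ed-81bbd363333a"
JOHN_EMAIL = "john@example.com"  # placeholder address, assumed for this sketch


def merge_member(conversation: dict, duplicate_id: str, email: str) -> dict:
    """Fold `duplicate_id` into the one member that already owns `email`.

    Local illustration only; the real merge happens on the server.
    """
    owners = [
        m for m in conversation["members"]
        if m.get("email") == email and m["id"] != duplicate_id
    ]
    if len(owners) != 1:
        raise ValueError("email must identify exactly one existing member")
    target_id = owners[0]["id"]

    # Drop the duplicate member and repoint its messages at the target member.
    members = [m for m in conversation["members"] if m["id"] != duplicate_id]
    messages = [
        {**msg, "memberId": target_id} if msg["memberId"] == duplicate_id else msg
        for msg in conversation["messages"]
    ]
    return {"members": members, "messages": messages}


conversation = {
    "members": [
        {"id": JOHN_ID, "name": "John", "email": JOHN_EMAIL},
        {"id": "alice-id", "name": "Alice", "email": "alice@example.com"},
        {"id": SPEAKER_1_ID, "name": "Speaker 1"},
        {"id": "speaker-2-id", "name": "Speaker 2"},
    ],
    "messages": [
        {"text": "Hi Alice.", "memberId": JOHN_ID},
        {"text": "Hi May.", "memberId": SPEAKER_1_ID},
    ],
}

merged = merge_member(conversation, SPEAKER_1_ID, JOHN_EMAIL)
print([m["name"] for m in merged["members"]])
print({msg["memberId"] for msg in merged["messages"]})
```

After the merge, three members remain and both messages reference John's member id. If the email matched no member, or more than one, the sketch refuses to merge, which mirrors why a unique email is needed.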

Best practices

  • For best results, set diarizationSpeakerCount to the number of unique speakers present in the conversation. The Diarization model uses this number when processing the conversation. If the value differs from the actual number of speakers, the model might introduce false positives in some parts of the transcription.

  • For the best experience, the sample rate of the audio should be greater than or equal to 16000 Hz.