Real Time API
The Real-time API uses the WebSocket protocol to enable interactive two-way communication between the agent or customer and Symbl.ai servers. By leveraging WebSockets, there's no need to poll the server for updates: events are streamed directly to the client as the conversation is processed in real time.
Authentication
Start by generating an access token using your appId and appSecret from the Symbl.ai platform.
curl --location 'https://api.symbl.ai/oauth2/token:generate' \
--header 'Content-Type: application/json' \
--data '{
  "type": "application",
  "appId": "your appId",
  "appSecret": "your appSecret"
}'
This request will return an access token, which is required for all subsequent API calls.
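If you prefer to generate the token programmatically, here is a minimal Node.js sketch of the same request. It assumes Node 18+ (for the built-in fetch), credentials stored in the SYMBL_APP_ID and SYMBL_APP_SECRET environment variables, and that the response body carries the token in an accessToken field:
// Request an access token using the appId/appSecret credentials.
async function generateAccessToken() {
  const response = await fetch('https://api.symbl.ai/oauth2/token:generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      type: 'application',
      appId: process.env.SYMBL_APP_ID,
      appSecret: process.env.SYMBL_APP_SECRET,
    }),
  });
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  const { accessToken } = await response.json(); // assumes an accessToken field in the response body
  return accessToken;
}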
Connection
Real-time Assist API URL
Use the following URL when you open a WebSocket connection to the Real-time Assist API:
wss://api.symbl.ai/v1/realtime/assist/{REALTIME_SESSION_ID}?access_token={ACCESS_TOKEN}
Where:
- REALTIME_SESSION_ID is a unique identifier for the session, which you should generate (e.g., using UUID).
- ACCESS_TOKEN is the access token that you generated with your app ID and secret.
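For example, a short sketch of building the connection URL in Node.js, assuming the uuid package and an accessToken obtained in the authentication step:
const { v4: uuid } = require('uuid');

// Generate a unique session ID and build the Real-time Assist URL.
const realtimeSessionId = uuid();
const url = `wss://api.symbl.ai/v1/realtime/assist/${realtimeSessionId}?access_token=${accessToken}`;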
Message Reference
The WebSocket protocol supports sending text messages in JSON format and binary messages for communicating various data. The Real-time Assist API utilizes text messages to deliver conversation intelligence and transcription, and binary messages to receive and process audio.
Start Request
Use the start_request message to initiate a real-time assist session. The message includes configuration information for the conversation and speaker.
{
  "type": "start_request",
  "id": string,
  "RTAId": string,
  "config": {
    "speechRecognition": {
      "sampleRateHertz": 44100
    }
  },
  "speaker": {
    "email": string,
    "userId": string,
    "name": string,
    "role": string
  },
  "assistants": ["objection-handling"]
}
Field descriptions
- type: Required string. Must be "start_request".
- id: Required string. The unique session ID you generated.
- RTAId: Required string. The ID of the Real-time Assist configuration you want to use.
- config: Optional object. Contains configuration settings.
  - speechRecognition: Optional object. Contains audio configuration.
    - sampleRateHertz: Optional number. The sample rate of the audio stream.
- speaker: Required object. Contains information about the speaker.
  - email: Required string. The speaker's email address.
  - userId: Required string. A unique identifier for the speaker.
  - name: Required string. The speaker's name.
  - role: Required string. The speaker's role (e.g., "agent" or "customer").
- assistants: Required array. Contains the types of assistants to enable (e.g., ["objection-handling"]).
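For reference, a filled-in start_request might look like the following; the id, RTAId, and speaker values are placeholders:
{
  "type": "start_request",
  "id": "c7b2f8e0-4a1d-4b9a-9e6f-2d3c5a7b9e1f",
  "RTAId": "YOUR_RTA_ID",
  "config": {
    "speechRecognition": {
      "sampleRateHertz": 44100
    }
  },
  "speaker": {
    "email": "agent@example.com",
    "userId": "agent@example.com",
    "name": "Example Agent",
    "role": "agent"
  },
  "assistants": ["objection-handling"]
}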
Responses
The Real-time Assist API sends various responses during the session. Each response includes a type field and additional data based on the type. Here are some of the response types:
- message: Contains various subtypes of messages.
- transcript_response: Contains the transcript of the conversation.
- objection_response: Contains information about detected objections.
Message Response
{
  "type": "message",
  "message": {
    "type": string,
    "isFinal": boolean,
    "payload": {
      "raw": {
        "alternatives": [
          {
            "words": [
              {
                "word": string,
                "startTime": {
                  "seconds": string,
                  "nanos": string
                },
                "endTime": {
                  "seconds": string,
                  "nanos": string
                }
              }
            ],
            "transcript": string,
            "confidence": number
          }
        ]
      }
    },
    "punctuated": {
      "transcript": string
    },
    "user": {
      "userId": string,
      "name": string,
      "email": string,
      "role": string
    }
  },
  "timeOffset": number
}
Field descriptions
- type: Always "message" for this type of response.
- message: An object containing the details of the message.
  - type: The specific type of message. Common values include:
    - recognition_started: Indicates that the system has started recognizing speech.
    - recognition_result: Contains interim or final results of speech recognition.
    - recognition_stopped: Indicates that speech recognition has stopped.
  - isFinal: A boolean indicating whether this is a final result (true) or an interim result (false).
  - payload: Contains the raw recognition data.
    - raw: The unprocessed recognition data.
      - alternatives: An array of possible interpretations of the speech.
        - words: An array of recognized words with their timing information.
        - transcript: The full text of this recognition alternative.
        - confidence: A number between 0 and 1 indicating the confidence in this alternative.
  - punctuated: Contains a punctuated version of the transcript.
  - user: Information about the user who is speaking.
    - userId: A unique identifier for the user.
    - name: The name of the user.
    - email: The email address of the user.
    - role: The role of the user (e.g., "agent" or "customer").
- timeOffset: The time offset of this message in milliseconds from the start of the session.
This structure allows for real-time updates on the recognized speech, including interim results that can be refined as more context becomes available. The isFinal flag is particularly useful for determining when a segment of speech has been fully processed and can be considered complete.
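For instance, here is a minimal handler sketch that uses isFinal to separate interim updates from finalized segments, assuming connection is an established WebSocket connection like the one in the sample request below:
// Distinguish interim from final recognition results.
connection.on('message', function(message) {
  if (message.type !== 'utf8') return;
  const msg = JSON.parse(message.utf8Data);
  if (msg.type === 'message' && msg.message && msg.message.type === 'recognition_result') {
    const transcript = msg.message.payload.raw.alternatives[0].transcript;
    console.log(msg.message.isFinal ? `[final] ${transcript}` : `[interim] ${transcript}`);
  }
});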
Transcript Response
{
  "type": "transcript_response",
  "payload": {
    "transcript": string,
    "user": {
      "email": string,
      "userId": string,
      "name": string,
      "role": string
    }
  }
}
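As with the other response types, you can key off the top-level type field. A minimal sketch that collects transcript lines labeled by speaker, again assuming connection is an established WebSocket connection:
// Collect transcript lines, labeled by speaker, as they arrive.
const transcriptLog = [];
connection.on('message', function(message) {
  if (message.type !== 'utf8') return;
  const msg = JSON.parse(message.utf8Data);
  if (msg.type === 'transcript_response') {
    transcriptLog.push(`${msg.payload.user.name} (${msg.payload.user.role}): ${msg.payload.transcript}`);
  }
});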
Objection Response
{
  "type": "objection_response",
  "payload": {
    "name": string,
    "message": string
  }
}
Field descriptions
- type: Always "objection_response" for this type of message.
- payload: An object containing details about the detected objection.
  - name: A string identifying the type or category of the objection detected, such as "price_objection", "value_objection", or "timing_objection".
  - message: A string containing the exact words spoken by the customer that triggered the objection detection. This helps in understanding the context of the objection.
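A minimal sketch of surfacing these responses to an agent, assuming connection is an established WebSocket connection:
// Alert the agent when the assistant detects an objection.
connection.on('message', function(message) {
  if (message.type !== 'utf8') return;
  const msg = JSON.parse(message.utf8Data);
  if (msg.type === 'objection_response') {
    console.log(`Objection detected (${msg.payload.name}): "${msg.payload.message}"`);
  }
});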
Sending Audio
After receiving a recognition_started message, you can start sending audio data as binary messages. Each chunk of audio should be no larger than 8,192 bytes.
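If your audio source produces buffers larger than this limit, split them before sending. A minimal sketch, assuming connection is an established WebSocket connection and buffer is a Node.js Buffer of raw audio:
const MAX_CHUNK_BYTES = 8192;

// Send an audio buffer in chunks of at most 8,192 bytes each.
function sendAudio(connection, buffer) {
  for (let offset = 0; offset < buffer.length; offset += MAX_CHUNK_BYTES) {
    connection.send(buffer.subarray(offset, offset + MAX_CHUNK_BYTES));
  }
}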
Sample request
Here's a basic example of how to use the Real-time Assist API with Node.js:
const WebSocketClient = require('websocket').client;
const uuid = require('uuid').v4;
const mic = require('mic');

// Set up microphone
const micInstance = mic({
  rate: '44100',
  channels: '1',
  debug: false,
  exitOnSilence: 30,
});
const micInputStream = micInstance.getAudioStream();

// Set up user and session
const YOUR_ACCESS_TOKEN = process.env.SYMBL_ACCESS_TOKEN; // token from the authentication step
const user = {
  email: 'user@example.com',
  userId: 'user@example.com',
  name: 'Example User',
  role: 'agent'
};
const realtimeSessionId = uuid();

// Connect to WebSocket; the access token is passed as a query parameter
const client = new WebSocketClient();
client.connect(
  `wss://api.symbl.ai/v1/realtime/assist/${realtimeSessionId}?access_token=${YOUR_ACCESS_TOKEN}`
);

client.on('connect', function(connection) {
  console.log('WebSocket Connected');

  // Start microphone
  micInstance.start();

  // Send start request
  connection.send(JSON.stringify({
    type: 'start_request',
    id: realtimeSessionId,
    RTAId: 'YOUR_RTA_ID',
    config: {
      speechRecognition: {
        sampleRateHertz: 44100,
      },
    },
    speaker: user,
    assistants: ['objection-handling'],
  }));

  // Handle messages
  connection.on('message', function(message) {
    if (message.type === 'utf8') {
      const msg = JSON.parse(message.utf8Data);
      console.log(msg);
      if (msg.message && msg.message.type === 'recognition_started') {
        // Start sending audio data
        micInputStream.on('data', (data) => {
          connection.send(data);
        });
      }
    }
  });
});
Requests
In addition to the start_request, you can also send a stop_request to end the Real-time Assist session.
Stop Request
Use the stop_request message to stop audio recognition and end a real-time conversation.
{
"type": "stop_request"
}
Field description
- type: Required string. The value must be "stop_request".
When you send a stop_request, the system will stop processing audio input and finalize any pending analyses. After sending this request, you should expect to receive a final set of responses, which may include:
- A recognition_stopped message indicating that speech recognition has ended.
- Any final transcript_response messages for speech that was being processed.
- Any final objection_response messages for objections that were detected but not yet reported.
- A conversation_completed message indicating that the entire conversation has been processed and the session is closed.
After receiving the conversation_completed message, you should close the WebSocket connection.
Sample request
// Assuming 'connection' is your WebSocket connection object
connection.send(JSON.stringify({
  type: 'stop_request'
}));

// Handle the responses
connection.on('message', function(message) {
  if (message.type === 'utf8') {
    const msg = JSON.parse(message.utf8Data);
    if (msg.type === 'message' && msg.message.type === 'conversation_completed') {
      console.log('Conversation completed, closing connection');
      connection.close();
    }
  }
});
Remember to implement proper error handling and consider scenarios where the connection might be lost before you receive the conversation_completed message.
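For example, a sketch of the kind of defensive handlers you might register; the reconnection strategy itself is application-specific:
// Log failures to establish the connection.
client.on('connectFailed', function(error) {
  console.error('Connection failed:', error.toString());
});

// Inside the client.on('connect') handler:
connection.on('error', function(error) {
  console.error('Connection error:', error.toString());
});

connection.on('close', function(reasonCode, description) {
  // If this fires before conversation_completed arrives, the session ended
  // unexpectedly; decide whether to reconnect or surface the error.
  console.log(`Connection closed (${reasonCode}): ${description}`);
});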