Real-time API

The realtime API uses the WebSocket protocol to enable interactive two-way communication between the agent/customer and Symbl.ai servers. By leveraging WebSockets, there's no need to poll the server for updates — events are streamed directly to the client as the conversation is processed in real time.

Authentication

Start by generating an access token using your appId and appSecret from the Symbl.ai platform.

curl --location 'https://api.symbl.ai/oauth2/token:generate' \
--header 'Content-Type: application/json' \
--data '
{
    "type": "application",
    "appId": "your appId",
    "appSecret": "your appSecret"
}'

This request will return an access token, which is required for all subsequent API calls.
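If you prefer to generate the token programmatically, a minimal Node.js sketch (Node 18+ for the built-in fetch) might look like the following. It assumes the response body carries the token in an accessToken field; the appId and appSecret values are placeholders:

// Request an access token from the Symbl.ai OAuth2 endpoint.
async function getAccessToken(appId, appSecret) {
    const response = await fetch('https://api.symbl.ai/oauth2/token:generate', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ type: 'application', appId, appSecret }),
    });
    if (!response.ok) {
        throw new Error(`Token request failed with status ${response.status}`);
    }
    const { accessToken } = await response.json();
    return accessToken;
}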

Connection

Real-time Assist API URL

Use the following URL when you open a WebSocket connection to the Real-time Assist API:

wss://api.symbl.ai/v1/realtime/assist/{REALTIME_SESSION_ID}?access_token={ACCESS_TOKEN}

Where:

  • ACCESS_TOKEN is an access token that you generate with your app ID and secret.
  • REALTIME_SESSION_ID is a unique identifier for the session, which you should generate (e.g., using UUID), as shown in the example below.
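For example, in Node.js you can generate the session ID with the uuid package and build the connection URL like this (YOUR_ACCESS_TOKEN is a placeholder for the token generated above):

const uuid = require('uuid').v4;

// Generate a unique session ID and build the Real-time Assist WebSocket URL.
const realtimeSessionId = uuid();
const url = `wss://api.symbl.ai/v1/realtime/assist/${realtimeSessionId}?access_token=${YOUR_ACCESS_TOKEN}`;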

Message Reference

The WebSocket protocol supports text messages (JSON) and binary messages. The Real-time Assist API uses text messages to deliver conversation intelligence and transcription, and binary messages to receive audio for processing.

Start Request

Use the start_request message to initiate a real-time assist session. The message includes configuration information for the conversation and speaker; a populated example follows the field descriptions below.

{  
    "type": "start_request",  
    "id": string,  
    "RTAId": string,  
    "config": {  
        "speechRecognition": {  
            "sampleRateHertz": 44100  
        }  
    },  
    "speaker": {  
        "email": string,  
        "userId": string,  
        "name": string,  
        "role": string  
    },  
    "assistants": ["objection-handling"]  
}

Field descriptions

  • type: Required string. Must be "start_request".
  • id: Required string. The unique session ID you generated.
  • RTAId: Required string. The ID of the Real-time Assist configuration you want to use.
  • config: Optional object. Contains configuration settings.
    • speechRecognition: Optional object. Contains audio configuration.
      • sampleRateHertz: Optional number. The sample rate of the audio stream.
  • speaker: Required object. Contains information about the speaker.
    • email: Required string. The speaker's email address.
    • userId: Required string. A unique identifier for the speaker.
    • name: Required string. The speaker's name.
    • role: Required string. The speaker's role (e.g., "agent" or "customer").
  • assistants: Required array. Contains the types of assistants to enable (e.g., ["objection-handling"]).
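For example, a populated start_request might look like the following; the id, RTAId, and speaker values are illustrative placeholders:

{
    "type": "start_request",
    "id": "3b241101-e2bb-4255-8caf-4136c566a962",
    "RTAId": "YOUR_RTA_ID",
    "config": {
        "speechRecognition": {
            "sampleRateHertz": 44100
        }
    },
    "speaker": {
        "email": "agent@example.com",
        "userId": "agent@example.com",
        "name": "Example Agent",
        "role": "agent"
    },
    "assistants": ["objection-handling"]
}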

Responses

The Real-time Assist API sends various responses during the session. Each response includes a type field and additional data based on the type. Here are some of the response types:

  • message: Contains various subtypes of messages.
  • transcript_response: Contains the transcript of the conversation.
  • objection_response: Contains information about detected objections.

Message Response

{
    "type": "message",
    "message": {
        "type": string,
        "isFinal": boolean,
        "payload": {
            "raw": {
                "alternatives": [
                    {
                        "words": [
                            {
                                "word": string,
                                "startTime": {
                                    "seconds": string,
                                    "nanos": string
                                },
                                "endTime": {
                                    "seconds": string,
                                    "nanos": string
                                }
                            }
                        ],
                        "transcript": string,
                        "confidence": number
                    }
                ]
            }
        },
        "punctuated": {
            "transcript": string
        },
        "user": {
            "userId": string,
            "name": string,
            "email": string,
            "role": string
        }
    },
    "timeOffset": number
}

Field descriptions

  • type: Always "message" for this type of response.
  • message: An object containing the details of the message.
    • type: The specific type of message. Common values include:
      • recognition_started: Indicates that the system has started recognizing speech.
      • recognition_result: Contains interim or final results of speech recognition.
      • recognition_stopped: Indicates that speech recognition has stopped.
    • isFinal: A boolean indicating whether this is a final result (true) or an interim result (false).
    • payload: Contains the raw recognition data.
      • raw: The unprocessed recognition data.
        • alternatives: An array of possible interpretations of the speech.
          • words: An array of recognized words with their timing information.
          • transcript: The full text of this recognition alternative.
          • confidence: A number between 0 and 1 indicating the confidence in this alternative.
    • punctuated: Contains a punctuated version of the transcript.
    • user: Information about the user who is speaking.
      • userId: A unique identifier for the user.
      • name: The name of the user.
      • email: The email address of the user.
      • role: The role of the user (e.g., "agent" or "customer").
  • timeOffset: The time offset of this message in milliseconds from the start of the session.

This structure allows for real-time updates on the recognized speech, including interim results that can be refined as more context becomes available. The isFinal flag is particularly useful for determining when a segment of speech has been fully processed and can be considered complete.
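For example, a handler might update a live caption with interim results and commit a line to the log once a result is final. This is a minimal sketch; the field shapes are taken from the schema above, and the fallback to the first raw alternative is an assumption for results that arrive without a punctuated transcript:

// Handle a parsed "message" response containing recognition results.
function handleMessageResponse(msg) {
    const inner = msg.message;
    if (!inner || inner.type !== 'recognition_result') return;

    // Prefer the punctuated transcript; fall back to the first raw alternative.
    const text = (inner.punctuated && inner.punctuated.transcript)
        || (inner.payload && inner.payload.raw.alternatives[0].transcript)
        || '';
    const speaker = (inner.user && inner.user.name) || 'unknown';

    if (inner.isFinal) {
        console.log(`[final] ${speaker}: ${text}`);
    } else {
        console.log(`[interim] ${text}`);
    }
}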

Transcript Response

{
    "type": "transcript_response",
    "payload": {
        "transcript": string,
        "user": {
            "email": string,
            "userId": string,
            "name": string,
            "role": string
        }
    }
}

Objection Response

{
    "type": "objection_response",
    "payload": {
        "name": string,
        "message": string
    }
}
  • type: Always "objection_response" for this type of message.
  • payload: An object containing details about the detected objection.`
    • name: A string identifying the type or category of the objection detected. This could be something like "price_objection", "value_objection", "timing_objection", etc.
    • message: A string containing the exact words spoken by the customer that triggered the objection detection. This helps in understanding the context of the objection.
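For example, a detected pricing objection might arrive as the following (the payload values here are illustrative, not actual API output):

{
    "type": "objection_response",
    "payload": {
        "name": "price_objection",
        "message": "That sounds too expensive for our budget right now."
    }
}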

Sending Audio

After receiving a recognition_started message, you can start sending audio data as binary messages. Each chunk of audio should be no larger than 8,192 bytes.
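If your audio source produces buffers larger than this limit, split them before sending. A minimal sketch:

const MAX_CHUNK_BYTES = 8192;

// Split an audio buffer into chunks of at most 8,192 bytes and send each
// chunk as a binary WebSocket message.
function sendAudio(connection, buffer) {
    for (let offset = 0; offset < buffer.length; offset += MAX_CHUNK_BYTES) {
        connection.send(buffer.subarray(offset, offset + MAX_CHUNK_BYTES));
    }
}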

Sample request

Here's a basic example of how to use the Real-time Assist API with Node.js:

const WebSocketClient = require('websocket').client;  
const uuid = require('uuid').v4;  
const mic = require('mic');

// Set up microphone  
const micInstance = mic({  
    rate: '44100',  
    channels: '1',  
    debug: false,  
    exitOnSilence: 30,  
});  
const micInputStream = micInstance.getAudioStream();

// Set up user and session  
const user = {
    email: "user@example.com",
    userId: "user@example.com",
    name: "Example User",
    role: "agent"
};
const realtimeSessionId = uuid();

// Connect to the Real-time Assist WebSocket endpoint
const client = new WebSocketClient();
client.connect(
    `wss://api.symbl.ai/v1/realtime/assist/${realtimeSessionId}?access_token=${YOUR_ACCESS_TOKEN}`
);

client.on('connect', function(connection) {
    console.log('WebSocket Connected');

    // Start microphone
    micInstance.start();

    // Send start request
    connection.send(JSON.stringify({
        type: 'start_request',
        id: realtimeSessionId,
        RTAId: "YOUR_RTA_ID",
        config: {
            speechRecognition: {
                sampleRateHertz: 44100,
            },
        },
        speaker: user,
        assistants: ["objection-handling"],
    }));

    // Handle messages
    connection.on('message', function(message) {
        if (message.type === 'utf8') {
            const msg = JSON.parse(message.utf8Data);
            console.log(msg);

            // Begin streaming microphone audio once recognition has started
            if (msg.message && msg.message.type === 'recognition_started') {
                micInputStream.on('data', (data) => {
                    connection.send(data);
                });
            }
        }
    });
});

Requests

In addition to the start_request, you can also send a stop_request to end the Real-time Assist session.

Stop Request

Use the stop_request message to stop audio recognition and end a real-time conversation.

{  
    "type": "stop_request"  
}

Field description

  • type: Required string. The value must be "stop_request".

When you send a stop_request, the system will stop processing audio input and finalize any pending analyses. After sending this request, you should expect to receive a final set of responses, which may include:

  • A recognition_stopped message indicating that speech recognition has ended.
  • Any final transcript_response messages for speech that was being processed.
  • Any final objection_response messages for objections that were detected but not yet reported.
  • A conversation_completed message indicating that the entire conversation has been processed and the session is closed.

After receiving the conversation_completed message, you should close the WebSocket connection.

Sample request

// Assuming 'connection' is your WebSocket connection object  
connection.send(JSON.stringify({  
    type: "stop_request"  
}));

// Handle the responses  
connection.on('message', function(message) {  
    if (message.type === 'utf8') {  
        const msg = JSON.parse(message.utf8Data);  
        if (msg.type === 'message' && msg.message.type === 'conversation_completed') {  
            console.log('Conversation completed, closing connection');  
            connection.close();  
        }  
    }  
});

Remember to implement proper error handling and consider scenarios where the connection might be lost before you receive the conversation_completed message.
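For example, you can pair the stop_request with basic error and close handlers and a fallback timer, so the client still shuts down cleanly if conversation_completed never arrives. This sketch assumes the connection object from the samples above; the five-second timeout is an arbitrary choice:

// Fallback: close the connection if conversation_completed never arrives.
const closeTimer = setTimeout(() => {
    console.warn('No conversation_completed received; closing connection');
    connection.close();
}, 5000);

connection.on('error', (error) => {
    console.error('WebSocket error:', error);
});

connection.on('close', () => {
    clearTimeout(closeTimer);
    console.log('WebSocket connection closed');
});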