Live speech-to-text and AI insights in browser

This guide describes how to get started with the Symbl.ai native Streaming API, our most accurate API for conversation analysis. The Streaming API enables real-time conversational analysis for voice, video, chat, or any live stream, directly from your web browser. If your application already carries voice, video, or chat, the Streaming API lets you tap the raw conversational data in those streams.

You can view the complete code sample for this tutorial on GitHub.

Getting started

Create the endpoint for the WebSocket to connect to, and request access to the microphone. The WebSocket endpoint has two parts:

  1. A unique connection ID. This example uses a universally unique identifier (UUID) to automatically generate a unique connection ID. This ensures that your connectionId does not conflict with any other client connecting to the same namespace.
  2. A GET parameter named access_token. This is the access token generated during the Authentication process (see the token sketch below).
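
If you don't have a token yet, the sketch below shows one way to request it, assuming you have an App ID and App Secret from the Symbl.ai platform; see the Authentication docs for the authoritative flow.

// Sketch: exchange your App ID and App Secret for an access token.
// YOUR_APP_ID and YOUR_APP_SECRET are placeholders from your Symbl.ai account.
const response = await fetch('https://api.symbl.ai/oauth2/token:generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    type: 'application',
    appId: 'YOUR_APP_ID',
    appSecret: 'YOUR_APP_SECRET',
  }),
});
const { accessToken } = await response.json();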

With the token in hand, construct the endpoint:

// Replace with the token you generated; see the token sketch above or
// https://docs.symbl.ai/docs/authenticate
const accessToken = '<your-access-token>';
// In Node.js you can require the uuid package; in the browser, the built-in
// crypto.randomUUID() works instead.
const uuid = require('uuid').v4;
const connectionId = uuid();
const symblEndpoint = `wss://api.symbl.ai/v1/streaming/${connectionId}?access_token=${accessToken}`;

// Create an audio context up front so its sample rate is available for the
// start_request config and for audio processing later.
const context = new AudioContext();

📘

Backward Compatibility

The previous endpoint wss://api.symbl.ai/v1/realtime/insights/ is now updated to wss://api.symbl.ai/v1/streaming/ to standardize our API nomenclature. This change is backward compatible. However, we recommend using the new endpoint.

Create the WebSocket

Now that you have constructed the endpoint, let's create a new WebSocket!

📘

You can use JavaScript's built-in WebSocket API. For more information, see: https://developer.mozilla.org/en-US/docs/Web/API/WebSocket

const ws = new WebSocket(symblEndpoint);

Before you connect the WebSocket to the endpoint, attach its event listeners so you don't miss any messages.

Set WebSocket listeners

// Fired when a message is received from the WebSocket server
ws.onmessage = (event) => {
  // The conversationId arrives in a 'message' event, as data.message.data.conversationId
  const data = JSON.parse(event.data);
  if (data.type === 'message' && data.message.hasOwnProperty('data')) {
    console.log('conversationId', data.message.data.conversationId);
  }
  if (data.type === 'message_response') {
    for (let message of data.messages) {
      console.log('Transcript (more accurate): ', message.payload.content);
    }
  }
  if (data.type === 'topic_response') {
    for (let topic of data.topics) {
      console.log('Topic detected: ', topic.phrases)
    }
  }
  if (data.type === 'insight_response') {
    for (let insight of data.insights) {
      console.log('Insight detected: ', insight.payload.content);
    }
  }
  if (data.type === 'message' && data.message.hasOwnProperty('punctuated')) {
    console.log('Live transcript (less accurate): ', data.message.punctuated.transcript)
  }
  console.log(`Response type: ${data.type}. Object: `, data);
};

// Fired when the WebSocket closes unexpectedly due to an error or lost connection
ws.onerror = (err) => {
  console.error(err);
};

// Fired when the WebSocket connection has been closed
ws.onclose = (event) => {
  console.info('Connection to websocket closed');
};
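
These handlers only log. In a real application you might also reconnect when the socket drops unexpectedly. A minimal sketch using the standard WebSocket API is below; the backoff values are arbitrary, and createSocket is a hypothetical factory of yours that rebuilds the WebSocket and reattaches the listeners shown above.

// Sketch: reopen the socket with capped exponential backoff after an
// unclean close. createSocket is a hypothetical helper that creates the
// WebSocket and reattaches the listeners shown above.
let retries = 0;
const reconnect = () => {
  const delay = Math.min(30000, 1000 * 2 ** retries); // cap the wait at 30s
  retries += 1;
  setTimeout(() => {
    const next = createSocket();
    next.onopen = () => { retries = 0; };
  }, delay);
};
ws.onclose = (event) => {
  console.info('Connection to websocket closed');
  if (!event.wasClean) reconnect();
};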

Start the WebSocket connection

Once the connection has been opened, send this message to the WebSocket to start the Streaming API session.

// Fired when the connection succeeds.
ws.onopen = (event) => {
  ws.send(JSON.stringify({
    type: 'start_request',
    meetingTitle: 'Websockets How-to', // Conversation name
    insightTypes: ['question', 'action_item'], // Will enable insight generation
    config: {
      confidenceThreshold: 0.5,
      languageCode: 'en-US',
      speechRecognition: {
        encoding: 'LINEAR16',
        sampleRateHertz: context.sampleRate, // Get sample rate from browser's audio context
      }
    },
    speaker: {
      userId: 'user@example.com', // placeholder email
      name: 'Tony Stark',
    }
  }));
};

📘

Check out our guide on the Best Practices for Audio Integrations with Symbl to learn more about our audio encoding options.
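
For example, if your audio source produces Opus rather than raw PCM, the config block of the start_request might change along these lines. This is a sketch, so confirm the supported encodings and sample rates in the linked guide.

// Sketch: a start_request config for Opus-encoded audio instead of LINEAR16.
// Verify supported encodings and rates against the audio integration guide.
const opusConfig = {
  confidenceThreshold: 0.5,
  languageCode: 'en-US',
  speechRecognition: {
    encoding: 'OPUS',       // instead of LINEAR16
    sampleRateHertz: 48000, // Opus streams commonly run at 48 kHz
  },
};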

Create the Audio Stream

Once connected to the Streaming API, the next step is to create an audio stream.
You can do this through the Navigator API by accessing mediaDevices and calling
getUserMedia, which prompts the user to grant the browser access to the computer's microphone.

const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });

Since you are processing audio data in this tutorial, you don’t need to request video device access.
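
getUserMedia also accepts finer-grained constraints, which browsers treat as hints. Here is a sketch that asks for echo cancellation and noise suppression, with basic error handling:

// Sketch: request the microphone with optional processing constraints.
// Unsupported constraints are ignored by the browser.
let micStream;
try {
  micStream = await navigator.mediaDevices.getUserMedia({
    audio: { echoCancellation: true, noiseSuppression: true },
    video: false,
  });
} catch (err) {
  // NotAllowedError means the user denied the permission prompt.
  console.error('Microphone access failed:', err.name);
}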

Handle the audio stream

With microphone access granted, you can now route the data stream through the
WebSocket so transcripts and insights are analyzed in real time. Use the
AudioContext created earlier, together with the microphone stream you retrieved
from the Promise resolution above, to create a source and a processor.

/**
 * The callback function which fires after a user gives the browser permission to use
 * the computer's microphone. Starts a recording session which sends the audio stream to
 * the WebSocket endpoint for processing.
 */
const handleSuccess = (stream) => {
  const source = context.createMediaStreamSource(stream);
  const processor = context.createScriptProcessor(1024, 1, 1);
  const gainNode = context.createGain();
  source.connect(gainNode);
  gainNode.connect(processor);
  processor.connect(context.destination);
  processor.onaudioprocess = (e) => {
    // Convert the Float32 samples to a 16-bit PCM payload.
    const inputData = e.inputBuffer.getChannelData(0);
    const targetBuffer = new Int16Array(inputData.length);
    for (let index = 0; index < inputData.length; index++) {
      // Clamp to [-1, 1] before scaling to the 16-bit range.
      targetBuffer[index] = 32767 * Math.min(1, Math.max(-1, inputData[index]));
    }
    // Send audio stream to websocket.
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(targetBuffer.buffer);
    }
  };
};


handleSuccess(stream);
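
Note that createScriptProcessor is deprecated in modern browsers. It still works at the time of writing, but the recommended replacement is an AudioWorklet. Below is a minimal sketch of the same 16-bit conversion using a worklet loaded from a Blob URL; the processor name pcm-forwarder is arbitrary.

// Sketch: the same capture pipeline with an AudioWorklet instead of the
// deprecated ScriptProcessorNode. The worklet posts Float32 frames to the
// main thread, where they are converted to 16-bit PCM and sent.
const workletSource = `
  class PcmForwarder extends AudioWorkletProcessor {
    process(inputs) {
      const channel = inputs[0][0];
      if (channel) this.port.postMessage(channel.slice(0)); // copy the frame
      return true; // keep the processor alive
    }
  }
  registerProcessor('pcm-forwarder', PcmForwarder);
`;
const workletUrl = URL.createObjectURL(new Blob([workletSource], { type: 'application/javascript' }));
await context.audioWorklet.addModule(workletUrl);

const workletNode = new AudioWorkletNode(context, 'pcm-forwarder');
workletNode.port.onmessage = ({ data: inputData }) => {
  const targetBuffer = new Int16Array(inputData.length);
  for (let i = 0; i < inputData.length; i++) {
    targetBuffer[i] = 32767 * Math.min(1, Math.max(-1, inputData[i]));
  }
  if (ws.readyState === WebSocket.OPEN) ws.send(targetBuffer.buffer);
};
context.createMediaStreamSource(stream).connect(workletNode);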

Stopping the WebSocket Connection

To stop the WebSocket connection once you're done, run this code in your web browser:

// Stops the WebSocket connection.
ws.send(JSON.stringify({
  "type": "stop_request"
}));

Full Code Sample

Here's the complete code sample, which you can also view on GitHub:

// Replace with the token you generated; see https://docs.symbl.ai/docs/authenticate
const accessToken = '<your-access-token>';
// In Node.js you can require the uuid package; in the browser, the built-in
// crypto.randomUUID() works instead.
const uuid = require('uuid').v4;
const connectionId = uuid();
const symblEndpoint = `wss://api.symbl.ai/v1/streaming/${connectionId}?access_token=${accessToken}`;
const ws = new WebSocket(symblEndpoint);

// Create the audio context up front so its sample rate is available for the
// start_request config below.
const context = new AudioContext();


// Fired when a message is received from the WebSocket server
ws.onmessage = (event) => {
  // The conversationId arrives in a 'message' event, as data.message.data.conversationId
  const data = JSON.parse(event.data);
  if (data.type === 'message' && data.message.hasOwnProperty('data')) {
    console.log('conversationId', data.message.data.conversationId);
  }
  if (data.type === 'message_response') {
    for (let message of data.messages) {
      console.log('Transcript (more accurate): ', message.payload.content);
    }
  }
  if (data.type === 'topic_response') {
    for (let topic of data.topics) {
      console.log('Topic detected: ', topic.phrases)
    }
  }
  if (data.type === 'insight_response') {
    for (let insight of data.insights) {
      console.log('Insight detected: ', insight.payload.content);
    }
  }
  if (data.type === 'message' && data.message.hasOwnProperty('punctuated')) {
    console.log('Live transcript (less accurate): ', data.message.punctuated.transcript)
  }
  console.log(`Response type: ${data.type}. Object: `, data);
};

// Fired when the WebSocket closes unexpectedly due to an error or lost connection
ws.onerror = (err) => {
  console.error(err);
};

// Fired when the WebSocket connection has been closed
ws.onclose = (event) => {
  console.info('Connection to websocket closed');
};

// Fired when the connection succeeds.
ws.onopen = (event) => {
  ws.send(JSON.stringify({
    type: 'start_request',
    meetingTitle: 'Websockets How-to', // Conversation name
    insightTypes: ['question', 'action_item'], // Will enable insight generation
    config: {
      confidenceThreshold: 0.5,
      languageCode: 'en-US',
      speechRecognition: {
        encoding: 'LINEAR16',
        sampleRateHertz: context.sampleRate, // Get sample rate from browser's audio context
      }
    },
    speaker: {
    userId: 'user@example.com', // placeholder email
      name: 'Tony Stark',
    }
  }));
};

const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });

/**
 * The callback function which fires after a user gives the browser permission to use
 * the computer's microphone. Starts a recording session which sends the audio stream to
 * the WebSocket endpoint for processing.
 */
const handleSuccess = (stream) => {
  const source = context.createMediaStreamSource(stream);
  const processor = context.createScriptProcessor(1024, 1, 1);
  const gainNode = context.createGain();
  source.connect(gainNode);
  gainNode.connect(processor);
  processor.connect(context.destination);
  processor.onaudioprocess = (e) => {
    // Convert the Float32 samples to a 16-bit PCM payload.
    const inputData = e.inputBuffer.getChannelData(0);
    const targetBuffer = new Int16Array(inputData.length);
    for (let index = 0; index < inputData.length; index++) {
      // Clamp to [-1, 1] before scaling to the 16-bit range.
      targetBuffer[index] = 32767 * Math.min(1, Math.max(-1, inputData[index]));
    }
    // Send audio stream to websocket.
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(targetBuffer.buffer);
    }
  };
};

handleSuccess(stream);

Test

To verify that the code works, open your browser's developer console and paste the code directly into it. (If you do, swap require('uuid').v4 for the browser's built-in crypto.randomUUID, as noted in the comments above.) You'll see the popup asking for microphone permissions. If you accept, the application starts recording; start speaking and you'll see results logged to the console.

Grabbing the Conversation ID

The Conversation ID is very useful for our other APIs, such as the Conversations API. We don't use it in this example because the Conversations API is mainly for asynchronous (non-real-time) data gathering, but it's worth knowing how to capture it, since you can use the Conversation ID later to extract conversation insights.

If you look closely at the onmessage handler you can see how to get the Conversation ID:

// Fired when a message is received from the WebSocket server
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  // The conversationId arrives in a 'message' event that carries a data payload.
  if (data.type === 'message' && data.message.hasOwnProperty('data')) {
    console.log('conversationId', data.message.data.conversationId);
  }
  // ... the other response types are handled as shown above ...
};
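
For example, once the session has ended, you could pass the saved Conversation ID to the Conversations API over plain HTTPS. Here is a sketch using the /messages endpoint; confirm the exact response shape in the Conversations API reference.

// Sketch: fetch the finished transcript from the Conversations API.
// conversationId is the value logged by the onmessage handler above.
const res = await fetch(`https://api.symbl.ai/v1/conversations/${conversationId}/messages`, {
  headers: { 'Authorization': `Bearer ${accessToken}` },
});
const { messages } = await res.json();
for (const message of messages) {
  console.log(message.text);
}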

With the Conversation ID you can do each of the following (and more!):

View conversation topics

Summary topics provide a quick overview of the key things that were talked about in the conversation.

View action items

An action item is a specific outcome recognized in the conversation that requires one or more people in the conversation to take a specific action, e.g. set up a meeting, share a file, complete a task, etc.

View follow-ups

This is a category of action items with a connotation of following up on a request or task, such as sending an email, making a phone call, booking an appointment, or setting up a meeting.
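
Each of these maps to a Conversations API endpoint. Below is a sketch that pulls all three for a finished conversation; the endpoint paths follow the Conversations API docs, so confirm the response fields there.

// Sketch: retrieve topics, action items, and follow-ups for a conversation.
const base = `https://api.symbl.ai/v1/conversations/${conversationId}`;
const headers = { 'Authorization': `Bearer ${accessToken}` };

const [topics, actionItems, followUps] = await Promise.all([
  fetch(`${base}/topics`, { headers }).then((r) => r.json()),
  fetch(`${base}/action-items`, { headers }).then((r) => r.json()),
  fetch(`${base}/follow-ups`, { headers }).then((r) => r.json()),
]);
console.log(topics, actionItems, followUps);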