Live speech-to-text and AI insights in browser
This guide describes how to get started with the Symbl.ai native Streaming API, which is our most accurate API for conversation analysis. The Streaming API enables real-time conversational analysis for voice, video, chat, or any live stream directly in your web browser. If your application already has voice, video, or chat enabled, the Streaming API lets you tap into the raw conversational data of those streams.
You can view the complete code sample for this tutorial on GitHub.
Getting started
Create the endpoint for the WebSocket to connect to, and an AudioContext to process the microphone audio. The WebSocket endpoint has two parts:
- A unique connection ID. This example uses a universally unique identifier (UUID) to generate the connection ID automatically, so your connectionId does not conflict with any other client connecting to the same namespace.
- A query (GET) parameter named access_token. This is the access token generated during the Authentication process.
Check the example below:
// Replace with your own access token. Refer to the Authentication section for how to generate it: https://docs.symbl.ai/docs/authenticate
const accessToken = '<YOUR_ACCESS_TOKEN>';
// crypto.randomUUID() generates a version 4 UUID natively in modern browsers (secure contexts only),
// so no npm package or require() is needed when running in the browser.
const connectionId = crypto.randomUUID();
const symblEndpoint = `wss://api.symbl.ai/v1/streaming/${connectionId}?access_token=${accessToken}`;
// Create an AudioContext: it supplies the sample rate for the start_request and processes the microphone audio.
const context = new AudioContext();
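The template literal above inserts the token into the URL as-is. If a token ever contains characters that are reserved in URLs, a safer option is to let the standard URL API do the encoding. The following is a minimal sketch, assuming the endpoint format shown above; buildStreamingEndpoint is a hypothetical helper name, not part of any Symbl.ai SDK.

```javascript
/**
 * Hypothetical helper: builds the Streaming API WebSocket endpoint.
 * The path and the access_token query parameter follow the format shown in this guide.
 */
function buildStreamingEndpoint(connectionId, accessToken) {
  if (!connectionId || !accessToken) {
    throw new Error('connectionId and accessToken are both required');
  }
  // encodeURIComponent keeps the connection ID safe inside the path segment.
  const url = new URL(`wss://api.symbl.ai/v1/streaming/${encodeURIComponent(connectionId)}`);
  // searchParams.set() percent-encodes the token for the query string.
  url.searchParams.set('access_token', accessToken);
  return url.toString();
}

// Usage in the browser (sketch):
// const ws = new WebSocket(buildStreamingEndpoint(crypto.randomUUID(), accessToken));
```

URL and URLSearchParams are available both in browsers and in Node.js, so the helper can be tested outside the browser.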
Backward Compatibility
The previous endpoint wss://api.symbl.ai/v1/realtime/insights/ has been updated to wss://api.symbl.ai/v1/streaming/ to standardize our API nomenclature. This change is backward compatible; however, we recommend using the new endpoint.
Create the WebSocket
Now that you have constructed the endpoint, let's create a new WebSocket!
You can use the browser's built-in WebSocket API. For more information, see: https://developer.mozilla.org/en-US/docs/Web/API/WebSocket
const ws = new WebSocket(symblEndpoint);
The WebSocket starts connecting as soon as it is created, so set its event listeners right away, in the same block of code, so you don't miss any messages.
Set WebSocket listeners
// Fired when a message is received from the WebSocket server
ws.onmessage = (event) => {
// The conversationId is available in data.message.data.conversationId (after parsing event.data).
const data = JSON.parse(event.data);
if (data.type === 'message' && data.message.hasOwnProperty('data')) {
console.log('conversationId', data.message.data.conversationId);
}
if (data.type === 'message_response') {
for (let message of data.messages) {
console.log('Transcript (more accurate): ', message.payload.content);
}
}
if (data.type === 'topic_response') {
for (let topic of data.topics) {
console.log('Topic detected: ', topic.phrases)
}
}
if (data.type === 'insight_response') {
for (let insight of data.insights) {
console.log('Insight detected: ', insight.payload.content);
}
}
if (data.type === 'message' && data.message.hasOwnProperty('punctuated')) {
console.log('Live transcript (less accurate): ', data.message.punctuated.transcript)
}
console.log(`Response type: ${data.type}. Object: `, data);
};
// Fired when the WebSocket closes unexpectedly due to an error or lost connection
ws.onerror = (err) => {
console.error(err);
};
// Fired when the WebSocket connection has been closed
ws.onclose = (event) => {
console.info('Connection to websocket closed');
};
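The handler above logs every response type separately. If you want to route the results somewhere other than the console, one option is to normalize each message first. The sketch below covers the same cases as the handler above and assumes exactly the message shapes it reads (message.data, message.punctuated, messages, topics, insights); summarizeStreamingMessage is a hypothetical name. It is a plain function with no browser dependencies, so it can be tested on its own.

```javascript
/**
 * Hypothetical helper: turns one raw Streaming API message (a JSON string)
 * into a list of { kind, text } records, mirroring the onmessage branches above.
 */
function summarizeStreamingMessage(raw) {
  const data = JSON.parse(raw);
  const out = [];
  switch (data.type) {
    case 'message':
      // Conversation metadata, such as the conversationId.
      if (data.message && data.message.data) {
        out.push({ kind: 'conversationId', text: data.message.data.conversationId });
      }
      // Interim live transcript (less accurate).
      if (data.message && data.message.punctuated) {
        out.push({ kind: 'liveTranscript', text: data.message.punctuated.transcript });
      }
      break;
    case 'message_response':
      for (const m of data.messages || []) out.push({ kind: 'transcript', text: m.payload.content });
      break;
    case 'topic_response':
      for (const t of data.topics || []) out.push({ kind: 'topic', text: t.phrases });
      break;
    case 'insight_response':
      for (const i of data.insights || []) out.push({ kind: 'insight', text: i.payload.content });
      break;
    default:
      // Other message types are ignored in this sketch.
      break;
  }
  return out;
}

// Usage in the browser (sketch):
// ws.onmessage = (event) => {
//   for (const record of summarizeStreamingMessage(event.data)) console.log(record.kind, record.text);
// };
```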
Start the WebSocket connection
Once the connection is open, send a start_request message over the WebSocket to start a session with the Streaming API:
// Fired when the connection succeeds.
ws.onopen = (event) => {
ws.send(JSON.stringify({
type: 'start_request',
meetingTitle: 'Websockets How-to', // Conversation name
insightTypes: ['question', 'action_item'], // Will enable insight generation
config: {
confidenceThreshold: 0.5,
languageCode: 'en-US',
speechRecognition: {
encoding: 'LINEAR16',
sampleRateHertz: context.sampleRate, // Get sample rate from browser's audio context
}
},
speaker: {
userId: 'user@example.com', // Placeholder: use the speaker's own ID or email
name: 'Tony Stark',
}
}));
};
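If you start sessions from more than one place, it can help to build the start_request payload in a single function. This is a minimal sketch that reuses the field names and values from the example above; buildStartRequest and its defaults are hypothetical, and the only validation it performs is checking that a usable sample rate was passed.

```javascript
/**
 * Hypothetical helper that builds the start_request payload shown above.
 * Field names come from that example; the defaults are illustrative.
 */
function buildStartRequest({
  sampleRateHertz,
  meetingTitle = 'Websockets How-to',
  insightTypes = ['question', 'action_item'],
  languageCode = 'en-US',
  confidenceThreshold = 0.5,
  speaker,
}) {
  // In the browser, pass context.sampleRate from the AudioContext.
  if (!Number.isFinite(sampleRateHertz) || sampleRateHertz <= 0) {
    throw new Error('sampleRateHertz must be a positive number, such as AudioContext.sampleRate');
  }
  const request = {
    type: 'start_request',
    meetingTitle,
    insightTypes,
    config: {
      confidenceThreshold,
      languageCode,
      speechRecognition: { encoding: 'LINEAR16', sampleRateHertz },
    },
  };
  // Assumption for this sketch: the speaker object is only attached when one is given.
  if (speaker) request.speaker = speaker;
  return JSON.stringify(request);
}

// Usage in the browser (sketch):
// ws.onopen = () => ws.send(buildStartRequest({
//   sampleRateHertz: context.sampleRate,
//   speaker: { userId: 'user@example.com', name: 'Tony Stark' },
// }));
```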
Check out our guide on the Best Practices for Audio Integrations with Symbl to learn more about our audio encoding options.
Create the Audio Stream
Once connected to the Streaming API, the next step is to create an audio stream.
You can do this with the Navigator API by accessing navigator.mediaDevices and calling getUserMedia(), which asks the user to grant the browser access to the computer's microphone.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });
Since you are processing audio data in this tutorial, you don’t need to request video device access.
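getUserMedia() returns a Promise that rejects with a DOMException if access fails, for example when the user denies permission (NotAllowedError) or no microphone is available (NotFoundError). Below is a minimal sketch of turning those standard error names into readable messages; describeMicrophoneError is a hypothetical helper, and the message wording is only illustrative.

```javascript
/**
 * Hypothetical helper: maps common getUserMedia() rejection names
 * (standard DOMException names) to a readable message.
 */
function describeMicrophoneError(err) {
  switch (err && err.name) {
    case 'NotAllowedError':
      return 'Microphone permission was denied or is not allowed on this page.';
    case 'NotFoundError':
      return 'No microphone was found.';
    case 'NotReadableError':
      return 'The microphone is in use by another application or could not be opened.';
    case 'SecurityError':
      return 'Microphone access is disabled for this document.';
    default:
      // Fall back to whatever the error itself reports.
      return `Could not access the microphone: ${err && err.message ? err.message : String(err)}`;
  }
}

// Usage in the browser (sketch):
// try {
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });
// } catch (err) {
//   console.error(describeMicrophoneError(err));
// }
```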
Handle the audio stream
The user has now granted access to the microphone, so you can use the WebSocket to send the audio stream to the Streaming API, where transcripts and insights are generated in real time.
Use the AudioContext you created earlier and the microphone stream returned by getUserMedia() to create a source and a processor:
/**
 * Called with the microphone stream after the user grants the browser permission
 * to use the computer's microphone. Starts a recording session that sends the
 * audio stream to the WebSocket endpoint for processing.
 */
const handleSuccess = (stream) => {
  const source = context.createMediaStreamSource(stream);
  const processor = context.createScriptProcessor(1024, 1, 1);
  const gainNode = context.createGain();
  source.connect(gainNode);
  gainNode.connect(processor);
  processor.connect(context.destination);
  processor.onaudioprocess = (e) => {
    // Convert the float samples (range -1 to 1) to a 16-bit (LINEAR16) payload.
    const inputData = e.inputBuffer.getChannelData(0);
    const targetBuffer = new Int16Array(inputData.length);
    for (let index = 0; index < inputData.length; index++) {
      // Clamp to [-1, 1] so loud samples don't wrap around.
      const sample = Math.max(-1, Math.min(1, inputData[index]));
      targetBuffer[index] = 32767 * sample;
    }
    // Send the audio to the WebSocket only while it is open.
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(targetBuffer.buffer);
    }
  };
};
handleSuccess(stream);
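The sample conversion inside onaudioprocess is the easiest part to get wrong, so it can be worth isolating and testing on its own. This sketch converts the Web Audio API's float samples (nominally between -1 and 1) into the 16-bit signed integers that the LINEAR16 encoding in the start_request expects. floatTo16BitPCM is a hypothetical name, and unlike the handler above, which multiplies every sample by 32767, it scales negative samples by 32768 so that -1 maps to the Int16 minimum; both scalings are common.

```javascript
/**
 * Hypothetical helper: converts Web Audio float samples to 16-bit signed PCM (LINEAR16).
 * Out-of-range samples are clamped so they don't wrap around.
 */
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1] first.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Negative samples use 32768 and positive ones 32767 to cover the full Int16 range.
    // Assigning into an Int16Array truncates any fractional part.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Usage inside processor.onaudioprocess (sketch):
// ws.send(floatTo16BitPCM(e.inputBuffer.getChannelData(0)).buffer);
```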
Stopping the WebSocket Connection
To stop the WebSocket connection once you're done, run this code in your web browser:
// Stops the WebSocket connection.
ws.send(JSON.stringify({
"type": "stop_request"
}));
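Per the WebSocket API, ws.send() throws an InvalidStateError while the socket is still connecting and silently discards the data once it is closing or closed. A minimal sketch that sends the stop_request above only when the socket is open and reports whether it did; sendStopRequest and WS_OPEN are hypothetical names.

```javascript
// WebSocket.OPEN is defined as 1 by the WebSocket API.
const WS_OPEN = 1;

/**
 * Hypothetical helper: sends the stop_request only if the socket is open.
 * Returns true if the message was sent, false otherwise.
 */
function sendStopRequest(ws) {
  if (!ws || ws.readyState !== WS_OPEN) {
    return false;
  }
  ws.send(JSON.stringify({ type: 'stop_request' }));
  return true;
}

// Usage in the browser (sketch):
// if (!sendStopRequest(ws)) console.warn('WebSocket was not open; nothing to stop.');
```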
Full Code Sample
Here's the complete code sample, which you can also view on GitHub:
// Replace with your own access token. Refer to the Authentication section for how to generate it: https://docs.symbl.ai/docs/authenticate
const accessToken = '<YOUR_ACCESS_TOKEN>';
// crypto.randomUUID() generates a version 4 UUID natively in modern browsers (secure contexts only).
const connectionId = crypto.randomUUID();
const symblEndpoint = `wss://api.symbl.ai/v1/streaming/${connectionId}?access_token=${accessToken}`;
const ws = new WebSocket(symblEndpoint);
// Create the AudioContext up front: onopen reads its sample rate and handleSuccess uses it to process audio.
const context = new AudioContext();
// Fired when a message is received from the WebSocket server
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  // The conversationId is available in data.message.data.conversationId.
  if (data.type === 'message' && data.message.hasOwnProperty('data')) {
    console.log('conversationId', data.message.data.conversationId);
  }
  if (data.type === 'message_response') {
    for (let message of data.messages) {
      console.log('Transcript (more accurate): ', message.payload.content);
    }
  }
  if (data.type === 'topic_response') {
    for (let topic of data.topics) {
      console.log('Topic detected: ', topic.phrases);
    }
  }
  if (data.type === 'insight_response') {
    for (let insight of data.insights) {
      console.log('Insight detected: ', insight.payload.content);
    }
  }
  if (data.type === 'message' && data.message.hasOwnProperty('punctuated')) {
    console.log('Live transcript (less accurate): ', data.message.punctuated.transcript);
  }
  console.log(`Response type: ${data.type}. Object: `, data);
};
// Fired when the WebSocket closes unexpectedly due to an error or lost connection
ws.onerror = (err) => {
  console.error(err);
};
// Fired when the WebSocket connection has been closed
ws.onclose = (event) => {
  console.info('Connection to websocket closed');
};
// Fired when the connection succeeds.
ws.onopen = (event) => {
  ws.send(JSON.stringify({
    type: 'start_request',
    meetingTitle: 'Websockets How-to', // Conversation name
    insightTypes: ['question', 'action_item'], // Will enable insight generation
    config: {
      confidenceThreshold: 0.5,
      languageCode: 'en-US',
      speechRecognition: {
        encoding: 'LINEAR16',
        sampleRateHertz: context.sampleRate, // Get sample rate from browser's audio context
      }
    },
    speaker: {
      userId: 'user@example.com', // Placeholder: use the speaker's own ID or email
      name: 'Tony Stark',
    }
  }));
};
const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: false });
/**
 * Called with the microphone stream after the user grants the browser permission
 * to use the computer's microphone. Starts a recording session that sends the
 * audio stream to the WebSocket endpoint for processing.
 */
const handleSuccess = (stream) => {
  const source = context.createMediaStreamSource(stream);
  const processor = context.createScriptProcessor(1024, 1, 1);
  const gainNode = context.createGain();
  source.connect(gainNode);
  gainNode.connect(processor);
  processor.connect(context.destination);
  processor.onaudioprocess = (e) => {
    // Convert the float samples (range -1 to 1) to a 16-bit (LINEAR16) payload.
    const inputData = e.inputBuffer.getChannelData(0);
    const targetBuffer = new Int16Array(inputData.length);
    for (let index = 0; index < inputData.length; index++) {
      // Clamp to [-1, 1] so loud samples don't wrap around.
      const sample = Math.max(-1, Math.min(1, inputData[index]));
      targetBuffer[index] = 32767 * sample;
    }
    // Send the audio to the WebSocket only while it is open.
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(targetBuffer.buffer);
    }
  };
};
handleSuccess(stream);
Test
To verify that the code works, open your browser's developer tools and paste the code into the console, after setting accessToken to your own access token. The browser asks for permission to use the microphone. If you accept, the application starts recording. Start speaking to see the results logged to the console.
Grabbing the Conversation ID
The Conversation ID is useful for our other APIs, such as the Conversations API. This example doesn't use it, because the Conversations API is mainly used for asynchronous (non-real-time) data retrieval. It's still worth knowing how to get it, because you can use the Conversation ID later to retrieve the conversation's insights.
If you look closely at the onmessage handler, you can see how to get the Conversation ID:
// Fired when a message is received from the WebSocket server
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  // The conversationId arrives in a 'message' event whose message object has a 'data' property.
  if (data.type === 'message' && data.message.hasOwnProperty('data')) {
    console.log('conversationId', data.message.data.conversationId);
  }
  // ...handle the other message types as shown earlier.
};
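If you plan to call the Conversations API after the session, you'll want to keep the Conversation ID rather than only log it. A minimal sketch, assuming the message shape used in the handler above (type 'message' with a message.data.conversationId field); extractConversationId is a hypothetical helper.

```javascript
/**
 * Hypothetical helper: returns the conversationId carried by a raw Streaming API
 * message, or null if the message doesn't carry one.
 */
function extractConversationId(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // Not JSON, so not a message this helper understands.
  }
  if (data && data.type === 'message' && data.message && data.message.data) {
    return data.message.data.conversationId ?? null;
  }
  return null;
}

// Usage in the browser (sketch): keep the most recent ID seen.
// let conversationId = null;
// ws.addEventListener('message', (event) => {
//   conversationId = extractConversationId(event.data) ?? conversationId;
// });
```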
With the Conversation ID you can do each of the following (and more!):
View conversation topics
Summary topics provide a quick overview of the key things that were talked about in the conversation.
View action items
An action item is a specific outcome recognized in the conversation that requires one or more people in the conversation to take a specific action, for example setting up a meeting, sharing a file, or completing a task.
View follow-ups
Follow-ups are a category of action items that involve following up on a request or task, such as sending an email, making a phone call, booking an appointment, or setting up a meeting.
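Each of these is retrieved from the Conversations API using the Conversation ID. The following is a minimal sketch, assuming the GET endpoints https://api.symbl.ai/v1/conversations/{conversationId}/topics, /action-items, and /follow-ups and a Bearer token in the Authorization header; check these paths against the Conversations API reference before relying on them. conversationInsightUrl is a hypothetical helper.

```javascript
// Assumed base URL for the Conversations API; verify against the API reference.
const CONVERSATION_API_BASE = 'https://api.symbl.ai/v1/conversations';

/**
 * Hypothetical helper: builds the request URL for one kind of conversation insight.
 * The path segments are assumptions, not confirmed by this guide.
 */
function conversationInsightUrl(conversationId, kind) {
  const paths = { topics: 'topics', actionItems: 'action-items', followUps: 'follow-ups' };
  if (!Object.prototype.hasOwnProperty.call(paths, kind)) {
    throw new Error(`Unknown insight kind: ${kind}`);
  }
  return `${CONVERSATION_API_BASE}/${encodeURIComponent(conversationId)}/${paths[kind]}`;
}

// Usage sketch (browser, or Node 18+ where fetch is global):
// const res = await fetch(conversationInsightUrl(conversationId, 'topics'), {
//   headers: { Authorization: `Bearer ${accessToken}` },
// });
// console.log(await res.json());
```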