Live speech to text and AI insights on local server
This guide shows you how to use Symbl's JavaScript SDK to enable your device's microphone for recording and processing audio. This example was built to run on Mac or Windows PCs. You will learn how to use Symbl's API for speech-to-text transcription and real-time AI insights, such as follow-ups, action items, topics and questions.
Throughout the guide you'll find various references to these variable names, which you will have to replace with your values:
| Key | Description |
|---|---|
| `appId` | The application ID you get from the home page of the platform. |
| `appSecret` | The application secret you get from the home page of the platform. |
| `emailAddress` | The email address you wish to send the summary email to. The summary email summarizes the conversation and any conversational insights gained from it. |
View the full example on GitHub
## Contents

In this guide you will learn the following:
- Getting Started
- Initialize SDK
- Real-time Request Configuration Options
- Handle the audio stream
- Process speech using device's microphone
- Test
- Grabbing the Conversation ID
- Full Code Sample
## Getting started

To get this example running, you need to install the node packages `symbl-node`, `uuid` and `mic`. You can do that with `npm install symbl-node`, `npm install uuid` and `npm install mic`. We're using `mic` to simply get audio from the microphone and pass it on to the WebSocket connection.

`mic` also requires you to install `sox`. To install `sox`, choose the option which fits your operating system:

- Mac: `brew install sox`
- Windows and Linux: Installation of SoX on different Platforms

Below is a simple setup for `mic`. You can view the full configuration options for `mic` here.
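A minimal sketch of that setup, assuming 16 kHz mono audio (the option values and the `exitOnSilence` timeout are placeholders; see the `mic` documentation for the full list of options):

```js
const mic = require('mic');

const sampleRateHertz = 16000;

const micInstance = mic({
  rate: sampleRateHertz, // must match sampleRateHertz in the Symbl config below
  channels: '1',         // mono audio
  debug: false,
  exitOnSilence: 6       // emit a 'silence' event after ~6 seconds of silence
});

// Raw audio stream that will be forwarded to the WebSocket connection.
const micInputStream = micInstance.getAudioStream();
```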
## Initialize SDK

You will also need a unique ID to associate with your Symbl request. You will create this ID using the `uuid` package.
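A minimal sketch, assuming the `appId` and `appSecret` values from the table above and Symbl's public API base path:

```js
const { sdk } = require('symbl-node');
const { v4: uuidv4 } = require('uuid');

(async () => {
  try {
    // Initialize the SDK with your credentials.
    await sdk.init({
      appId: appId,         // replace with your appId
      appSecret: appSecret, // replace with your appSecret
      basePath: 'https://api.symbl.ai'
    });

    // Unique ID to associate with this real-time request.
    const id = uuidv4();
  } catch (err) {
    console.error('Error initializing the SDK:', err);
  }
})();
```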
## Real-time Request Configuration Options

Now you can start the connection using `sdk.startRealtimeRequest`. You will need to create a configuration object for the connection; the complete object is shown under Full Configuration Object below.

Here is the breakdown of the configuration types:
### Insight Types (insightTypes)

- `insightTypes`: This array represents the types of insights that are to be detected. Today the supported types are `action_item` and `question`.
### Action Item (action_item)

An action item is a specific outcome recognized in the conversation that requires one or more people in the conversation to act in the future. Action items will be returned via the `onInsightResponse` callback.
These actions can be definitive and owned with a commitment to working on a presentation, sharing a file, completing a task, etc. Or they can be non-definitive like an idea, suggestion or an opinion that could be worked upon.
All action items are generated with action phrases, assignees and due dates so that you can build workflow automation with your tools.
#### Action Item JSON Response Example

This is an example of an `action_item` returned via the `onInsightResponse` callback function.
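The original sample response isn't reproduced here; the sketch below is illustrative only, showing the action phrase, assignee and due date fields described above. The field names and values are assumptions, so consult the API reference for the authoritative schema.

```json
[
  {
    "type": "action_item",
    "text": "John needs to send the updated deck to the team by Friday.",
    "confidence": 0.93,
    "assignee": {
      "name": "John Doe",
      "userId": "john@example.com"
    },
    "dueBy": "2021-02-05T00:00:00.000Z"
  }
]
```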
### Question (question)

The API will find explicit questions or requests for information that come up during the conversation. Questions will be returned via the `onInsightResponse` callback.
#### Question JSON Response Example

This is an example of a `question` returned via the `onInsightResponse` callback function.
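Again, an illustrative sketch only; the field names are assumptions:

```json
[
  {
    "type": "question",
    "text": "When is the next release scheduled?",
    "confidence": 0.97
  }
]
```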
### Config (config)

- `config`: This configuration object encapsulates the properties which directly relate to the conversation generated by the audio being passed.
  - `meetingTitle`: This optional parameter specifies the name of the conversation generated. You can get more info on conversations here.
  - `confidenceThreshold`: This optional parameter specifies the confidence threshold for detecting the insights. Only insights with a `confidenceScore` greater than this value will be returned.
  - `timezoneOffset`: This specifies the actual timezone offset used for detecting time/date-related entities.
  - `languageCode`: This specifies the language to be used for transcribing the audio, in BCP-47 format. (It needs to be the same as the language in which the audio is spoken.)
  - `sampleRateHertz`: This specifies the sample rate for this audio stream.
### Speaker (speaker)

- `speaker`: Optionally specify the details of the speaker whose data is being passed in the stream. This enables an e-mail with the Summary UI URL to be sent after the end of the stream.
### Handlers (handlers)

- `handlers`: This object has the callback functions for different events:
- `onSpeechDetected`: To retrieve the real-time transcription results as soon as they are detected. You can use this callback to render a live transcription which is specific to the speaker of this audio stream.

#### onSpeechDetected JSON Response Example
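The original sample payload isn't reproduced here; below is an illustrative sketch of an interim recognition result. Field names and values are assumptions, so check the API reference for the exact schema.

```json
{
  "type": "recognition_result",
  "isFinal": false,
  "payload": {
    "raw": {
      "alternatives": [
        {
          "transcript": "Let's schedule the follow-up call for next",
          "confidence": 0.92
        }
      ]
    }
  },
  "punctuated": {
    "transcript": "Let's schedule the follow-up call for next"
  }
}
```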
- `onMessageResponse`: This callback function contains the "finalized" transcription data for this speaker and, if used with multiple streams with other speakers, this callback also provides their messages. The "finalized" messages mean that the automatic speech recognition has finalized the state of this part of the transcription and has declared it "final". Therefore, this transcription will be more accurate than `onSpeechDetected`.

#### onMessageResponse JSON Response Example
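An illustrative sketch of a finalized message; the field names are assumptions:

```json
[
  {
    "from": {
      "name": "John Doe",
      "userId": "john@example.com"
    },
    "payload": {
      "content": "Let's schedule the follow-up call for next Tuesday.",
      "contentType": "text/plain"
    }
  }
]
```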
- `onInsightResponse`: This callback provides you with any of the detected insights in real time as they are detected. As with `onMessageResponse`, this would also return every speaker's insights in case of multiple streams. View the examples for `onInsightResponse` above.
- `onTopicResponse`: This callback provides you with any of the detected topics in real time as they are detected. As with `onMessageResponse`, this would also return every topic in case of multiple streams.

#### onTopicResponse JSON Response Example
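An illustrative sketch of a detected topic; the field names are assumptions:

```json
[
  {
    "type": "topic",
    "phrases": "follow-up call",
    "score": 0.83
  }
]
```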
## Full Configuration Object
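Putting the pieces above together, here is a sketch of the configuration object passed to `sdk.startRealtimeRequest`. The specific values for the meeting title, threshold, timezone offset and language are placeholders:

```js
const connection = await sdk.startRealtimeRequest({
  id, // the unique ID generated with uuid
  insightTypes: ['action_item', 'question'],
  config: {
    meetingTitle: 'My Test Meeting',
    confidenceThreshold: 0.7,
    timezoneOffset: 480, // offset in minutes from UTC
    languageCode: 'en-US',
    sampleRateHertz: 16000
  },
  speaker: {
    userId: emailAddress, // enables the Summary UI email after the stream ends
    name: 'My name'
  },
  handlers: {
    onSpeechDetected: (data) => {
      console.log('onSpeechDetected:', JSON.stringify(data, null, 2));
    },
    onMessageResponse: (data) => {
      console.log('onMessageResponse:', JSON.stringify(data, null, 2));
    },
    onInsightResponse: (data) => {
      console.log('onInsightResponse:', JSON.stringify(data, null, 2));
    },
    onTopicResponse: (data) => {
      console.log('onTopicResponse:', JSON.stringify(data, null, 2));
    }
  }
});
```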
## Handle the audio stream

The connection to the WebSocket should now be established. Now you must create several handlers which will handle the audio stream. You can view all the valid handlers here:
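A sketch of those handlers, assuming the `micInputStream` created earlier; the event names come from the `mic` package, and `connection.sendAudio` is assumed from the SDK's real-time API:

```js
micInputStream.on('data', (data) => {
  // Forward raw audio chunks to Symbl for processing.
  connection.sendAudio(data);
});

micInputStream.on('error', (err) => {
  console.error('Error in microphone input stream:', err);
});

micInputStream.on('startComplete', () => {
  console.log('Started recording.');
});

micInputStream.on('silence', () => {
  console.log('Got silence signal from the microphone.');
});
```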
## Process speech using the device's microphone

Now you start the recording:
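Assuming the `micInstance` created earlier:

```js
// Start capturing audio from the microphone.
micInstance.start();
```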
Your microphone should now be open to input, which will be sent to the WebSocket for processing. The microphone will continue to accept input until the application is stopped or until you tell the connection to stop:
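A sketch of stopping after a fixed time, assuming `connection.stop()` closes the real-time request:

```js
// Stop the microphone and close the connection after roughly one minute.
setTimeout(async () => {
  micInstance.stop();
  await connection.stop();
  console.log('Connection stopped.');
}, 60 * 1000);
```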
## Test

To verify and check that the code is working, run your code:
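For example, if you saved the sample as `index.js` (the filename here is just an assumption):

```bash
node index.js
```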
## Grabbing the Conversation ID

The Conversation ID is very useful for our other APIs, such as the Conversation API. We don't use it in this example because it's mainly used for non-real-time data gathering, but it's good to know how to grab it, as you can use the Conversation ID later to extract the conversation insights again.
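A sketch of grabbing it, assuming the ID is exposed on the connection object returned by `sdk.startRealtimeRequest`:

```js
const conversationId = connection.conversationId;
console.log('Conversation ID:', conversationId);
```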
With the Conversation ID you can do each of the following (and more!):
- View conversation topics: Summary topics provide a quick overview of the key things that were talked about in the conversation.
- View action items: An action item is a specific outcome recognized in the conversation that requires one or more people in the conversation to take a specific action, e.g. set up a meeting, share a file, complete a task, etc.
- View follow-ups: This is a category of action items with a connotation to follow up on a request or a task, like sending an email, making a phone call, booking an appointment or setting up a meeting.
## Full Code Sample

Here's the full sample, which you can also view on GitHub:
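The authoritative sample is the one on GitHub; the sketch below stitches together the snippets from this guide under the same assumptions (names such as `basePath`, `connection.sendAudio` and `connection.stop` are taken from the calls referenced above, and the configuration values are placeholders):

```js
const { sdk } = require('symbl-node');
const { v4: uuidv4 } = require('uuid');
const mic = require('mic');

const sampleRateHertz = 16000;

// Microphone setup (16 kHz mono).
const micInstance = mic({
  rate: sampleRateHertz,
  channels: '1',
  debug: false,
  exitOnSilence: 6
});
const micInputStream = micInstance.getAudioStream();

(async () => {
  try {
    // Initialize the SDK with your credentials.
    await sdk.init({
      appId: appId,         // replace with your appId
      appSecret: appSecret, // replace with your appSecret
      basePath: 'https://api.symbl.ai'
    });

    // Unique ID to associate with this request.
    const id = uuidv4();

    // Start the real-time request with the configuration object described above.
    const connection = await sdk.startRealtimeRequest({
      id,
      insightTypes: ['action_item', 'question'],
      config: {
        meetingTitle: 'My Test Meeting',
        confidenceThreshold: 0.7,
        timezoneOffset: 480,
        languageCode: 'en-US',
        sampleRateHertz
      },
      speaker: {
        userId: emailAddress, // enables the Summary UI email after the stream ends
        name: 'My name'
      },
      handlers: {
        onSpeechDetected: (data) => {
          console.log('onSpeechDetected:', JSON.stringify(data, null, 2));
        },
        onMessageResponse: (data) => {
          console.log('onMessageResponse:', JSON.stringify(data, null, 2));
        },
        onInsightResponse: (data) => {
          console.log('onInsightResponse:', JSON.stringify(data, null, 2));
        },
        onTopicResponse: (data) => {
          console.log('onTopicResponse:', JSON.stringify(data, null, 2));
        }
      }
    });

    console.log('Successfully connected. Conversation ID:', connection.conversationId);

    // Forward microphone audio to the WebSocket connection.
    micInputStream.on('data', (data) => {
      connection.sendAudio(data);
    });
    micInputStream.on('error', (err) => {
      console.error('Error in microphone input stream:', err);
    });

    // Start recording.
    micInstance.start();

    // Stop the microphone and close the connection after roughly one minute.
    setTimeout(async () => {
      micInstance.stop();
      await connection.stop();
      console.log('Connection stopped.');
    }, 60 * 1000);
  } catch (err) {
    console.error('Error:', err);
  }
})();
```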