If you plan to use multiple audio streams, we recommend using a single stream for each speaker involved to get the most accurate transcription and speaker separation.
You can also consume the processed results in real-time, which include:
- Real-time Transcription
- Real-time Insights (Action Items and Questions)
- When using multiple audio streams (one stream per speaker), you also get access to speaker-separated data, including transcription and messages
The example below uses the `mic` package to stream audio in real-time. This is a single stream of audio obtained through `mic`, which may contain the audio of one or more speakers.
The link to the complete example below can be found here
In the above snippet we import the required npm packages, including `uuid` and `mic`. The `uuid` package generates a unique ID to represent this stream, and using it is strongly recommended.
The `mic` package obtains the audio stream in real-time to pass to the SDK.
We now declare the `sampleRateHertz` variable to specify the sample rate of the audio obtained from `mic`.
It is imperative to use the same sample rate when initialising the `mic` package and in the configuration passed to the SDK. Otherwise, the transcription will be completely inaccurate.
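To see why a mismatch is so damaging, it helps to do the arithmetic. The sketch below assumes uncompressed 16-bit (2 bytes per sample) mono PCM, which is an assumption about the audio format rather than something stated above, and `durationSeconds` is a hypothetical helper. The same bytes are read as twice as long when the declared rate is half the real one, so the recognizer receives slowed-down, lower-pitched speech.

```javascript
// Hypothetical helper: how long a raw PCM buffer lasts at a given sample rate.
// Assumes uncompressed PCM with a fixed number of bytes per sample.
function durationSeconds(byteLength, sampleRateHertz, channels = 1, bytesPerSample = 2) {
  const bytesPerSecond = sampleRateHertz * channels * bytesPerSample;
  return byteLength / bytesPerSecond;
}

// One second of audio recorded at 16000 Hz, 16-bit mono:
const recordedBytes = 16000 * 1 * 2; // 32000 bytes

// Interpreted at the correct rate, it lasts one second...
console.log(durationSeconds(recordedBytes, 16000)); // 1
// ...but declared as 8000 Hz, the same bytes are treated as two seconds of
// slowed-down audio, which the recognizer cannot transcribe reliably.
console.log(durationSeconds(recordedBytes, 8000)); // 2
```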
We also set `channels: '1'` (mono channel) when initialising `mic`, as currently only mono-channel audio data is supported.
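One way to keep the recording settings and the streaming configuration from drifting apart is to derive both from a single constant. The sketch below has no dependencies; `buildAudioSettings` is a hypothetical helper, and the `rate` and `channels` keys follow the `mic` package's documented options, which it takes as strings. The `sampleRateHertz` key mirrors the config field described later in this guide.

```javascript
// Hypothetical helper: derive mic options and the stream config from one sample
// rate, so the two values can never disagree.
function buildAudioSettings(sampleRateHertz) {
  if (!Number.isInteger(sampleRateHertz) || sampleRateHertz <= 0) {
    throw new RangeError(`Invalid sample rate: ${sampleRateHertz}`);
  }
  const micOptions = {
    rate: String(sampleRateHertz), // mic takes the rate as a string
    channels: '1',                 // mono: the only layout currently supported
  };
  const streamConfig = {
    sampleRateHertz,               // the same value mic records at
  };
  return { micOptions, streamConfig };
}

const { micOptions, streamConfig } = buildAudioSettings(16000);
console.log(micOptions);   // { rate: '16000', channels: '1' }
console.log(streamConfig); // { sampleRateHertz: 16000 }
```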
Next we define a helper function to execute our code in the `async/await` style. The following code snippets (including the one just above) are all part of the same function.
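A minimal sketch of such a wrapper, assuming an async `main` function run through a `try/catch` so that a failed SDK call is logged instead of becoming an unhandled promise rejection. `main` and `runMain` are hypothetical names, and the body only stands in for the SDK calls described below.

```javascript
// Hypothetical wrapper: run an async main function and report any error it throws.
async function runMain(main) {
  try {
    return await main();
  } catch (err) {
    console.error('Streaming example failed:', err);
    return undefined;
  }
}

// The init, startRealtimeRequest, audio and stop steps would all live in here.
async function main() {
  // Placeholder for the SDK calls: resolve after one turn of the event loop.
  await new Promise((resolve) => setImmediate(resolve));
  return 'done';
}

runMain(main).then((result) => console.log(result)); // done
```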
Inside this function, we make the `init` call, passing in credentials that include the `appSecret`, which you can obtain by signing up on the Symbl Developer Platform.
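Rather than hard-coding the secret in source, you might read it from the environment. The sketch below is dependency-free; the variable names `SYMBL_APP_ID` and `SYMBL_APP_SECRET`, the helper `loadCredentials`, and the presence of an `appId` alongside `appSecret` are assumptions for illustration, so check the SDK reference for the exact fields `init` expects.

```javascript
// Hypothetical helper: read Symbl credentials from environment variables
// and fail fast with a clear message if any are missing.
function loadCredentials(env = process.env) {
  const credentials = {
    appId: env.SYMBL_APP_ID,         // assumed field and variable name
    appSecret: env.SYMBL_APP_SECRET, // assumed variable name
  };
  const missing = Object.entries(credentials)
    .filter(([, value]) => !value)
    .map(([key]) => key);
  if (missing.length > 0) {
    throw new Error(`Missing credentials: ${missing.join(', ')}`);
  }
  return credentials;
}

// Usage inside the async function (the `sdk` object name is an assumption):
// await sdk.init({ ...loadCredentials() });
```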
We also initialise a variable with the `uuid` function to hold the unique ID required for this stream, as mentioned in the import section above.
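If you would rather not add a dependency, Node.js (14.17 and later) ships `crypto.randomUUID()`, which also produces a random version 4 UUID. It is swapped in here as a stand-in for the `uuid` package used in this guide; the variable name `id` is illustrative.

```javascript
const crypto = require('crypto');

// A random (version 4) UUID to identify this stream.
// crypto.randomUUID() is a built-in alternative to the uuid package.
const id = crypto.randomUUID();
console.log(id); // prints a value such as '3b241101-e2bb-4255-8caf-4136c566a962'

// Every call yields a fresh ID, so each run starts a new conversation.
// To join several speakers into one conversation, generate it once and share it.
```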
The next call is made to
Let's break down the configuration and look at each option one by one.
id: The unique ID that represents this stream. (This needs to be unique, which is why we generate it with `uuid`.)
insightTypes: This array specifies the types of insights to be detected. Today the supported ones are action items and questions.
config: This configuration object encapsulates the properties which directly relate to the conversation generated by the audio being passed.
meetingTitle: This optional parameter specifies the name of the conversation generated. You can get more info on conversations here
confidenceThreshold: This optional parameter specifies the confidence threshold for detecting the insights. Only insights with a `confidenceScore` greater than this value will be returned.
timezoneOffset: This specifies the timezone offset used for detecting time/date-related entities.
languageCode: This specifies the language used for transcribing the audio, in BCP-47 format (for example, `en-US`). It needs to be the same as the language spoken in the audio.
sampleRateHertz: This specifies the sample rate of this audio stream, in Hz.
speaker: Optionally specify the details of the speaker whose data is being passed in the stream. This enables an e-mail with the Summary UI URL to be sent after the end of the stream.
handlers: This object contains the callback functions for the different events:
onSpeechDetected: Retrieves the real-time transcription results as soon as they are detected. We can use this callback to render live transcription specific to the speaker of this audio stream.
onMessageResponse: This callback receives the "finalized" transcription data for this speaker. If used with multiple streams for other speakers, this callback also provides their messages. A "finalized" message means that the ASR has settled on this part of the transcription and declared it final.
onInsightResponse: This callback provides any detected insights in real-time, as they are detected. As with `onMessageResponse` above, it also returns every speaker's insights in the case of multiple streams.
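Putting those fields together, the configuration object might look like the sketch below. The field names come from the list above; the concrete values, the insight type strings `'action_item'` and `'question'`, the 0 to 1 scale of the threshold, the `speaker` sub-fields `userId` and `name`, and the shape of the data the handlers receive are assumptions for illustration, so confirm them against the SDK reference. `buildRealtimeConfig` is a hypothetical helper, and the built-in `crypto.randomUUID()` stands in for the `uuid` package.

```javascript
const crypto = require('crypto');

// Hypothetical helper that assembles the streaming configuration described above.
function buildRealtimeConfig({ id, sampleRateHertz, speaker }) {
  return {
    id, // unique per conversation; shared by all speakers' streams
    insightTypes: ['action_item', 'question'], // assumed identifiers
    config: {
      meetingTitle: 'My Test Meeting', // optional conversation name
      confidenceThreshold: 0.7,        // assumed 0-1 scale; higher scores are returned
      timezoneOffset: 480,             // assumed example value; check its unit
      languageCode: 'en-US',           // BCP-47 tag matching the spoken language
      sampleRateHertz,                 // must equal the rate mic records at
    },
    speaker, // optional; enables the summary e-mail when the stream ends
    handlers: {
      // Live transcription for this stream's speaker.
      onSpeechDetected: (data) => console.log('speech', JSON.stringify(data)),
      // Finalized messages (every speaker's when multiple streams share the id).
      onMessageResponse: (data) => console.log('messages', JSON.stringify(data)),
      // Real-time insights (also every speaker's with multiple streams).
      onInsightResponse: (data) => console.log('insights', JSON.stringify(data)),
    },
  };
}

const realtimeConfig = buildRealtimeConfig({
  id: crypto.randomUUID(),
  sampleRateHertz: 16000,
  speaker: { userId: 'jane@example.com', name: 'Jane' }, // assumed field names
});
// Inside the async function (the `sdk` object name is an assumption):
// const connection = await sdk.startRealtimeRequest(realtimeConfig);
```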
When `startRealtimeRequest` returns successfully, the connection has been established with the passed configuration.
In the above snippet we now obtain the audio data from `mic` and pass it to the SDK.
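The pattern is to forward each chunk the audio stream emits to the open connection. In the sketch below, a `Readable` built from fixed buffers stands in for the microphone and a small object stands in for the connection, so it runs without a device or network. The `sendAudio` method name is an assumption about the connection object, not something stated above.

```javascript
const { Readable } = require('stream');

// Stand-in for the connection returned by startRealtimeRequest.
// The sendAudio name is an assumption; here it just records what it receives.
const connection = {
  received: [],
  sendAudio(chunk) {
    this.received.push(chunk);
  },
};

// Stand-in for the mic audio stream: three 320-byte chunks of silence
// (10 ms each at 16 kHz, 16-bit mono).
const audioStream = Readable.from([Buffer.alloc(320), Buffer.alloc(320), Buffer.alloc(320)]);

// Forward every chunk to the connection as soon as it arrives.
audioStream.on('data', (chunk) => connection.sendAudio(chunk));
audioStream.on('end', () => {
  console.log(`sent ${connection.received.length} chunks`); // sent 3 chunks
});
```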
To demonstrate a continuous audio stream, we now simulate a stop on the above stream after 60 seconds.
Calling `connection.stop()` closes the active connection and triggers the optional e-mail if the `speaker` config is included.
The `conversationData` variable includes the `conversationId`, which you can use with the Conversation API to retrieve this conversation's data.
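The timed stop can be sketched with `setTimeout`. The microphone and connection here are stand-ins so the code runs on its own, the duration is shortened to keep the demo quick, and the assumption that `connection.stop()` resolves to the `conversationData` object carrying `conversationId` is ours; the text above only says that the variable includes that ID. `stopAfter` is a hypothetical helper.

```javascript
// Stand-ins for the mic instance and the realtime connection.
const micInstance = { stopped: false, stop() { this.stopped = true; } };
const connection = {
  async stop() {
    // Assumed shape: resolves with data that includes the conversation ID.
    return { conversationId: 'example-conversation-id' };
  },
};

// Hypothetical helper: after `ms`, stop capturing, close the connection, and
// resolve with the conversationId for use with the Conversation API.
function stopAfter(ms) {
  return new Promise((resolve, reject) => {
    setTimeout(async () => {
      try {
        micInstance.stop();                               // stop capturing audio first
        const conversationData = await connection.stop(); // closes the connection
        resolve(conversationData.conversationId);
      } catch (err) {
        reject(err);
      }
    }, ms);
  });
}

// The guide uses 60 seconds (60 * 1000 ms); 100 ms keeps this demo short.
stopAfter(100).then((conversationId) => console.log('Conversation ID:', conversationId));
```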
The same example can be deployed on multiple machines, each with one speaker, to simulate the multiple-streams use case.
The only thing that needs to be common is the unique ID created in the above example, which is passed as `id` in the streaming configuration.
Sharing this unique ID across all the streams ensures that the audio streams of all the speakers are bound to the context of a single conversation.
This conversation can be retrieved by its `conversationId` via the Conversation API, and it will include the data of all the speakers who connected using the same common ID.
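The idea of one shared ID and one stream per speaker can be sketched as follows. `buildSpeakerStreams` is a hypothetical helper, the `speaker` sub-fields `userId` and `name` are assumptions, and in a real deployment each resulting configuration would be used on its own machine rather than in one process.

```javascript
const crypto = require('crypto');

// Hypothetical helper: one stream configuration per speaker, all sharing the
// same id so that their audio is bound to a single conversation.
function buildSpeakerStreams(sharedId, speakers, sampleRateHertz = 16000) {
  return speakers.map((speaker) => ({
    id: sharedId,                                      // identical across streams
    config: { languageCode: 'en-US', sampleRateHertz },
    speaker,                                           // differs per stream
  }));
}

// Generate the ID once and hand it to every participant's machine.
const sharedId = crypto.randomUUID();
const streams = buildSpeakerStreams(sharedId, [
  { userId: 'alice@example.com', name: 'Alice' },
  { userId: 'bob@example.com', name: 'Bob' },
]);

console.log(streams.every((s) => s.id === sharedId)); // true
```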