Redacting Sensitive Information From Conversations Using Symbl.ai

Objective: This tutorial will provide step-by-step guidance for developers on using Symbl.ai for redacting sensitive information by detecting specific entities while allowing for the exclusion of certain entities from redaction.

Outcome: After completing this tutorial, you'll be able to process customer call recordings, detect entities and custom entities, and apply selective redaction to maintain compliance with data protection standards.

Prerequisites

  • Sign up for the Symbl.ai platform. This gives you the app_Id and app_Secret needed to make API calls to the Symbl.ai APIs.
  • A recording (or URL) of an audio or video conversation, or a transcript, to process.

Step 1: Authentication: Obtain an access token for Symbl.ai API access.

Use your app_Id and app_Secret to generate an access token. You can also generate an access token and process different types of conversations with the Symbl.ai Postman collection; refer to the Symbl.ai docs to get started with Postman.

import requests

app_id = "<APP_ID>"          # Replace with your App ID
app_secret = "<APP_SECRET>"  # Replace with your App Secret

url = "https://api.symbl.ai/oauth2/token:generate"
headers = {"Content-Type": "application/json"}
data = {"type": "application", "appId": app_id, "appSecret": app_secret}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # Stop early if authentication failed
access_token = response.json()["accessToken"]
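Access tokens expire, so a longer-running script usually reuses one token until it is close to expiry instead of requesting a new one for every call. The following is a minimal sketch of that idea and is not part of any Symbl.ai SDK: `TokenCache` and `fetch_token` are hypothetical names, and the code assumes the token response carries an `expiresIn` field (in seconds) next to `accessToken`. Check the response you actually receive before relying on that field.

```python
import time
from typing import Callable, Dict, Optional


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Dict],
        clock: Callable[[], float] = time.time,
        margin_seconds: float = 60.0,
    ):
        self._fetch_token = fetch_token  # Callable returning the token response as a dict
        self._clock = clock
        self._margin = margin_seconds
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            payload = self._fetch_token()
            if "accessToken" not in payload:
                raise RuntimeError(f"Token request failed: {payload}")
            self._token = payload["accessToken"]
            # Assumption: `expiresIn` is the token lifetime in seconds.
            # If it is missing, the token is refreshed on every call.
            self._expires_at = now + float(payload.get("expiresIn", 0))
        return self._token
```

In this tutorial's flow, `fetch_token` would wrap the `requests.post(...)` call above and return `response.json()`.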

Step 2: Processing an Audio File:

Use Symbl.ai's Async API to transcribe meeting audio or video into text. Select the appropriate API endpoint (audio, video, or text) based on your meeting format. This tutorial uses the Async Audio API to process the conversation. If you need to process a different type of conversation, update the endpoint as described in the Symbl.ai docs.

Entities must be detected before they can be redacted. Symbl.ai's entity detection feature identifies specific entities in the conversation, and it runs by default when you process a conversation through any Async API. The Symbl.ai docs list the entities detected across the PII, PCI, PHI, and General classes of data. You can also create custom entities specific to your use case.

# Step 2: Process the Meeting
url = "https://api.symbl.ai/v1/process/audio"
headers = {"Authorization": f"Bearer {access_token}", "Content-Type": "audio/mpeg"}

file_path = "<PATH_TO_AUDIO_FILE>"
with open(file_path, "rb") as file:
    response = requests.post(url, headers=headers, data=file)

conversation_id = response.json()["conversationId"]
print("Conversation ID:", conversation_id)
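Because the Async API processes the file in the background, the entities and messages used in the next steps may not be available the moment this request returns. One common approach is to poll for completion before continuing. The sketch below assumes the response also contains a `jobId`, and that the job status can be read from a Job API endpoint such as `GET https://api.symbl.ai/v1/job/{jobId}` that returns a `status` of `completed` or `failed`. Treat those details as assumptions to verify against the Symbl.ai docs. `wait_for_job` and `get_status` are hypothetical names.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `get_status` until the job completes, fails, or attempts run out."""
    for attempt in range(max_attempts):
        status = get_status()
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError("Processing job failed")
        # Wait between checks, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"Job not finished after {max_attempts} status checks")
```

Under the assumptions above, `get_status` could be a small function that calls `requests.get` on the job URL with the same `Authorization` header and returns the `status` field of the JSON body.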

Step 3: Get Detected Entities:

Once the conversation is processed, you can identify the detected entities using the following GET Detected Entities conversation API.

# Step 3: Detecting Entities in the Conversation

url = f"https://api.symbl.ai/v1/conversations/{conversation_id}/entities"
headers = {"Authorization": f"Bearer {access_token}"}

response = requests.get(url, headers=headers)
print("Detected Entities:", response.json())
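The raw JSON can be long, so it often helps to condense it before deciding which entities to exclude from redaction. The helper below is a rough sketch. It assumes the response has a top-level `entities` list whose items carry a `type` and a `matches` list with `detectedValue` fields. That shape and the name `summarize_entities` are assumptions for illustration, so adjust the keys to whatever your response actually contains.

```python
from collections import defaultdict
from typing import Dict, List


def summarize_entities(payload: dict) -> Dict[str, List[str]]:
    """Map each entity type to the distinct values detected for it."""
    summary: Dict[str, List[str]] = defaultdict(list)
    for entity in payload.get("entities", []):
        # Assumed keys: "type", "matches", "detectedValue".
        values = summary[entity.get("type", "UNKNOWN")]
        for match in entity.get("matches", []):
            value = match.get("detectedValue")
            if value is not None and value not in values:
                values.append(value)
    return dict(summary)
```

With the Step 3 code, this could be called as `summarize_entities(response.json())` to see at a glance which entity types appear in the conversation.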

Step 4: Setting Redaction Preferences:

Update user settings for redaction using the Management API. This includes specifying which entities to redact or exclude.

Explanation of the following code parameters:

  • 'op': 'add' -> Adds new settings. If you have already added redaction settings, use 'replace' to update them.
  • 'default': true -> Turns redaction on by default (Step 5 shows how to override this per request). The default redaction shows redacted entities as [Entity_Name]. You can also choose 'obfuscation' or 'custom' based on your preference.
  • 'exclude': ['PERSON_NAME', 'EMAIL_ADDRESS'] -> Lists the entities that should not be redacted from the conversation.
  • 'type': 'custom' -> Enables custom redaction for specific entities. For example, the code sets Product_Name to 'ABCXYZ', so whenever a Product_Name entity is detected in the conversation, the transcript replaces it with 'ABCXYZ'.
// Step 4: Setting Redaction Preferences
import fetch from 'node-fetch';

const accessToken = '<ACCESS_TOKEN>'; // Replace with the access token from Step 1

const settings = [
  {
    'op': 'add',
    'path': '/redaction',
    'value': {
      'default': true,
      'type': 'custom',
      'exclude': ['PERSON_NAME', 'EMAIL_ADDRESS'], // Add or remove entities as needed
      'custom': {
        'Product_Name': 'ABCXYZ',
        'IR_Code': '82498-34792',
        'default': 'custom'
      }
    }
  }
];

const fetchResponse = await fetch('https://api.symbl.ai/v1/manage/settings', {
  method: 'PATCH',
  body: JSON.stringify(settings),
  headers: {
    'Authorization': `Bearer ${accessToken}`,
    'Content-Type': 'application/json'
  }
});

const responseBody = await fetchResponse.json();
console.log(responseBody);
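Since the other steps in this tutorial are written in Python, the same settings payload can also be assembled there. The following is a sketch rather than an official client: `build_redaction_patch` is a hypothetical helper that only builds the JSON body used in this step, with the same keys (`op`, `path`, `value`, `default`, `type`, `exclude`, `custom`).

```python
import json
from typing import Dict, List, Optional


def build_redaction_patch(
    exclude: List[str],
    custom: Optional[Dict[str, str]] = None,
    op: str = "add",
    redact_by_default: bool = True,
) -> List[dict]:
    """Build the settings body for the redaction preferences request."""
    if op not in ("add", "replace"):
        raise ValueError("op must be 'add' or 'replace'")
    value: dict = {"default": redact_by_default, "exclude": list(exclude)}
    if custom:
        # Custom replacements only apply when the redaction type is 'custom'.
        value["type"] = "custom"
        value["custom"] = dict(custom)
    return [{"op": op, "path": "/redaction", "value": value}]


settings = build_redaction_patch(
    exclude=["PERSON_NAME", "EMAIL_ADDRESS"],
    custom={"Product_Name": "ABCXYZ", "IR_Code": "82498-34792", "default": "custom"},
)
print(json.dumps(settings, indent=2))
```

The resulting list could then be sent with `requests.patch("https://api.symbl.ai/v1/manage/settings", headers=headers, json=settings)`, where `headers` carries the same `Authorization` bearer token used in the earlier steps.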

Step 5: Get Redacted Messages:

Redact sensitive information from the transcript using the Conversation API.

# Step 5: Redacting Sensitive Information

url = f"https://api.symbl.ai/v1/conversations/{conversation_id}/messages?redact=true"
headers = {"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"}

response = requests.get(url, headers=headers)
print("Redacted Messages:", response.json())

In this step, the “redact=true” query parameter ensures that sensitive information is redacted from the transcript even if redaction is set to false by default in the redaction settings added in Step 4. If you run this step on its own, set conversation_id and access_token to your actual conversation ID and access token, respectively.

The redaction settings from Step 4 apply to all Conversation APIs. If redaction is set to false by default and you need redacted insights from a particular Conversation API call, add the parameter redact=true to that call, and vice versa.
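The per-request override described above can be captured in a small URL helper. This is only a sketch: `conversation_url` is a hypothetical name, `messages` and `entities` are the two resources used in this tutorial, and `redact=false` is assumed to be the counterpart of the `redact=true` parameter shown in Step 5.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.symbl.ai/v1/conversations"


def conversation_url(conversation_id: str, resource: str, redact: Optional[bool] = None) -> str:
    """Build a Conversation API URL, optionally overriding the redaction default."""
    url = f"{BASE_URL}/{conversation_id}/{resource}"
    if redact is not None:
        # Use the lowercase form that appears in this tutorial (redact=true).
        url += "?" + urlencode({"redact": str(redact).lower()})
    return url
```

Leaving `redact` as `None` omits the parameter, so the request follows whatever default was stored in the Step 4 settings.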

If you have any questions about this tutorial, contact [email protected]