Redaction

Use the Symbl.ai Conversations API to identify and redact sensitive information from transcripts. Redaction conceals confidential information in the transcript returned by the Conversations API Get messages operation.

For a complete list of entities that can be identified for redaction, see Supported entities.

Authentication

This request requires an access token, as described in Authenticate.

Redaction use cases

You can use redaction in many ways:

  • Maintain compliance and protect consumer data: Anonymize protected information and store redacted versions of transcripts containing sensitive information to achieve and maintain compliance with HIPAA, CCPA, GDPR, and PCI SSC.
  • Automate workflows: Automatically remove privileged or confidential information from documents and route the information to secure areas without unnecessary human exposure.
  • Employee training: Reuse real-world samples with protected data redacted.

Types of redacted output

The general types of information covered by redaction include:

  • Personally Identifiable Information (PII) entities: Names, ages, birthdays, social security numbers, and driver’s license numbers.
  • Payment Card Industry (PCI) entities: Bank accounts, routing numbers, credit card numbers, expiration dates, and credit card verification values (CVVs).
  • Protected Health Information (PHI) entities: Health conditions, blood groups, injuries, and medical statistics.
  • General entities: Events, file names, times, and URLs.

Redaction helps you comply with standards and regulations such as:

  • Payment Card Industry Security Standards Council (PCI SSC).
  • Health Insurance Portability and Accountability Act (HIPAA).
  • General Data Protection Regulation (GDPR).
  • California Consumer Privacy Act (CCPA).

For a complete list of entities that can be identified for redaction, see Supported entities.

Get redacted transcripts

This section describes how to get a redacted transcript from a conversation. This request requires a conversation ID. You receive a conversation ID when you process a conversation with the Symbl.ai APIs.

After processing a conversation, generate a redacted transcript by calling the Conversations API Get messages operation with the redact=true parameter.

To get redacted content, use the following operation:

GET https://api.symbl.ai/v1/conversations/{conversationId}/messages?redact=true

The following code samples provide basic examples of enabling redaction.

cURL:

curl --request GET \
     --url 'https://api.symbl.ai/v1/conversations/<CONVERSATION_ID>/messages?redact=true' \
     --header 'accept: application/json' \
     --header 'authorization: Bearer <ACCESS_TOKEN>'

Node.js:

import fetch from 'node-fetch';

const accessToken = '<ACCESS_TOKEN>';
const conversationId = '<CONVERSATION_ID>';

const fetchResponse = await fetch(`https://api.symbl.ai/v1/conversations/${conversationId}/messages?redact=true`, {
  method: 'get',
  headers: {
    'Authorization': `Bearer ${accessToken}`,
    'Content-Type': 'application/json'
  }
});

const responseBody = await fetchResponse.json();

console.log(JSON.stringify(responseBody, null, 2));

Python:

import requests

url = "https://api.symbl.ai/v1/conversations/<CONVERSATION_ID>/messages?redact=true"

headers = {
    "accept": "application/json",
    "authorization": "Bearer <ACCESS_TOKEN>"
}

response = requests.get(url, headers=headers)

print(response.text)

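Where:

  • <ACCESS_TOKEN> is a valid access token, as described in Authenticate.
  • <CONVERSATION_ID> is the conversation ID you received when you processed the conversation.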

For more reference information, see Get messages.

Query parameters

For redaction, the Get messages operation supports the following query parameters.

| Parameter | Type | Description |
| --- | --- | --- |
| redact | Boolean | The redact=true parameter redacts sensitive information from the transcript. |
| exclude | String | Specifies which entity types are excluded from redaction. If not included, the default value is null. |
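As a minimal sketch of how the two query parameters combine, the following Python helper assembles the request URL without sending it. The function name build_messages_url is hypothetical, and the sketch assumes the exclude value is passed as a JSON array that is percent-encoded in the query string; check the Get messages reference for the exact encoding the API expects.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.symbl.ai/v1/conversations"


def build_messages_url(conversation_id, redact=True, exclude=None):
    """Return the Get messages URL with the redaction query parameters.

    Hypothetical helper: it only builds the URL and does not call the API.
    """
    params = {}
    if redact:
        params["redact"] = "true"
    if exclude:
        # Assumption: exclude is a JSON array of entity type names,
        # for example ["LOCATION_CITY"], percent-encoded by urlencode.
        params["exclude"] = json.dumps(list(exclude), separators=(",", ":"))
    url = f"{BASE_URL}/{conversation_id}/messages"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


print(build_messages_url("<CONVERSATION_ID>"))
print(build_messages_url("<CONVERSATION_ID>", exclude=["LOCATION_CITY"]))
```

Leaving exclude unset omits the parameter entirely, which matches the documented default of null.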

Example response

{
    "messages": [
        {
            "id": "5205809886134272",
            "text": "[ORGANIZATION] is a store that sells mattresses if you are sleepy by chance.",
            "from": {
                "id": "ac89a8f8-7246-411e-a4ce-d28383ca3368",
                "name": "[REDACTED NAME 1]",
                "email": "[REDACTED EMAIL 1]"
            },
            "startTime": "2022-09-01T17:10:24.097Z",
            "endTime": "2022-09-01T17:10:25.497Z",
            "timeOffset": 0,
            "duration": 1.4,
            "conversationId": "5083268647485440",
            "phrases": []
        },
        {
            "id": "6014815093391360",
            "text": "This is [PERSON_NAME] who will be joining our team next week as our new [OCCUPATION].",
            "from": {
                "id": "ac89a8f8-7246-411e-a4ce-d28383ca3368",
                "name": "[REDACTED NAME 2]",
                "email": "[REDACTED EMAIL 2]"
            },
            "startTime": "2022-09-01T17:10:25.497Z",
            "endTime": "2022-09-01T17:10:26.397Z",
            "timeOffset": 1.4,
            "duration": 0.9,
            "conversationId": "5083268647485440",
            "phrases": []
        },
        {
            "id": "5191738432421888",
            "text": "Yes.",
            "from": {
                "id": "f116f029-d35f-48bb-bf93-ccc2b62d2e91",
                "name": "[REDACTED NAME 1]",
                "email": "[REDACTED EMAIL 1]"
            },
            "startTime": "2022-09-01T17:10:27.297Z",
            "endTime": "2022-09-01T17:10:27.697Z",
            "timeOffset": 3.2,
            "duration": 0.4,
            "conversationId": "5083268647485440",
            "phrases": []
        }
    ]
}

Response fields

For more information about the response fields for this request, see Speech-to-text.
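To check which entity types were redacted in a response, you can scan the message text for bracketed placeholders. The following Python sketch counts them per entity type. It assumes the default output format, where each redacted entity is replaced by its entity type name in square brackets; the count_placeholders helper is hypothetical, and the sample data is trimmed from the example response above.

```python
import re
from collections import Counter

# Matches default-format placeholders such as [ORGANIZATION] or [PERSON_NAME].
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def count_placeholders(response_body):
    """Count redaction placeholders in the text of each message."""
    counts = Counter()
    for message in response_body.get("messages", []):
        counts.update(PLACEHOLDER.findall(message.get("text", "")))
    return counts


# Message texts taken from the example response; other fields omitted.
sample = {
    "messages": [
        {"text": "[ORGANIZATION] is a store that sells mattresses if you are sleepy by chance."},
        {"text": "This is [PERSON_NAME] who will be joining our team next week as our new [OCCUPATION]."},
        {"text": "Yes."},
    ]
}

for entity_type, count in count_placeholders(sample).items():
    print(entity_type, count)
```

This pattern does not match obfuscated or custom output, because those formats do not carry the entity type name.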

Redacted output options

You can change the characters displayed in place of redacted content. Use the Management API User settings to customize the output options, which are briefly described here. For details, see Settings for redaction.

You can choose one of three redaction options for the redacted output:

  • Default: replace the redacted entity with the name of the entity type, such as [SSN]. For the complete list of entity type names, see Supported entities.

  • Obfuscation: replace the redacted entity with four asterisks [****], regardless of entity length.

  • Custom: replace the redacted entity with any custom string you define, such as [123-45-6789] for a social security number.

The following table shows the same sentence redacted in each of the three different formats:

| Format | Example redaction |
| --- | --- |
| Original | My social security number is 444-67-3334 and I live in New York City. |
| Default | My social security number is [SSN] and I live in [LOCATION_CITY]. |
| Obfuscation | My social security number is [****] and I live in [****]. |
| Custom | My social security number is [123-45-6789] and I live in [City]. |
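The difference between the three formats can be illustrated with a small local simulation in Python. This is only a sketch of the substitution rules described above, not the service's implementation: the apply_format function is hypothetical, the entities are supplied by hand instead of being detected, and the custom strings are taken from the table.

```python
def apply_format(text, entities, mode="default", custom=None):
    """Replace each (value, entity_type) pair in text according to mode.

    Local illustration only; the real redaction is performed by the API.
    """
    custom = custom or {}
    for value, entity_type in entities:
        if mode == "default":
            # Entity type name in square brackets.
            replacement = f"[{entity_type}]"
        elif mode == "obfuscation":
            # Always four asterisks, regardless of entity length.
            replacement = "[****]"
        elif mode == "custom":
            # User-defined string; fall back to the default format.
            replacement = custom.get(entity_type, f"[{entity_type}]")
        else:
            raise ValueError(f"unknown mode: {mode}")
        text = text.replace(value, replacement)
    return text


original = "My social security number is 444-67-3334 and I live in New York City."
entities = [("444-67-3334", "SSN"), ("New York City", "LOCATION_CITY")]
custom = {"SSN": "[123-45-6789]", "LOCATION_CITY": "[City]"}

print(apply_format(original, entities, "default"))
print(apply_format(original, entities, "obfuscation"))
print(apply_format(original, entities, "custom", custom))
```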

Custom redaction

By default, redact=true redacts all supported entities from the transcript. To leave specific entity types from the list of Supported entities unredacted, pass them in the exclude parameter. All entities except those you specify are still redacted.

In this example, a customer working in market research wants to retain all the cities from a dataset while redacting other sensitive content. The example shows how to keep the actual city name, but redact all other sensitive content.

Note: Custom redaction is best for excluding a limited number of entities because each excluded entity type must be manually included in the request URL. Each entity to be excluded must be in ALL_CAPS with an underscore between words.

Use the exclude parameter to achieve custom redaction:

To redact all entities except LOCATION_CITY, use exclude=["LOCATION_CITY"] together with redact=true.

Example request:

GET https://api.symbl.ai/v1/conversations/{conversationId}/messages?exclude=["LOCATION_CITY"]&redact=true

Comparison of redaction results:

  • Original message: My social security number is 444-67-3334 and I live in New York City.

  • Default redaction with redact=true: My social security number is [SSN] and I live in [LOCATION_CITY].

  • Custom redaction with exclude=["LOCATION_CITY"]&redact=true: My social security number is [SSN] and I live in New York City.
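Because each excluded entity must be written in ALL_CAPS with underscores between words, it can help to normalize the names before adding them to the request. The following Python sketch does this and returns the value for the exclude parameter. The to_exclude_value helper and its validation pattern are assumptions for illustration; the sketch does not check names against the list of Supported entities.

```python
import json
import re

# Assumed shape of an entity type name: upper-case words joined by underscores.
ENTITY_NAME = re.compile(r"^[A-Z0-9]+(?:_[A-Z0-9]+)*$")


def to_exclude_value(names):
    """Normalize entity type names and return them as a JSON array string."""
    normalized = []
    for name in names:
        # Turn spaces and hyphens into underscores, then upper-case the name.
        candidate = re.sub(r"[\s\-]+", "_", name.strip()).upper()
        if not ENTITY_NAME.match(candidate):
            raise ValueError(f"not a valid entity type name: {name!r}")
        normalized.append(candidate)
    return json.dumps(normalized, separators=(",", ":"))


print(to_exclude_value(["location city"]))
```

The returned string still needs to be URL-encoded when it is placed in the query string.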