Redaction

Use the Symbl.ai Conversations API to identify and redact sensitive information from transcripts. Redaction is the process of concealing confidential information; you request it through the Conversations API Get messages operation.

For a complete list of entities that can be identified for redaction, see Supported entities.

Authentication

This request requires an access token, as described in Authenticate.

Redaction use cases

You can use redaction in many ways:

  • Maintain compliance and protect consumer data: Anonymize protected information and store redacted versions of transcripts containing sensitive information to achieve and maintain compliance with HIPAA, CCPA, GDPR, and PCI SSC.
  • Automate workflows: Automatically remove privileged or confidential information from documents and route the information to secure areas without unnecessary human exposure.
  • Employee training: Enable reuse of real-world samples with protected data redacted.

Types of redacted output

The general types of information covered by redaction include:

  • Personally Identifiable Information (PII) entities: Names, ages, birthdays, social security numbers, and driver’s license numbers.
  • Payment Card Industry (PCI) entities: Bank accounts, routing numbers, credit card numbers, expiration dates, and credit card verification values (CVVs).
  • Protected Health Information (PHI) entities: Health conditions, blood groups, injuries, and medical statistics.
  • General entities: Events, file names, times, and URLs.

Redaction helps you reach standards compliance, such as:

  • Payment Card Industry Security Standards Council (PCI SSC).
  • Health Insurance Portability and Accountability Act (HIPAA).
  • General Data Protection Regulation (GDPR).
  • California Consumer Privacy Act (CCPA).

Get redacted transcripts

This section describes how to get a redacted transcript from a conversation. This request requires a conversation ID. You receive a conversation ID when you process a conversation with the Symbl.ai APIs.

After processing a conversation, generate a redacted transcript by calling the Conversations API Get messages operation with the redact=true parameter.

Note: Redaction is unavailable in the Streaming API, so messages in Streaming API responses are not redacted. However, you can obtain redacted transcripts of streamed conversations by using the Conversations API.

To get redacted content, use the following operation:

GET https://api.symbl.ai/v1/conversations/{conversationId}/messages?redact=true

To complete this request from the API reference, see Get messages.
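The request above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, and passing the access token as a Bearer token in the Authorization header is an assumption to verify against Authenticate.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.symbl.ai/v1"


def build_redacted_messages_request(conversation_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the Get messages request with redact=true."""
    # Percent-encode the ID so it is safe to place in the URL path.
    url = f"{BASE_URL}/conversations/{quote(conversation_id, safe='')}/messages?redact=true"
    # Assumption: the access token is sent as a Bearer token.
    headers = {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request: urllib.request.Request):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

To call the API, pass the built request to fetch_json with a real conversation ID and access token. Building and sending are kept separate so the URL and headers can be inspected without a network call.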

Apply redaction to conversation insights

This section describes how to redact insights generated by the Conversations API. Redaction is supported for the formatted transcript, summary, trackers, bookmarks, action-items, follow-ups, questions, and topics. To get redacted insights, add the redact=true parameter to the request. You can also enable redaction for all insights by default by updating your user redaction settings. To update your redaction settings, see Settings for redaction.

For example, to get redacted summary content, use the following operation:

GET https://api.symbl.ai/v1/conversations/{conversationId}/summary?redact=true
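Because the redact=true parameter is added the same way for each insight, the URL can be composed by a small helper. The sketch below is illustrative: insight_url is a hypothetical name, and only the messages and summary path segments appear in this document, so check any other resource name against the API reference before using it.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.symbl.ai/v1"


def insight_url(conversation_id: str, resource: str, redact: bool = True) -> str:
    """Compose a Conversations API URL for one insight resource."""
    path = (
        f"{BASE_URL}/conversations/"
        f"{quote(conversation_id, safe='')}/{quote(resource, safe='')}"
    )
    if not redact:
        # Without the parameter, your default user redaction settings apply.
        return path
    return f"{path}?{urlencode({'redact': 'true'})}"


print(insight_url("{conversationId}", "summary"))
```

The printed URL percent-encodes the braces of the {conversationId} placeholder; with a real conversation ID the path reads exactly as in the operation above.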

Redacted output options

You can change the characters to display in place of redacted output. Use the Management API User settings to customize output options, briefly described here. For details, see Settings for redaction.

You can choose one of three redaction options for the redacted output:

  • Default: replace the redacted entity with the name of the entity type, such as [SSN]. For the complete list of entity type names, see Supported entities.

  • Obfuscation: replace the redacted entity with four asterisks [****], regardless of entity length.

  • Custom: replace the redacted entity with any custom string you define, such as [123-45-6789] for a social security number.

The following table shows the same sentence redacted in each of the three different formats:

Format       Example redaction
Original     My social security number is 444-67-3334 and I live in New York City.
Default      My social security number is [SSN] and I live in [LOCATION_CITY].
Obfuscation  My social security number is [****] and I live in [****].
Custom       My social security number is [123-45-6789] and I live in [City].
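To make the difference between the three formats concrete, the following sketch reproduces them locally. It is only an illustration of the output shapes: redaction itself happens on the Symbl.ai side, and redact_locally, its arguments, and the hand-supplied entity list are all hypothetical.

```python
def redact_locally(text, entities, mode="default", custom=None):
    """Imitate the three output formats for already-identified entities.

    entities: list of (matched_text, entity_type) pairs, supplied by hand here.
    custom:   mapping of entity type to replacement string (custom mode only).
    """
    custom = custom or {}
    for value, entity_type in entities:
        if mode == "default":
            placeholder = f"[{entity_type}]"  # name of the entity type
        elif mode == "obfuscation":
            placeholder = "[****]"  # always four asterisks, regardless of length
        elif mode == "custom":
            placeholder = f"[{custom.get(entity_type, entity_type)}]"
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        text = text.replace(value, placeholder)
    return text


sentence = "My social security number is 444-67-3334 and I live in New York City."
found = [("444-67-3334", "SSN"), ("New York City", "LOCATION_CITY")]

print(redact_locally(sentence, found))
print(redact_locally(sentence, found, mode="obfuscation"))
print(redact_locally(sentence, found, mode="custom",
                     custom={"SSN": "123-45-6789", "LOCATION_CITY": "City"}))
```

The three printed lines match the Default, Obfuscation, and Custom examples for this sentence.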

Custom redaction

By default, redact=true redacts all supported entities from the transcript. To leave specific entities from the list of Supported entities unredacted, pass them in the exclude=[] parameter: all entities are redacted except those you specify.

In this example, a customer working in market research wants to retain all the cities from a dataset while redacting other sensitive content. The example shows how to keep the actual city name, but redact all other sensitive content.

Note: Custom redaction is best for excluding a limited number of entities because each excluded entity type must be manually included in the request URL. Each entity to be excluded must be in ALL_CAPS with an underscore between words.

Use the exclude parameter to achieve custom redaction:

To redact all entities except LOCATION_CITY, use exclude=["LOCATION_CITY"] together with redact=true.

Example request:

GET https://api.symbl.ai/v1/conversations/{conversationId}/messages?exclude=["LOCATION_CITY"]&redact=true

Comparison of redaction results:

  • Original message: My social security number is 444-67-3334 and I live in New York City.

  • Default redaction with redact=true: My social security number is [SSN] and I live in [LOCATION_CITY].

  • With exclude=["LOCATION_CITY"]&redact=true, the redacted message: My social security number is [SSN] and I live in New York City.
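Because the exclude value contains brackets and quotation marks, it is safer to build the query string programmatically than to type it by hand. The sketch below is one way to do that under stated assumptions: redacted_messages_url is a hypothetical helper, the name check follows the ALL_CAPS-with-underscores rule from the note above (it assumes letters only), and it assumes the API accepts the percent-encoded form of the same JSON array.

```python
import json
import re
from urllib.parse import quote, urlencode

BASE_URL = "https://api.symbl.ai/v1"
# ALL_CAPS words joined by single underscores, e.g. LOCATION_CITY or SSN.
ENTITY_NAME = re.compile(r"[A-Z]+(?:_[A-Z]+)*")


def redacted_messages_url(conversation_id: str, exclude=()) -> str:
    """Build a Get messages URL that redacts everything except `exclude`."""
    for name in exclude:
        if not ENTITY_NAME.fullmatch(name):
            raise ValueError(f"entity type must be ALL_CAPS with underscores: {name!r}")
    params = {}
    if exclude:
        # Compact JSON array, e.g. ["LOCATION_CITY"]; urlencode percent-encodes it.
        params["exclude"] = json.dumps(list(exclude), separators=(",", ":"))
    params["redact"] = "true"
    return (
        f"{BASE_URL}/conversations/{quote(conversation_id, safe='')}"
        f"/messages?{urlencode(params)}"
    )


print(redacted_messages_url("abc", ["LOCATION_CITY"]))
```

Validating the names before sending catches a lowercase or space-separated entity type locally instead of in an API response.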