Identify and Redact PII and PCI (Labs)

Use the Symbl.ai Conversation API to identify and redact personally identifiable information (PII) and payment card industry (PCI) data.

Personally Identifiable Information (PII) is any information about an individual that can be used to distinguish or trace the individual's identity, such as full name, home address, social security number, email address, phone number, passport number, driver's license number, and so on.

Payment Card Industry (PCI) data includes sensitive financial information such as credit card and debit card numbers, cardholder's full name, card expiration date, card security code, bank account numbers, and so on.

Sensitive information often comes up in conversations such as account verification calls, where a customer care agent asks the customer to share their name, email address, or other confidential information.

Symbl.ai can identify and redact PII and PCI data in conversations and insights. Redaction conceals this confidential information in both messages and insights.

Use this technology to:

  • Identify PII / PCI data in messages and insights. See the complete list of Supported PII / PCI data.
  • Redact PII / PCI data in messages and transcripts using the default redaction indicator "****".
  • Mask with a custom string instead of the default redaction indicator.

The Supported PII and PCI data section lists the PII and PCI data types that Symbl.ai can identify and redact.

Currently, PII and PCI identification and redaction is supported only for English, and only through the Streaming API.

Authentication

Before using this API, you must generate your authentication token (AUTH_TOKEN) as described in Authentication.

Identifying and Redacting PII and PCI data

To enable PII / PCI support for message objects, add a redaction object to the request payload for the Streaming API.

For a WebSocket request to the Streaming API, include the redaction object in the start_request message that you send when opening the connection. This enables real-time PII / PCI identification and redaction for the conversation.

// sample payload of start_request message
{
    "type": "start_request",
    "config": {
        "languageCode": "en-US",
        "redaction": {
            // Enable identification of PII/PCI information
            "identifyContent": true, // By default false
            // Enable redaction of PII/PCI information
            "redactContent": true, // By default false
            // Replace PII/PCI information with the custom string "[PII_PCI_ENTITY]"
            "redactionString": "[PII_PCI_ENTITY]" // By default ****
        }
    }
}
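As a minimal sketch, the start_request message above can be assembled in Node.js with a small helper. The function name buildStartRequest and its parameter object are hypothetical, not part of any Symbl SDK; the helper only reproduces the type, languageCode, and redaction fields shown in the sample and leaves redactionString out when you do not pass one, so the service default applies.

```javascript
// Hypothetical helper (not a Symbl SDK function): builds a start_request
// message containing only the fields from the sample payload above.
function buildStartRequest({
  identifyContent = false, // documented default: false
  redactContent = false,   // documented default: false
  redactionString,         // omitted -> the service default "****" applies
} = {}) {
  const redaction = { identifyContent, redactContent };
  if (redactionString !== undefined) {
    redaction.redactionString = redactionString;
  }
  return JSON.stringify({
    type: 'start_request',
    config: {
      languageCode: 'en-US',
      redaction,
    },
  });
}

// Usage: identify and redact, replacing matches with a custom string.
const startMessage = buildStartRequest({
  identifyContent: true,
  redactContent: true,
  redactionString: '[PII_PCI_ENTITY]',
});
console.log(startMessage);
```

The resulting string is what you would pass to the WebSocket connection's send method once the connection is established.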

Live speech-to-text and insights on local server

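The following Node.js example streams microphone audio to the Streaming API over a WebSocket connection and adds the redaction object to the start_request message. If you use the SDK client instead, add the same payload to the sdk.startRealtimeRequest method.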

/**
 * To test this script, run it and start speaking.
 * Requires the `websocket` and `mic` npm packages.
 */
const WebSocketClient = require('websocket').client;
const mic = require('mic');

// Capture mono 16 kHz audio; sampleRateHertz in start_request must match this rate.
const micInstance = mic({
    rate: '16000',
    channels: '1',
    debug: false,
    exitOnSilence: 6
});

const micInputStream = micInstance.getAudioStream();

let connection;

const ws = new WebSocketClient();
ws.on('connectFailed', (e) => {
    console.error('Connection Failed.', e);
});
ws.on('connect', (conn) => {
    connection = conn;
    connection.on('close', () => {
        console.log('WebSocket closed.');
    });
    connection.on('error', (err) => {
        console.log('WebSocket error.', err);
    });
    connection.on('message', (data) => {
        // Only text frames carry JSON responses.
        if (data.type !== 'utf8') {
            return;
        }
        const utf8Data = JSON.parse(data.utf8Data);
        if (utf8Data && utf8Data.type === 'message_response' &&
            Array.isArray(utf8Data.messages) && utf8Data.messages.length > 0) {
            const message = utf8Data.messages[0];
            console.log('Payload ====');
            console.log(message.payload);
            console.log('Metadata ====');
            console.log(message.metadata);
            console.log('Entities ====');
            if (message.entities) {
                console.log('Entities found');
                console.log(message.entities);
            } else {
                console.log('No entities found');
            }
        }
    });
    console.log('Connection established.');

    connection.sendUTF(JSON.stringify({
        type: 'start_request',
        insightTypes: ['action_item', 'question', 'follow_up', 'topic'],
        config: {
            confidenceThreshold: 0.1,
            timezoneOffset: 480, // Offset in minutes from UTC
            languageCode: 'en-US',
            speechRecognition: {
                engine: 'google',
                encoding: 'LINEAR16',
                sampleRateHertz: 16000, // Must match the mic rate above
            },
            // Optional: enables the PII/PCI identification and redaction feature
            redaction: {
                identifyContent: true, // By default false
                redactContent: true, // By default false
                redactionString: '*****' // By default ****
            },
        },
        speaker: {
            userId: '[email protected]',
            name: 'John'
        },
        trackers: [
            {
                name: 'Budget',
                vocabulary: [
                    'a budget conversation',
                    'budget', 'budgeted', 'budgeting decision', 'budgeting decisions',
                    'money',
                    'budgets', 'funding', 'funds', 'I have the budget', 'my budget', 'our budget', 'your budget',
                    "we don't have budget for this", "don't think I have budget", "I think we have budget",
                    "not sure if I have budget"
                ]
            },
            {
                name: 'Approval',
                vocabulary: ['sounds great', 'yes', 'okay, sounds good', 'agree', 'yeah'],
            },
            {
                name: 'Denial',
                vocabulary: ['No', 'Not necessary', 'Not a good idea', "don't agree"],
            }
        ]
    }));

    // Stream raw audio buffers; send() transmits Buffers as binary frames.
    micInputStream.on('data', (data) => {
        connection.send(data);
    });

    // Stop the microphone and end recognition after 160 seconds.
    setTimeout(() => {
        micInstance.stop();
        connection.sendUTF(JSON.stringify({
            type: 'stop_recognition'
        }));
    }, 4 * 40 * 1000);

    micInstance.start();
});

// Put your AUTH_TOKEN in the x-api-key header and point to the correct server.
// Replace MeetingID with your own unique meeting ID.
ws.connect('wss://api-labs.symbl.ai/v1/realtime/insights/MeetingID', null, null, {
    'x-api-key': ''
});
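A single message_response frame can carry more than one message, so a handler might collect entities from every message rather than only messages[0]. This is a minimal sketch: the function name extractRedactionInfo is hypothetical, the frame shape ({ type: 'utf8', utf8Data }) is the one the `websocket` package passes to its 'message' handler, and it assumes (not confirmed on this page) that each message's text is in payload.content.

```javascript
// Hypothetical helper (not part of any Symbl SDK): collects text and entities
// from every message in a Streaming API frame received by the `websocket`
// package's 'message' event.
function extractRedactionInfo(frame) {
  // Binary or missing frames carry no JSON response.
  if (!frame || frame.type !== 'utf8') return [];
  let body;
  try {
    body = JSON.parse(frame.utf8Data);
  } catch (err) {
    return []; // Ignore text frames that are not valid JSON.
  }
  if (!body || body.type !== 'message_response' || !Array.isArray(body.messages)) {
    return [];
  }
  return body.messages.map((msg) => ({
    // Assumption: the (possibly redacted) text is in payload.content.
    text: msg.payload ? msg.payload.content : undefined,
    // Entities appear only when identifyContent is true.
    entities: Array.isArray(msg.entities) ? msg.entities : [],
  }));
}

// Usage with an illustrative frame (sample data, not a real API response).
const sampleFrame = {
  type: 'utf8',
  utf8Data: JSON.stringify({
    type: 'message_response',
    messages: [{ payload: { content: 'My number is ****' }, entities: [{ type: 'SSN' }] }],
  }),
};
console.log(extractRedactionInfo(sampleFrame));
```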

| Field Name | Data Type | Description | Required | Default value | Allowed values |
|---|---|---|---|---|---|
| identifyContent | Boolean | Specifies that the PII / PCI data or sensitive content should be identified. | Mandatory | false | true or false |
| redactContent | Boolean | Specifies that the PII / PCI data or sensitive content should be redacted in the transcript and insights. | Mandatory | false | true or false |
| redactionString | String | Specifies the string used to replace redacted entities. | Optional | `****` | Minimum length 1 character, maximum length 16 characters |
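Before opening a connection, you might check a redaction object against the field table above on the client side. A minimal sketch follows; validateRedactionConfig is a hypothetical name, and it treats the two flags as required because the Required column marks them Mandatory (relax that check if you rely on the default of false). The length check uses JavaScript string length, which counts UTF-16 code units.

```javascript
// Hypothetical client-side validation based on the field table:
// identifyContent and redactContent are booleans, and redactionString
// (optional) is a string of 1 to 16 characters.
function validateRedactionConfig(redaction) {
  if (typeof redaction !== 'object' || redaction === null) {
    return ['redaction must be an object'];
  }
  const errors = [];
  for (const field of ['identifyContent', 'redactContent']) {
    if (typeof redaction[field] !== 'boolean') {
      errors.push(`${field} must be true or false`);
    }
  }
  if ('redactionString' in redaction) {
    const s = redaction.redactionString;
    if (typeof s !== 'string' || s.length < 1 || s.length > 16) {
      errors.push('redactionString must be 1 to 16 characters long');
    }
  }
  return errors; // An empty array means the object passed every check.
}

// '[PII_PCI_ENTITY]' is exactly 16 characters, so this passes.
console.log(validateRedactionConfig({
  identifyContent: true,
  redactContent: true,
  redactionString: '[PII_PCI_ENTITY]',
}));
```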

Response Body Samples

The response for PII / PCI identification and redaction takes one of the following three forms, depending on how you set the two mandatory parameters, identifyContent and redactContent.

When identifyContent=true and redactContent=false:

The identified PII / PCI data or sensitive content is returned in the entities array of the message and insight objects, but the transcript and insight text is not redacted and still contains the sensitive content.

{
  "id": "6412283618000896",
  "text": "Sure, my social security number is 222-44-5555 and I was born on 3rd of October 1982.",
  "from": {
      "name": "Roger",
      "email": "[email protected]"
  },
  "startTime": "2020-07-10T11:16:21.024Z",
  "endTime": "2020-07-10T11:16:26.724Z",
  "conversationId": "6749556955938816",
  "entities": [
    {
      "type": "SSN",
      "value": "222-44-5555",
      "text": "222-44-5555",
      "offset": 35
    },
    {
      "type": "DATE_OF_BIRTH",
      "value": "1982-10-03",
      "text": "3rd of October 1982",
      "offset": 65
    }
  ]
}
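With identifyContent=true and redactContent=false, you could mask entities selectively on the client instead. A minimal sketch, assuming offset is a zero-based character index into the unredacted text (consistent with this sample, where the SSN starts at index 35) and that each entity's text field is the exact matched substring; maskEntities is a hypothetical name, and overlapping entities are not handled.

```javascript
// Hypothetical helper: replaces each identified entity in unredacted text
// with a redaction string, using the entity's offset and matched text.
function maskEntities(text, entities, redactionString = '****') {
  // Replace from the end of the string backwards so that earlier offsets
  // are not shifted by replacements made later in the string.
  const sorted = [...entities].sort((a, b) => b.offset - a.offset);
  let result = text;
  for (const { offset, text: matched } of sorted) {
    // Skip entities whose offset does not point at their matched text.
    if (result.slice(offset, offset + matched.length) !== matched) continue;
    result = result.slice(0, offset) + redactionString + result.slice(offset + matched.length);
  }
  return result;
}

// Illustrative message; entity text values are copied from the message text.
const sampleMessage = {
  text: 'Sure, my social security number is 222-44-5555 and I was born on 3rd of October 1982.',
  entities: [
    { type: 'SSN', text: '222-44-5555', offset: 35 },
    { type: 'DATE_OF_BIRTH', text: '3rd of October 1982', offset: 65 },
  ],
};
console.log(maskEntities(sampleMessage.text, sampleMessage.entities));
// Sure, my social security number is **** and I was born on ****.
```

The output matches the redacted text that the service returns when redactContent=true, which is one way to sanity-check how you interpret the offsets.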

When identifyContent=false and redactContent=true:

No entities are returned in the message and insight objects, but the transcript and insight text is redacted, with the sensitive content replaced by the redaction indicator.

{
  "id": "6412283618000896",
  "text": "Sure, my social security number is **** and I was born on ****.",
  "from": {
    "name": "Roger",
    "email": "[email protected]"
  },
  "startTime": "2020-07-10T11:16:21.024Z",
  "endTime": "2020-07-10T11:16:26.724Z",
  "conversationId": "6749556955938816"
}

When identifyContent=true and redactContent=true:

The identified PII / PCI data or sensitive content is returned in the entities array of the message and insight objects, and the transcript and insight text is redacted, with the sensitive content replaced by the redaction indicator.

{
  "id": "6412283618000896",
  "text": "Sure, my social security number is **** and I was born on ****.",
  "from": {
      "name": "Roger",
      "email": "[email protected]"
  },
  "startTime": "2020-07-10T11:16:21.024Z",
  "endTime": "2020-07-10T11:16:26.724Z",
  "conversationId": "6749556955938816",
  "entities": [
    {
      "type": "SSN",
      "value": "222-44-5555",
      "text": "222-44-5555",
      "offset": 35
    },
    {
      "type": "DATE_OF_BIRTH",
      "value": "1982-10-03",
      "text": "3rd of October 1982",
      "offset": 59
    }
  ]
}
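In this third form the text is masked, but each entity still carries the original value and text, so logging the full entities array would leak the data that was redacted. One option, sketched here under that assumption, is to log only a count of entity types; countEntityTypes is a hypothetical name.

```javascript
// Hypothetical helper: summarizes identified entities by type so that
// callers can log what was found without logging the sensitive values.
function countEntityTypes(message) {
  const counts = {};
  for (const { type } of message.entities || []) {
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Illustrative message with the same shape as the sample above.
const redactedMessage = {
  text: 'Sure, my social security number is **** and I was born on ****.',
  entities: [
    { type: 'SSN', value: '222-44-5555', text: '222-44-5555' },
    { type: 'DATE_OF_BIRTH', value: '1982-10-03', text: '3rd of October 1982' },
  ],
};
console.log(redactedMessage.text, countEntityTypes(redactedMessage));
```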

Sample with Conversation Response

The following sample shows the questions returned by a Get Conversation Insights request for a conversation processed with PII / PCI identification and redaction enabled.

GET https://api.symbl.ai/v1/conversations/CONVERSATION_ID/questions

Response with enabled redaction configuration:

{
    "questions": [
        {
            "id": "5845327161589760",
            "text": "What would that be in  ***** ?",
            "type": "question",
            "score": 0.97763958889475,
            "messageIds": [
                "6371692449366016"
            ],
            "from": {
                "id": "6af1e59a-4824-476a-b2ad-2908cb38c2f4",
                "name": "john",
                "userId": "[email protected]"
            }
        },
        {
            "id": "6228200813232128",
            "text": "But, how did you know, I was  ***** ?",
            "type": "question",
            "score": 0.9919246735147544,
            "messageIds": [
                "4505288909520896"
            ],
            "from": {
                "id": "6af1e59a-4824-476a-b2ad-2908cb38c2f4",
                "name": "john",
                "userId": "[email protected]"
            }
        }
    ]
}
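The GET request above can be issued from Node.js 18 or later with the built-in fetch API. This is a minimal sketch: buildQuestionsRequest and fetchQuestions are hypothetical names, and sending the token in an x-api-key header is an assumption carried over from the Streaming API example; check the Authentication page for the header your account should use.

```javascript
// Hypothetical helper: builds the Get Conversation Insights request for questions.
// Assumption: the auth token goes in an x-api-key header, as in the
// Streaming API example; verify against the Authentication page.
function buildQuestionsRequest(conversationId, authToken) {
  return {
    url: `https://api.symbl.ai/v1/conversations/${encodeURIComponent(conversationId)}/questions`,
    options: {
      method: 'GET',
      headers: { 'x-api-key': authToken },
    },
  };
}

// Requires Node.js 18+ for the global fetch API.
async function fetchQuestions(conversationId, authToken) {
  const { url, options } = buildQuestionsRequest(conversationId, authToken);
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  const body = await res.json();
  return body.questions;
}

// Usage (makes a network call, so it is left commented out here):
// fetchQuestions('CONVERSATION_ID', AUTH_TOKEN).then((qs) => console.log(qs));
```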

Get Conversation Insights without Redaction configuration

The following sample shows the questions returned by a Get Conversation Insights request for a conversation processed without a redaction configuration, so the sensitive content appears unmasked.

GET https://api.symbl.ai/v1/conversations/CONVERSATION_ID/questions

Response with disabled redaction configuration:

{
    "questions": [
        {
            "id": "5216057812844544",
            "text": "But, how did you know, I was English?",
            "type": "question",
            "score": 0.9919246735147544,
            "messageIds": [
                "4913495486234624"
            ],
            "from": {
                "id": "8081de20-b855-46ad-b2d7-fb04774d9d67",
                "name": "john",
                "userId": "[email protected]"
            }
        },
        {
            "id": "5912470251110400",
            "text": "What would that be in England?",
            "type": "question",
            "score": 0.97763958889475,
            "messageIds": [
                "6334186244800512"
            ],
            "from": {
                "id": "8081de20-b855-46ad-b2d7-fb04774d9d67",
                "name": "john",
                "userId": "[email protected]"
            }
        }
    ]
}
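Comparing the two responses, one quick way to see which insights were affected by redaction is to look for the redaction indicator in their text. A minimal sketch, assuming the indicator is '*****' as configured in the Node.js example (pass '****' if you kept the default); findRedactedQuestions is a hypothetical name, and a plain substring test can report false positives if a speaker's text legitimately contains that string.

```javascript
// Hypothetical helper: returns the IDs of questions whose text contains
// the redaction indicator.
function findRedactedQuestions(response, redactionString = '*****') {
  return (response.questions || [])
    .filter((q) => typeof q.text === 'string' && q.text.includes(redactionString))
    .map((q) => q.id);
}

// Trimmed copies of the two responses above (id and text only).
const redactedResponse = {
  questions: [
    { id: '5845327161589760', text: 'What would that be in  ***** ?' },
    { id: '6228200813232128', text: 'But, how did you know, I was  ***** ?' },
  ],
};
const plainResponse = {
  questions: [
    { id: '5216057812844544', text: 'But, how did you know, I was English?' },
    { id: '5912470251110400', text: 'What would that be in England?' },
  ],
};
console.log(findRedactedQuestions(redactedResponse));
console.log(findRedactedQuestions(plainResponse));
```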

Supported PII and PCI data

| PII / PCI Data | Category | Description |
|---|---|---|
| Credit/Debit Card Number | Finance | A credit or debit card number, 12 to 19 digits long, used for payment transactions. |
| Credit/Debit Card CVV Number | Finance | The 3-digit or 4-digit security code of a credit or debit card. |
| Credit/Debit Card Expiration Date | Finance | The month and year a card expires. |
| Credit/Debit Card PIN | Finance | A security code issued by a bank or credit union for authenticating transactions. Not to be confused with the CVV code. |
| IBAN Code | Finance | An International Bank Account Number (IBAN) belongs to an international system for identifying bank accounts across national borders. It is defined under the ISO 13616:2007 standard. An IBAN consists of up to 34 alphanumeric characters. |
| SWIFT Code | Finance | A unique identification code for a particular bank. These codes are used when transferring money between banks, particularly for international wire transfers. |
| US Bank Routing Number | Finance | The American Bankers Association (ABA) Routing Number (also called the transit number) is a nine-digit code. It identifies the financial institution that is responsible to credit, or entitled to receive credit for, a check or electronic transaction. |
| US Bank Account Number | Finance | A United States bank account number. |
| Name | Personal | A person's full name, which can include first names, middle names or initials, and last names. |
| Email | Personal | An email address that identifies a mailbox. |
| Age | Personal | Age measured in months or years. |
| Phone Number, Address, Date of Birth | Personal | A phone number, address, or date of birth. |
| Social Security Number | National ID | A United States Social Security number (SSN), a 9-digit number issued to US citizens, permanent residents, and temporary residents. |
| Passport Number | National ID | A passport number. |
| US Driver's License Number | National ID | A driver's license number issued in the United States. The format varies by issuing state. |
| Date | General | Date mentions, including the names of common world holidays. |
| Domain Name | General | A domain name as defined by the DNS standard. |
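If you post-process entity values, the lengths stated in the table above allow a rough format sanity check. This sketch encodes only those stated lengths; it is not the detection logic Symbl.ai uses, it does not verify checksums, and the names FORMAT_CHECKS and checkFormat are hypothetical.

```javascript
// Hypothetical format checks derived only from the lengths in the table.
const FORMAT_CHECKS = {
  cardNumber: (v) => /^\d{12,19}$/.test(v),     // 12 to 19 digits
  cvv: (v) => /^\d{3,4}$/.test(v),              // 3 or 4 digits
  abaRoutingNumber: (v) => /^\d{9}$/.test(v),   // nine digits
  ssn: (v) => /^\d{9}$/.test(v),                // 9 digits
  iban: (v) => /^[A-Za-z0-9]{1,34}$/.test(v),   // up to 34 alphanumerics
};

function checkFormat(kind, value) {
  const check = FORMAT_CHECKS[kind];
  if (!check) {
    throw new Error(`No format check for ${kind}`);
  }
  // Ignore spaces and hyphens, e.g. "222-44-5555" or "4111 1111 1111 1111".
  return check(String(value).replace(/[\s-]/g, ''));
}

console.log(checkFormat('ssn', '222-44-5555'));
console.log(checkFormat('cvv', '12345'));
```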
