Google DLP

Most recent version: v0.0.1

See the changelog of this Action type here.

Overview

The Google DLP Action integrates with Google's Data Loss Prevention (DLP) API. It detects and classifies sensitive information in your events, helping workflows comply with data protection requirements.

This Action does not generate new events. Instead, it processes incoming events to detect sensitive information based on the configured Info Types and returns the corresponding findings.

Ports

These are the input and output ports of this Action:

Input ports
  • Default port - All the events to be processed by this Action enter through this port.

Output ports
  • Default port - Events are sent through this port if no error occurs while processing them.
  • Error port - Events are sent through this port if an error occurs while processing them.

Configuration

1

Find Google DLP in the Actions tab (under the Advanced group) and drag it onto the canvas.

2

To open the configuration, click the Action in the canvas and select Configuration.

3

Enter the required parameters:

Parameter
Description

Info Types*

Type(s) of sensitive data to detect. You can choose as many types as needed.

Data to Inspect*

Choose the input field that contains the data to be inspected by the DLP API.

JSON credentials*

JSON object containing the credentials required to authenticate with the Google DLP API.

Output Field*

Name of the new field where the results of the DLP evaluation will be stored.

Minimum Likelihood

For each potential finding that is detected during the scan, the DLP API assigns a likelihood level. The likelihood level of a finding describes how likely it is that the finding matches an Info Type that you're scanning for. For example, it might assign a likelihood of Likely to a finding that looks like an email address.

The API will filter out any findings that have a lower likelihood than the minimum level that you set here.

The available values are:

  • Very Unlikely

  • Unlikely

  • Possible (This is the default value)

  • Likely

  • Very Likely

For example, if you set the minimum likelihood to Possible, you get only the findings that were evaluated as Possible, Likely, or Very Likely. If you set the minimum likelihood to Very Likely, you get the smallest number of findings.

Include Quote

If true, includes a contextual quote from the data that triggered a finding. The default value is true.

Exclude Info Types

If true, the findings do not include type information (the Info Type that was matched). The default value is false.

4

Click Save to complete the process.
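The Minimum Likelihood threshold described above can be pictured with a short sketch. This is only an illustration of the ordering logic, not the Action's or the DLP API's actual implementation: the function name filter_by_min_likelihood and the sample findings are hypothetical, and the level names follow the uppercase form that appears in DLP findings (for example VERY_LIKELY).

```python
# Likelihood levels in ascending order, written in the uppercase form
# that appears in DLP findings (for example "VERY_LIKELY").
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def filter_by_min_likelihood(findings, min_likelihood="POSSIBLE"):
    """Keep only findings whose likelihood is at or above min_likelihood."""
    threshold = LIKELIHOOD_ORDER.index(min_likelihood)
    return [
        finding
        for finding in findings
        if LIKELIHOOD_ORDER.index(finding["likelihood"]) >= threshold
    ]


# Hypothetical findings, used only to illustrate the threshold.
sample = [
    {"infoType": "EMAIL_ADDRESS", "likelihood": "UNLIKELY"},
    {"infoType": "EMAIL_ADDRESS", "likelihood": "POSSIBLE"},
    {"infoType": "CREDIT_CARD_NUMBER", "likelihood": "VERY_LIKELY"},
]

kept = filter_by_min_likelihood(sample, "POSSIBLE")
strictest = filter_by_min_likelihood(sample, "VERY_LIKELY")
print([finding["likelihood"] for finding in kept])
print([finding["likelihood"] for finding in strictest])
```

With the default of Possible, the Unlikely finding is dropped and the other two are kept. Raising the threshold to Very Likely leaves only the strongest match.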

Example

Imagine you want to ensure that logs sent to a third-party service do not contain sensitive information such as credit card numbers, personal identification numbers, or passwords. To do this:

1

Add the Google DLP Action to your Pipeline and link it to your required Data sink.

2

Now, double-click the Google DLP Action to configure it. Set the following parameters:

Parameter
Description

Info Types

Choose the following info types:

  • Credit Card Number

  • Email Address

  • Password

Data to Inspect

Choose the input field that contains the data to be inspected by the DLP API. In this example, it is the Info field.

JSON credentials

JSON object containing the credentials required to authenticate with the Google DLP API.

Output Field

Name of the new field where the results of the DLP evaluation will be stored. In this example, it is dlpFindings.

Minimum Likelihood

We set the likelihood to Possible, as we want the right balance between recall and precision.

Include Quote

We want contextual info of the findings, so we set this to true.

Exclude Info Types

Set this to false, as we want to include type information of the findings.

3

Click Save to apply the configuration.

4

Now link the Default output port of the Action to the input port of your Data sink.

5

Finally, click Publish and choose in which clusters you want to publish the Pipeline.

6

Click Test pipeline at the top of the area and choose a specific number of events to test if your data is transformed properly. Click Debug to proceed.

This is the input data field we chose for our analysis:

{
  "Info": "My credit card number is 4111-1111-1111-1111"
}

And this is sample output data with the corresponding results of the DLP API:

{
  "dlpFindings": {
    "findings": [
      {
        "infoType": "CREDIT_CARD_NUMBER",
        "likelihood": "VERY_LIKELY",
        "quote": "4111-1111-1111-1111"
      }
    ]
  }
}
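As a follow-up to this example, the findings could be used downstream to mask the sensitive values before the logs leave your environment. The sketch below is a hypothetical post-processing step, not something the Google DLP Action does itself. It assumes the processed event carries both the original Info field and the dlpFindings output field, and that Include Quote is set to true so each finding has a quote. The function name mask_findings is made up for illustration.

```python
def mask_findings(event, data_field, output_field, mask_char="*"):
    """Return a copy of the event with every quoted finding masked in data_field."""
    masked = dict(event)
    text = masked[data_field]
    for finding in masked.get(output_field, {}).get("findings", []):
        quote = finding.get("quote")
        if quote:  # a quote is only expected when Include Quote is true
            text = text.replace(quote, mask_char * len(quote))
    masked[data_field] = text
    return masked


# Event shape assumed for illustration: original field plus the output field.
event = {
    "Info": "My credit card number is 4111-1111-1111-1111",
    "dlpFindings": {
        "findings": [
            {
                "infoType": "CREDIT_CARD_NUMBER",
                "likelihood": "VERY_LIKELY",
                "quote": "4111-1111-1111-1111",
            }
        ]
    },
}

clean = mask_findings(event, "Info", "dlpFindings")
print(clean["Info"])
```

The original event is left untouched. Only the returned copy has the card number replaced by mask characters of the same length.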
