Google DLP

Most recent version: v0.0.1

See the changelog of this Action type here.

Overview

The Google DLP Action integrates with Google's Data Loss Prevention (DLP) API. It detects and classifies sensitive information in your events, helping workflows comply with data protection requirements.

This Action does not generate new events. Instead, it processes incoming events to detect sensitive information based on the configured Info Types and returns the corresponding findings.

To configure this Action, you must first link it to a Listener. Go to Building a Pipeline to learn how this works.

Ports

These are the input and output ports of this Action:

Input ports

Default port - All the events to be processed by this Action enter through this port.

Output ports

Default port - Events that are processed without errors leave through this port.

Error port - Events are sent through this port if an error occurs while processing them.

Configuration

Find Google DLP in the Actions tab (under the Advanced group) and drag it onto the canvas.

To open the configuration, click the Action in the canvas and select Configuration.

Enter the required parameters:

Parameter

Description

Info Types*

Type(s) of sensitive data to detect. You can choose as many types as needed.

Data to Inspect*

Choose the input field that contains the data to be inspected by the DLP API.

JSON credentials*

JSON object containing the credentials required to authenticate with the Google DLP API.

Output Field*

Name of the new field where the results of the DLP evaluation will be stored.

Minimum Likelihood

For each potential finding that is detected during the scan, the DLP API assigns a likelihood level. The likelihood level of a finding describes how likely it is that the finding matches an Info Type that you're scanning for. For example, it might assign a likelihood of Likely to a finding that looks like an email address.

The API will filter out any findings that have a lower likelihood than the minimum level that you set here.

The available values are:

Very Unlikely
Unlikely
Possible (This is the default value)
Likely
Very Likely

For example, if you set the minimum likelihood to Possible, you get only the findings that were evaluated as Possible, Likely, or Very Likely. If you set the minimum likelihood to Very Likely, you get the smallest number of findings.

Include Quote

If true, includes a contextual quote from the data that triggered a finding. The default value is true.

Exclude Info Types

If true, excludes type information of the findings. The default value is false.
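The Minimum Likelihood threshold described above can be illustrated with a short sketch. The filtering itself is performed by the DLP API, so this code only mirrors the semantics; the function name and the sample findings are hypothetical, and the level names are the DLP API enum values corresponding to the options listed in the table.

```python
# Likelihood levels in ascending order, using the DLP API enum names.
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def filter_by_min_likelihood(findings, min_likelihood="POSSIBLE"):
    """Keep findings whose likelihood is at or above min_likelihood."""
    threshold = LIKELIHOOD_ORDER.index(min_likelihood)
    return [
        f for f in findings
        if LIKELIHOOD_ORDER.index(f["likelihood"]) >= threshold
    ]


# Hypothetical findings, for illustration only.
sample = [
    {"infoType": "EMAIL_ADDRESS", "likelihood": "UNLIKELY"},
    {"infoType": "EMAIL_ADDRESS", "likelihood": "POSSIBLE"},
    {"infoType": "CREDIT_CARD_NUMBER", "likelihood": "VERY_LIKELY"},
]

kept = filter_by_min_likelihood(sample, "POSSIBLE")       # drops the UNLIKELY finding
strict = filter_by_min_likelihood(sample, "VERY_LIKELY")  # keeps only the last finding
```

Raising the threshold trades recall for precision: fewer findings are returned, but each one is more likely to be a true match.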

Click Save to complete the process.

Example

Imagine you want to ensure that logs sent to a third-party service do not contain sensitive information such as credit card numbers, personal identification numbers, or passwords. To do so:

Add the Google DLP Action to your Pipeline and link it to your required Data sink.

Now, double-click the Google DLP Action to configure it. Set the following parameters:

Parameter

Description

Info Types

Choose the following info types:

Credit Card Number
Email Address
Password

Data to Inspect

Choose the input field that contains the data to be inspected by the DLP API.

JSON credentials

JSON object containing the credentials required to authenticate with the Google DLP API.

Output Field

Name of the new field where the results of the DLP evaluation will be stored.

Minimum Likelihood

We set the likelihood to Possible, as we want the right balance between recall and precision.

Include Quote

We want contextual info of the findings, so we set this to true.

Exclude Info Types

Set this to false, as we want to include type information of the findings.

Click Save to apply the configuration.

Now link the Default output port of the Action to the input port of your Data sink.

Finally, click Publish and choose in which clusters you want to publish the Pipeline.

Click Test pipeline at the top of the area and choose a specific number of events to test if your data is transformed properly. Click Debug to proceed.

This is the input data field we chose for our analysis:

{
  "Info": "My credit card number is 4111-1111-1111-1111"
}
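To make the mapping between the parameters and the DLP API more concrete, here is a rough sketch of a request body for the DLP `content:inspect` REST method built from the settings in this example. How the Action actually calls the API is not documented here, so treat this as an assumption; `build_inspect_request` is a hypothetical helper, and the info type identifiers are assumed to correspond to the labels chosen in the Info Types parameter.

```python
import json


def build_inspect_request(event, data_field, info_types,
                          min_likelihood="POSSIBLE",
                          include_quote=True,
                          exclude_info_types=False):
    """Build a JSON body for the DLP content:inspect REST method."""
    return {
        # Data to Inspect: the value of the chosen input field.
        "item": {"value": str(event[data_field])},
        "inspectConfig": {
            "infoTypes": [{"name": name} for name in info_types],
            "minLikelihood": min_likelihood,
            "includeQuote": include_quote,
            "excludeInfoTypes": exclude_info_types,
        },
    }


event = {"Info": "My credit card number is 4111-1111-1111-1111"}
body = build_inspect_request(
    event,
    data_field="Info",
    info_types=["CREDIT_CARD_NUMBER", "EMAIL_ADDRESS", "PASSWORD"],
)
print(json.dumps(body, indent=2))
```

No request is sent in this sketch; authentication with the JSON credentials and the project-specific endpoint are left out on purpose.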

And this is sample output data with the corresponding results from the DLP API:

{
  "dlpFindings": {
    "findings": [
      {
        "infoType": "CREDIT_CARD_NUMBER",
        "likelihood": "VERY_LIKELY",
        "quote": "4111-1111-1111-1111"
      }
    ]
  }
}
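Because the Action only reports findings, masking the sensitive values before they reach the third-party service would be up to a later step. The following is a minimal sketch of that idea, assuming the Output Field is named `dlpFindings`, that it is stored on the same event as the inspected `Info` field, and that Include Quote is enabled; `redact_quotes` is a hypothetical helper, not part of the Action.

```python
def redact_quotes(text, dlp_result, mask="[REDACTED]"):
    """Replace every quoted finding in text with a mask string."""
    for finding in dlp_result.get("findings", []):
        quote = finding.get("quote")
        if quote:  # skip findings that carry no quote
            text = text.replace(quote, mask)
    return text


# Event shape assumed for illustration: input field plus the DLP output field.
event = {
    "Info": "My credit card number is 4111-1111-1111-1111",
    "dlpFindings": {
        "findings": [
            {
                "infoType": "CREDIT_CARD_NUMBER",
                "likelihood": "VERY_LIKELY",
                "quote": "4111-1111-1111-1111",
            }
        ]
    },
}

event["Info"] = redact_quotes(event["Info"], event["dlpFindings"])
print(event["Info"])  # My credit card number is [REDACTED]
```

Plain string replacement is the simplest option here; if Include Quote were disabled, there would be no quote to match and this approach would leave the text unchanged.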
