Collect data from Google Cloud Storage


See the changelog of the Google Cloud Storage Listener here.


Overview

Onum supports integration with Google Cloud Storage.

Google Cloud Storage (GCS) is an online object storage service for storing and retrieving data. It is a managed service, meaning Google handles the underlying infrastructure, which makes it scalable and reliable. GCS is designed for a variety of use cases, including storing data for web applications, big data analytics, and backups.

Select Google Cloud Storage from the list of Listener types and click Configuration to start.

Prerequisites


Google Cloud Storage Setup

To source data from Google Cloud Storage, you need a GCS bucket that contains data, permissions that allow access to the bucket and its objects (such as the Storage Admin role), and the correct resource path (e.g., gs://bucket-name/object-name).
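A resource path combines a bucket name and an object name. As a small illustration that is not part of Onum, the following Python sketch splits a gs:// path into those two parts, which can help when deciding what to enter in the Bucket and Prefix fields later on. The helper name parse_gs_uri is hypothetical.

```python
from typing import Tuple


def parse_gs_uri(uri: str) -> Tuple[str, str]:
    """Split a gs://bucket-name/object-name path into (bucket, object name)."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"Not a gs:// URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the object name,
    # which may itself contain "/" characters (GCS has no real directories).
    bucket, _, object_name = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"Missing bucket name in {uri!r}")
    return bucket, object_name
```

For example, parse_gs_uri("gs://bucket-name/logs/2024/app.log") returns ("bucket-name", "logs/2024/app.log").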

See the Google Cloud Storage manual for help.

Onum Setup

1

Log in to your Onum tenant and click Listeners > New listener.

2

Double-click the Google Cloud Storage Listener.

3

Enter a Name* for the new Listener. Optionally, add a Description and some Tags to identify the Listener.

4

The Google Cloud connector uses OAuth 2.0 credentials for authentication and authorization. In the Credentials file* field, create a new Secret containing these credentials or select an existing one. To get them:

  1. In the Google Cloud Console, go to Cloud Storage > Settings > Interoperability.

  2. Scroll down to the Service Account area.

  3. Generate and download a service account key. The key cannot be viewed again after it is created, so if you don't already have a saved copy, create a new key here and keep it so you can paste it into the Secret.

  4. To see existing service accounts, open the menu in the top left and select APIs & Services > Credentials.
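Before pasting a downloaded key into a Secret, it can be worth checking that the file really is a service account key. The Python sketch below is a local sanity check under that assumption, not something Onum runs; check_service_account_key is a hypothetical name, and the required fields are those found in standard Google service account JSON keys. If you also want to test the key yourself, the google-auth library can build credentials from the resulting dictionary with google.oauth2.service_account.Credentials.from_service_account_info.

```python
import json
from typing import Any, Dict

# Fields present in a standard Google service account JSON key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def check_service_account_key(raw: str) -> Dict[str, Any]:
    """Parse a downloaded key file and check it looks like a service account key."""
    key = json.loads(raw)
    if not isinstance(key, dict):
        raise ValueError("Key file must contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in key]
    if missing:
        raise ValueError(f"Key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"Unexpected key type: {key['type']!r}")
    return key
```

A typical use is to read the downloaded key file, pass its text to check_service_account_key, and confirm that client_email and project_id match the service account and project you expect.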


Learn more about secrets in Onum in this article.

5

Optionally, enter an Event delimiter to split the content of each file into separate events (examples: -, \n, \r\n, 0x0A...).
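The examples mix literal characters, backslash escapes, and hexadecimal notation. The Python sketch below shows one plausible way such a value could be interpreted and applied; it is an assumption for illustration, not Onum's implementation. In particular, it assumes that values starting with 0x are hex bytes and that empty events (for example after a trailing newline) are dropped. Both function names are hypothetical.

```python
from typing import List


def parse_delimiter(spec: str) -> bytes:
    """Turn a delimiter as typed by a user (e.g. "\\n", "\\r\\n", "0x0A", "-") into bytes."""
    if spec.lower().startswith("0x"):
        # Hexadecimal notation: "0x0A" -> b"\n", "0x0D0A" -> b"\r\n".
        return bytes.fromhex(spec[2:])
    # Interpret backslash escapes such as \n and \t; other characters pass through as UTF-8.
    return spec.encode("utf-8").decode("unicode_escape").encode("latin-1")


def split_events(content: bytes, delimiter: bytes) -> List[bytes]:
    """Split file content into events, dropping empty events (e.g. after a trailing newline)."""
    if not delimiter:
        # No delimiter configured: treat the whole file as one event.
        return [content] if content else []
    return [event for event in content.split(delimiter) if event]
```

With this interpretation, \n and 0x0A produce the same delimiter, so split_events(b"a\nb\n", parse_delimiter("0x0A")) yields two events.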

6

Choose the Compression type* for your files (None, Gzip, Bzip2 or Auto).
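The Auto option suggests that the compression format is detected from the file itself. A common way to do that is to inspect the leading "magic" bytes: gzip data starts with 0x1F 0x8B and bzip2 data starts with BZh. The Python sketch below assumes Auto works this way; it is illustrative only and does not describe Onum's internals. An uncompressed file that happens to begin with those bytes would be misdetected, which is one reason to choose an explicit type when you know it.

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def detect_compression(data: bytes) -> str:
    """Guess the compression type from the first bytes of a file."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    if data.startswith(BZIP2_MAGIC):
        return "bzip2"
    return "none"


def decompress(data: bytes, compression: str = "auto") -> bytes:
    """Decompress file content according to a None/Gzip/Bzip2/Auto setting."""
    mode = compression.lower()
    if mode == "auto":
        mode = detect_compression(data)
    if mode == "gzip":
        return gzip.decompress(data)
    if mode == "bzip2":
        return bz2.decompress(data)
    if mode == "none":
        return data
    raise ValueError(f"Unsupported compression type: {compression!r}")
```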

7

If you set the Read Bucket Once parameter to true, the Listener reads the entire bucket once and then stops. You'll be prompted to enter the following:

  • Prefix - The optional string that acts like a folder path or directory structure when organizing objects within a bucket.

  • Bucket* - Enter the GCP bucket name.

  • Start at* - The Listener will not start until this timestamp. Use the format DD/MM/YYYY HH:mm. The time must be in the future and is interpreted in the time zone where the operation runs.
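The Prefix and Start at fields can be illustrated with a short Python sketch. In GCS, "folders" are only a naming convention, so a prefix is a plain string match against object names. The Start at check parses the DD/MM/YYYY HH:mm format and rejects past timestamps using the local clock. Both functions are hypothetical helpers that mirror the rules stated above, not Onum code.

```python
from datetime import datetime
from typing import Iterable, List, Optional

START_AT_FORMAT = "%d/%m/%Y %H:%M"  # DD/MM/YYYY HH:mm


def parse_start_at(value: str, now: Optional[datetime] = None) -> datetime:
    """Parse a Start at value and reject timestamps that are not in the future.

    Naive datetimes are used, so the value is compared against the local time
    of the machine running this code.
    """
    start = datetime.strptime(value, START_AT_FORMAT)
    current = now if now is not None else datetime.now()
    if start <= current:
        raise ValueError(f"Start at must be in the future, got {value!r}")
    return start


def select_objects(object_names: Iterable[str], prefix: str = "") -> List[str]:
    """Keep only objects whose names start with the prefix (an empty prefix keeps all)."""
    return [name for name in object_names if name.startswith(prefix)]
```

For instance, with a prefix of logs/, the object logs/app.log is selected while archive/app.log is not.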

8

Enter the Project ID*, a unique string such as my-project-123456. To get it:

  1. Go to the Google Cloud Console.

  2. In the top left corner, click the project drop-down next to the Google Cloud logo (where your current project name is shown).

  3. The list shows the Project Name and Project ID of each project.

  4. You can also find it in the Settings tab on the left-hand side.
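Because the project picker shows both a Project Name and a Project ID, it is easy to paste the wrong one. Google Cloud project IDs are 6 to 30 characters long, contain only lowercase letters, digits, and hyphens, start with a letter, and cannot end with a hyphen. The Python sketch below checks a value against those rules; is_valid_project_id is a hypothetical helper, and passing the check only means the string has the right shape, not that the project exists.

```python
import re

# Google Cloud project ID rules: 6 to 30 characters, lowercase letters, digits and
# hyphens only, must start with a letter and must not end with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if the string has the shape of a Google Cloud project ID."""
    return PROJECT_ID_PATTERN.fullmatch(project_id) is not None
```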

9

Enter your Subscription (called Subscription ID in the Cloud Console). Follow these steps to get it:

  1. In the Google Cloud Console, click the menu in the top left corner and select View all Products.

  2. Go to Analytics and click Pub/Sub. You can also type Pub/Sub in the search bar.

  3. In the Pub/Sub dashboard, select the Subscriptions tab on the left.

  4. The Subscription ID is displayed in this list.
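Pub/Sub identifies a subscription both by its short Subscription ID and by a full resource name of the form projects/PROJECT_ID/subscriptions/SUBSCRIPTION_ID. The official google-cloud-pubsub client builds that name with SubscriberClient.subscription_path. The dependency-free Python sketch below shows the same format and extracts the short ID if a full name was copied instead; the helper names are hypothetical, and it is an assumption that this field expects the short ID, as the step above suggests.

```python
def subscription_path(project_id: str, subscription_id: str) -> str:
    """Build the full Pub/Sub resource name from a project ID and subscription ID."""
    return f"projects/{project_id}/subscriptions/{subscription_id}"


def subscription_id_from(value: str) -> str:
    """Accept either a bare Subscription ID or a full resource name and return the ID."""
    parts = value.split("/")
    if len(parts) == 4 and parts[0] == "projects" and parts[2] == "subscriptions":
        return parts[3]
    if "/" in value:
        raise ValueError(f"Unrecognized subscription name: {value!r}")
    return value
```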

10

In case of a failure to connect, enter the following parameters:

  • Number of retries* - Enter the maximum number of retries to perform in case of a failure. The minimum value is 1, and the maximum value is 5. The default value is 3.

  • Retry delay* - Enter the number of milliseconds to wait between retries. The minimum and default value is 100, and the maximum value is 1000.
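The two retry parameters describe a fixed-delay retry policy. The Python sketch below implements that policy with the documented bounds (1 to 5 retries, 100 to 1000 ms delay) and defaults. It assumes that "number of retries" counts attempts after the first one, so 3 retries means up to 4 attempts in total; call_with_retries is a hypothetical name, and the sleep parameter exists only so the behavior can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    retries: int = 3,
    delay_ms: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying up to `retries` times with a fixed delay between attempts."""
    if not 1 <= retries <= 5:
        raise ValueError("Number of retries must be between 1 and 5")
    if not 100 <= delay_ms <= 1000:
        raise ValueError("Retry delay must be between 100 and 1000 ms")
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == retries:
                raise  # Out of retries: surface the last error.
            sleep(delay_ms / 1000)  # Fixed delay, converted to seconds.
    raise RuntimeError("unreachable")
```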

11

Click Create labels. Optionally, you can set labels to be used for internal Onum routing of data. By default, data is set as Unlabeled.


Learn more about labels in this article.

12

Click Create listener when you're done.

Output Ports

The Google Cloud Storage Listener has two output ports:

  • Default port - Events are sent through this port if no error occurs while processing them.

  • Error port - Events are sent through this port if an error occurs while processing them.
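Conceptually, the two ports split the event stream by processing outcome. The Python sketch below models that split; it is a simplified illustration, not Onum's implementation, and route_events is a hypothetical name. It assumes a failed event is forwarded unchanged to the error port so it can be inspected downstream.

```python
from typing import Callable, Dict, List


def route_events(events: List[bytes], process: Callable[[bytes], bytes]) -> Dict[str, List[bytes]]:
    """Send successfully processed events to "default" and failed ones to "error"."""
    ports: Dict[str, List[bytes]] = {"default": [], "error": []}
    for event in events:
        try:
            ports["default"].append(process(event))
        except Exception:
            # Keep the original event so it can be inspected downstream.
            ports["error"].append(event)
    return ports
```

For example, if process decodes each event as UTF-8, an event containing invalid bytes lands on the error port while valid events continue on the default port.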
