Collect data from Google Cloud Storage

Most recent version: v1.0.1

See the changelog of this Listener type here.

Overview

Onum supports integration with Google Cloud Storage.

Google Cloud Storage (GCS) is an online object storage service that allows users to store and retrieve data. Because it is a managed service, Google handles the underlying infrastructure, which makes it scalable and reliable. GCS is designed for a variety of use cases, including storing data for web applications, big data analytics, and backups.

Google Cloud Storage Setup

To source data from Google Cloud Storage, you need a GCS bucket with data, appropriate permissions to access the bucket and its objects (for example, the Storage Admin role), and the correct resource path (for example, gs://bucket-name/object-name).
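A resource path combines a bucket name and an object name. The minimal Python sketch below splits a gs:// path into those two parts, which can help when copying values into the Listener's Bucket and Prefix fields. The helper name parse_gs_uri is hypothetical and is not part of Onum or any Google library.

```python
def parse_gs_uri(uri: str) -> tuple[str, str]:
    """Split a gs://bucket/object path into (bucket, object_name).

    Hypothetical helper for illustration; not part of Onum or any Google library.
    """
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a gs:// path: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the object name,
    # which may itself contain "/" characters (GCS object names are flat strings).
    bucket, _, object_name = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, object_name


print(parse_gs_uri("gs://bucket-name/object-name"))  # ('bucket-name', 'object-name')
```

Partitioning on the first "/" rather than using a URL parser keeps object names containing "?" or "#" intact.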

See the Google Cloud Storage manual for help.

Onum Setup

1

Log in to your Onum tenant and click Listeners > New listener.

2

Double-click the Google Cloud Storage Listener.

3

Enter a Name for the new Listener. Optionally, add a Description and some Tags to identify the Listener.

4

The Google Cloud connector uses OAuth 2.0 credentials for authentication and authorization. Create a new Secret containing these credentials, or select an existing one. To get the credentials:

  1. To find the Google Cloud credentials file, go to Settings > Interoperability.

  2. Scroll down to the Service Account area.

  3. Generate and download a service account key from the Google Cloud Console. You cannot view an existing key again after it is created, so you must already have saved a copy. Otherwise, create a new key here and save it so you can paste it into the Secret.

  4. To see existing service accounts, open the menu in the top left and select APIs & Services > Credentials.
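The downloaded service account key is a JSON file. As a hedged sketch, the Python snippet below checks that the file contains a few fields found in standard Google Cloud service account keys before you paste it into a Secret. The helper name and the choice of required fields are assumptions for illustration; this is not Onum's own validation logic.

```python
import json

# Fields present in standard Google Cloud service account key files.
# Checking only these four is an illustrative assumption, not Onum's rule.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> dict:
    """Parse a downloaded key file and fail early if it looks incomplete."""
    key = json.loads(raw_json)
    if not isinstance(key, dict):
        raise ValueError("key file must contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if not key.get(field)]
    if missing:
        raise ValueError(f"key file is missing: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {key['type']!r}")
    return key
```

For example, read the file with open("key.json", encoding="utf-8").read() and pass the text to check_service_account_key; the returned dictionary's client_email tells you which service account the key belongs to.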

5

Assign an optional Event Delimiter to simulate a hierarchical directory structure within a flat namespace.
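Google Cloud Storage has no real folders: a name such as logs/2024/a.json is one flat object name. The minimal sketch below models the general GCS listing behavior that the delimiter description refers to, where a prefix and a delimiter group flat names into folder-like prefixes. It reproduces Google's listing semantics in plain Python; whether Onum's Event Delimiter behaves exactly this way is an assumption, and the function name is hypothetical.

```python
def list_with_delimiter(names: list[str], prefix: str = "", delimiter: str = "/"):
    """Model GCS list semantics: return (objects, folder_like_prefixes).

    Names whose remainder (after `prefix`) contains no delimiter are returned
    as objects; the others are collapsed into the prefix that ends at the
    first delimiter, which is how GCS reports "subdirectories".
    """
    objects, prefixes = [], set()
    for name in names:
        if not name.startswith(prefix):
            continue  # outside the requested "folder"
        rest = name[len(prefix):]
        cut = rest.find(delimiter) if delimiter else -1
        if cut == -1:
            objects.append(name)
        else:
            prefixes.add(prefix + rest[: cut + len(delimiter)])
    return objects, sorted(prefixes)


names = [
    "logs/2024/a.json",
    "logs/2024/b.json",
    "logs/2025/c.json",
    "logs/readme.txt",
    "other.txt",
]
print(list_with_delimiter(names, prefix="logs/"))
# (['logs/readme.txt'], ['logs/2024/', 'logs/2025/'])
```

With an empty delimiter, nothing is grouped and every matching name is returned as an object.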

6

Choose the compression type for your files (None, Gzip, Bzip2 or Auto).
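The sketch below shows one way an Auto setting could work: detect the format from the file's leading magic bytes (1f 8b for gzip, BZh for bzip2) and otherwise treat the data as uncompressed. How Onum actually implements Auto is not documented here, so treat this as an assumption; the decompress helper is hypothetical.

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member
BZIP2_MAGIC = b"BZh"      # bzip2 stream header


def decompress(data: bytes, mode: str = "auto") -> bytes:
    """Decompress according to mode: none, gzip, bzip2, or auto (magic-byte sniffing)."""
    mode = mode.lower()
    if mode == "auto":
        if data.startswith(GZIP_MAGIC):
            mode = "gzip"
        elif data.startswith(BZIP2_MAGIC):
            mode = "bzip2"
        else:
            mode = "none"
    if mode == "gzip":
        return gzip.decompress(data)
    if mode == "bzip2":
        return bz2.decompress(data)
    if mode == "none":
        return data
    raise ValueError(f"unknown compression mode: {mode!r}")
```

Sniffing bytes rather than file extensions handles objects uploaded without a .gz or .bz2 suffix, but an uncompressed file that happens to begin with "BZh" would be misdetected, which is one reason to set the type explicitly when you know it.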

7

If you set the Read Bucket Once parameter to true, the Listener reads the entire bucket once and then stops. You'll be prompted to enter the following:

  • Prefix - An optional string that acts like a folder path or directory structure when organizing objects within a bucket.

  • Bucket* - Enter the GCP bucket name.

  • Start at* - The Listener will not start until this timestamp. The required date format is DD/MM/YYYY HH:mm.
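The DD/MM/YYYY HH:mm format corresponds to the Python strptime pattern %d/%m/%Y %H:%M, with the day before the month. The sketch below parses a Start at value and checks whether the Listener may start at a given time. It assumes naive timestamps with no time zone, which the step above does not specify, and the helper names are hypothetical.

```python
from datetime import datetime

# The documented Start at format DD/MM/YYYY HH:mm as a strptime pattern.
START_AT_FORMAT = "%d/%m/%Y %H:%M"


def parse_start_at(value: str) -> datetime:
    """Parse a Start at value; raises ValueError if the format is wrong."""
    return datetime.strptime(value, START_AT_FORMAT)


def may_start(start_at: str, now: datetime) -> bool:
    """True once `now` has reached the Start at timestamp (naive times assumed)."""
    return now >= parse_start_at(start_at)


print(parse_start_at("05/03/2024 14:30"))  # 2024-03-05 14:30:00
```

Note that 05/03/2024 is 5 March, not 3 May.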

8

Enter the Project ID*, a unique string such as my-project-123456. To get it:

  1. Go to the Google Cloud Console.

  2. In the top left corner, click the project drop-down next to the Google Cloud logo (where your current project name is shown).

  3. The list shows each project's Project Name and Project ID.

  4. You can also find it in the Settings tab on the left-hand side.
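Google Cloud project IDs must be 6 to 30 characters long, use only lowercase letters, digits, and hyphens, start with a letter, and not end with a hyphen. The minimal sketch below encodes those rules as a regular expression so you can sanity-check a value before entering it. The helper name is hypothetical.

```python
import re

# 6-30 chars: a lowercase letter, then lowercase letters, digits, or hyphens,
# ending in a letter or digit (no trailing hyphen).
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    return PROJECT_ID_RE.fullmatch(project_id) is not None


for candidate in ("my-project-123456", "My-Project", "proj-", "1project"):
    print(candidate, is_valid_project_id(candidate))
```

A common mistake is pasting the Project Name, which may contain uppercase letters and spaces, instead of the Project ID.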

9

Enter your subscription name. Follow these steps to get it:

  1. Go to the Google Cloud Console.

  2. In the top left corner, click the menu and select View all products.

  3. Under Analytics, find Pub/Sub and click it. You can also type "Pub/Sub" in the search bar.

  4. In the Pub/Sub dashboard, select the Subscriptions tab on the left.

  5. The subscription name is displayed in this list.
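In Pub/Sub, a subscription's full resource name has the form projects/PROJECT_ID/subscriptions/SUBSCRIPTION_ID, and the console can show either the short ID or the full name. The hypothetical helpers below convert between the two, assuming the subscription belongs to the Project ID entered in the previous step. This page does not state which form the Listener expects; for reference, the google-cloud-pubsub client builds the same full path with SubscriberClient.subscription_path().

```python
def subscription_path(project_id: str, subscription: str) -> str:
    """Return the full Pub/Sub resource name for a subscription.

    Accepts a short subscription ID or an already-full
    projects/<project>/subscriptions/<id> name (hypothetical helper).
    """
    prefix = f"projects/{project_id}/subscriptions/"
    if subscription.startswith("projects/"):
        if not subscription.startswith(prefix):
            raise ValueError(
                f"subscription {subscription!r} does not belong to project {project_id!r}"
            )
        return subscription
    if not subscription or "/" in subscription:
        raise ValueError(f"invalid subscription ID: {subscription!r}")
    return prefix + subscription


def subscription_id(full_name: str) -> str:
    """Return the short ID, i.e. the last path segment of a full name."""
    return full_name.rsplit("/", 1)[-1]


full = subscription_path("my-project-123456", "gcs-events")
print(full)                   # projects/my-project-123456/subscriptions/gcs-events
print(subscription_id(full))  # gcs-events
```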

10

To control how the Listener handles connection failures, enter the following parameters:

  • Number of retries* - Enter the maximum number of retries to perform in case of a failure. The minimum value is 1, and the maximum value is 5. The default value is 3.

  • Retry delay* - Enter the number of milliseconds to wait between retries. The minimum and default value is 100, and the maximum value is 1000.
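The sketch below validates both parameters against the documented ranges and retries a connection attempt with a fixed delay between attempts. It assumes that Number of retries counts attempts made after the first one and that failures surface as Python's ConnectionError; neither is confirmed on this page, and with_retries is a hypothetical helper, not Onum's implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def validate_retry_settings(retries: int, delay_ms: int) -> None:
    """Enforce the documented ranges: retries 1-5, delay 100-1000 ms."""
    if not 1 <= retries <= 5:
        raise ValueError("Number of retries must be between 1 and 5")
    if not 100 <= delay_ms <= 1000:
        raise ValueError("Retry delay must be between 100 and 1000 ms")


def with_retries(
    connect: Callable[[], T],
    retries: int = 3,
    delay_ms: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(); on ConnectionError wait delay_ms and retry up to `retries` times."""
    validate_retry_settings(retries, delay_ms)
    # Assumption: one initial attempt plus `retries` further attempts.
    for attempt in range(1, retries + 2):
        try:
            return connect()
        except ConnectionError:
            if attempt > retries:
                raise  # retries exhausted: surface the last error
            sleep(delay_ms / 1000)  # fixed delay, converted to seconds
    raise AssertionError("unreachable")
```

With the defaults (3 retries, 100 ms), a connection that never succeeds is attempted four times and waits about 300 ms in total before the error is raised. Passing sleep as a parameter makes the delay easy to replace in tests.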

11

Finally, click Create labels. Optionally, set labels to be used for internal Onum routing of data. By default, data is set as Unlabeled. Click Create listener when you're done.

