Collect data from Azure Blob Storage

Most recent version: v0.0.1

See the changelog of the Azure Blob Storage Listener here.

Overview

Onum supports integration with Azure Blob Storage.

The Azure Blob Storage Listener connects to your Azure Storage account and detects when new files are uploaded. It works by monitoring an Azure Storage Queue that receives notifications from Azure Event Grid whenever a blob is created. The Listener then retrieves the file content and makes it available for processing in your workflows.
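To illustrate the flow described above, here is a minimal sketch of how a queue message could be turned into a blob location. It assumes the message body is a Base64-encoded Event Grid event in JSON (with a fallback to plain JSON) and that the blob address is carried in `data.url`. The function name `parse_blob_created` is hypothetical and this is not Onum's actual implementation.

```python
import base64
import json
from urllib.parse import unquote, urlparse


def parse_blob_created(message_text: str) -> dict:
    """Decode one queue message and return where the new blob lives."""
    try:
        # Assumption: the queue message body is Base64-encoded JSON.
        event = json.loads(base64.b64decode(message_text, validate=True))
    except ValueError:
        # Fall back to plain JSON when the body was not Base64.
        event = json.loads(message_text)

    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        raise ValueError("not a BlobCreated event")

    # The blob URL has the form https://<account>.blob.core.windows.net/<container>/<blob>.
    path = unquote(urlparse(event["data"]["url"]).path).lstrip("/")
    container, _, blob = path.partition("/")
    return {
        "container": container,
        "blob": blob,
        "blob_type": event["data"].get("blobType"),
    }
```

With the container and blob name in hand, a client can then download the file content for processing.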

Prerequisites

Depending on your authentication method, you'll need the following permissions:

  • Connection String: Storage account access key

  • Service Principal: Azure AD application with these assigned roles:

    • Storage Blob Data Reader (minimum)

    • Storage Queue Data Contributor (minimum)

Azure Blob Storage Setup

You'll need to set up the following resources:

  • An Azure Storage Account with:

    • A Blob Storage container (where files will be uploaded)

    • A Storage Queue (to receive notifications)

  • An Azure Event Grid Subscription configured to:

    • Monitor your Blob Storage container

    • Send BlobCreated events to your Storage Queue

    • Filter for BlockBlob creation events only
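The Event Grid requirements above can be sketched as a filter object. The field names below (`includedEventTypes`, `subjectBeginsWith`, `advancedFilters`) follow the Event Grid subscription filter schema as commonly documented; check them against the current Azure reference before use. The helper names and the `uploads` container are hypothetical, and `passes_filter` is only a simplified local check, not Event Grid's own matching logic.

```python
def build_subscription_filter(container: str) -> dict:
    """Describe a filter for BlockBlob creation events in one container."""
    return {
        "includedEventTypes": ["Microsoft.Storage.BlobCreated"],
        "subjectBeginsWith": f"/blobServices/default/containers/{container}/",
        "advancedFilters": [
            {"operatorType": "StringIn", "key": "data.blobType", "values": ["BlockBlob"]}
        ],
    }


def passes_filter(event: dict, flt: dict) -> bool:
    """Simplified local emulation of the filter, for testing sample events."""
    if event.get("eventType") not in flt["includedEventTypes"]:
        return False
    if not event.get("subject", "").startswith(flt["subjectBeginsWith"]):
        return False
    for adv in flt["advancedFilters"]:
        # Only StringIn on "data.<field>" keys is handled in this sketch.
        field = adv["key"].split(".", 1)[1]
        if event.get("data", {}).get(field) not in adv["values"]:
            return False
    return True
```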

Onum Setup

1

Log in to your Onum tenant and click Listeners > New listener.

2

Double-click the Azure Blob Storage Listener.

3

Enter a Name* for the new Listener. Optionally, add a Description and some Tags to identify the Listener.

4

In the Authentication section, choose between:

Connection String

Use your storage account's connection string as your authentication method. This method is straightforward but requires managing the connection string securely.

Follow these steps to get your connection string:

  1. In the Azure portal, open your storage account.

  2. In the left menu, under Security + networking, go to Access keys.

  3. Click Show next to the Connection string field of key1 or key2.

  4. Select the copy button next to the Connection string field. Depending on the version of Azure you are using, the corresponding field may have a different name, so to help you find it, look for a string with the same format:

DefaultEndpointsProtocol=https;AccountName=<account-name>;AccountKey=<account-key>;EndpointSuffix=core.windows.net

Once you have it, open the Connection String* field and click New secret. In the window that appears, give your secret a Name* and turn off the Expiration date toggle if not needed. Then, click Add new value and paste the connection string. Click Save when you're done.

Now, select the secret you have just created in the Connection String* field.

Learn more about secrets in this article.
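Azure connection strings are semicolon-separated key=value pairs, so a quick sanity check before pasting one into a secret can be sketched as follows. The function name `parse_connection_string` is hypothetical, and the example value is a made-up storage account string, not a real credential.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split a connection string into its key/value pairs."""
    pairs = {}
    for part in conn_str.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: Base64 keys often end in '='.
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"segment without '=': {part!r}")
        pairs[key.strip()] = value
    return pairs


# Hypothetical example value, for illustration only.
example = (
    "DefaultEndpointsProtocol=https;AccountName=examplestore;"
    "AccountKey=abc123==;EndpointSuffix=core.windows.net"
)
parsed = parse_connection_string(example)
```

Splitting on the first `=` only matters because account keys are Base64 text and may themselves end with `=` padding.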

Client Secret

Use Azure Active Directory authentication with a registered application and client secret. This provides better security and access control. We recommend this method for production environments and multi-tenant applications.

Enter your Storage Account Name* and get the following credentials from the Certificates & Secrets area:

  • Tenant ID* - Azure AD tenant identifier.

  • Client ID* - Azure AD application (service principal) identifier.

  • Client Secret* - Secret key for your service principal. To add it, open the field and click New secret. In the window that appears, give your secret a Name* and turn off the Expiration date toggle if not needed. Then, click Add new value and paste your client secret. Click Save when you're done. Now, select the secret you have just created in the Client Secret* field.

Learn more about secrets in this article.
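To show how the three values fit together, here is a sketch of the standard OAuth 2.0 client credentials request against the Microsoft identity platform. It only builds the request and does not send it. How the Listener authenticates internally is not documented here, so treat this as an illustration under that assumption. The function name and the placeholder IDs are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # Scope for Azure Storage data-plane access.
            "scope": "https://storage.azure.com/.default",
        }
    ).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values, for illustration only.
req = build_token_request("example-tenant", "example-client", "example-secret")
```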

Certificate

Use Azure Active Directory authentication with a certificate instead of a secret. This is the most secure option. We recommend this method for high-security production environments and compliance requirements.

Enter your Storage Account Name* and get the following credentials from the Certificates & Secrets area:

  • Tenant ID* - Azure AD tenant identifier.

  • Client ID* - Azure AD application (service principal) identifier.

  • Certificate* - PEM-encoded certificate with private key. To add it, open the field and click New secret. In the window that appears, give your secret a Name* and turn off the Expiration date toggle if not needed. Then, click Add new value and paste your certificate. Click Save when you're done. Now, select the secret you have just created in the Certificate* field.

Learn more about secrets in this article.
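Because the Certificate* field expects a PEM file that contains both the certificate and its private key, a quick structural check before pasting can help. The sketch below only looks for the PEM block labels and does not validate the cryptographic content. The function names are hypothetical.

```python
import re

# Matches one PEM block and captures its label, e.g. "CERTIFICATE".
_PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----\s.*?-----END \1-----", re.DOTALL
)


def pem_block_labels(pem_text: str) -> list:
    """Return the labels of all PEM blocks found, in order."""
    return [match.group(1) for match in _PEM_BLOCK.finditer(pem_text)]


def has_certificate_and_key(pem_text: str) -> bool:
    """True if the text holds a CERTIFICATE block and a private key block."""
    labels = pem_block_labels(pem_text)
    has_cert = "CERTIFICATE" in labels
    # Covers "PRIVATE KEY", "RSA PRIVATE KEY", "EC PRIVATE KEY", and similar.
    has_key = any(label.endswith("PRIVATE KEY") for label in labels)
    return has_cert and has_key
```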

5

In the Retry Configuration section, set the maximum number of times a failed Azure read is retried (Max Retries*) and how long to wait before sending the next request after an empty response (Idle Backoff Time*).
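The interplay of the two settings can be sketched as a single polling cycle. This is an interpretation of the description above, not Onum's actual implementation: a failed read is retried up to `max_retries` times, and an empty result triggers one idle wait before the caller polls again. The names `poll_once`, `fetch`, and `sleep` are hypothetical, and failures are modeled as `OSError` for simplicity.

```python
import time
from typing import Callable, List


def poll_once(
    fetch: Callable[[], List[str]],
    max_retries: int,
    idle_backoff_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run one polling cycle with retries on failure and backoff when idle."""
    failures = 0
    while True:
        try:
            messages = fetch()
        except OSError:
            failures += 1
            if failures > max_retries:
                raise  # retries exhausted, surface the error
            continue
        if not messages:
            # Empty response: wait before the next request is sent.
            sleep(idle_backoff_seconds)
        return messages
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting.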

6

In the Queue Configuration section, enter the Queue Name* of the queue that is receiving blob events.
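Azure Storage queue names follow strict naming rules (3 to 63 characters, lowercase letters, digits, and single hyphens, starting and ending with a letter or digit), so a typo in Queue Name* can be caught early. The sketch below encodes those rules as commonly documented by Azure. The function name is hypothetical, and whether Onum performs the same check is an assumption.

```python
import re

# 3-63 characters; lowercase alphanumerics separated by single hyphens.
_QUEUE_NAME = re.compile(r"^(?=.{3,63}$)[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_valid_queue_name(name: str) -> bool:
    """Check a name against Azure Storage queue naming rules."""
    return bool(_QUEUE_NAME.match(name))
```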

7

In the Limit & Timeout* section, enter the following:

  • Message Limit* - Number of messages to retrieve per polling cycle. The minimum value is 1, and the maximum value is 32.

  • Visibility Timeout* - Number of seconds messages should stay hidden from other consumers while processing. The minimum value is 1, and the maximum value is 604,800 (7 days).
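The ranges above can be expressed as a small validation helper. The bounds come from this page (1 to 32 messages, 1 to 604,800 seconds). The function name and error messages are hypothetical.

```python
MESSAGE_LIMIT_RANGE = (1, 32)
VISIBILITY_TIMEOUT_RANGE = (1, 7 * 24 * 60 * 60)  # 7 days = 604,800 seconds


def validate_limits(message_limit: int, visibility_timeout: int) -> list:
    """Return a list of problems; an empty list means both values are valid."""
    errors = []
    low, high = MESSAGE_LIMIT_RANGE
    if not low <= message_limit <= high:
        errors.append(f"Message Limit must be between {low} and {high}")
    low, high = VISIBILITY_TIMEOUT_RANGE
    if not low <= visibility_timeout <= high:
        errors.append(f"Visibility Timeout must be between {low} and {high} seconds")
    return errors
```

As a rule of thumb, the visibility timeout should exceed the time needed to download and process the largest expected file, otherwise another consumer may pick up the same message.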

8

In the Advanced configuration section, you can optionally configure the following:

  • Event delimiter - Split file content into multiple messages using a delimiter. The default value is \n for line-by-line processing.

  • Use compression - Activate this toggle if you want to listen for compressed files. Choose between Auto, Gzip or Bzip2.
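The two advanced options can be sketched together: decompress the file content, then split it on the delimiter. Detecting the format from magic bytes in the `auto` mode and dropping empty chunks are assumptions made for this illustration. The page does not describe how Onum implements either. Function names are hypothetical.

```python
import bz2
import gzip


def decompress(data: bytes, mode: str = "auto") -> bytes:
    """Decompress data as gzip or bzip2; 'auto' guesses from magic bytes."""
    if mode == "auto":
        if data[:2] == b"\x1f\x8b":  # gzip magic number
            mode = "gzip"
        elif data[:3] == b"BZh":  # bzip2 magic number
            mode = "bzip2"
        else:
            return data  # assume the content is not compressed
    if mode == "gzip":
        return gzip.decompress(data)
    if mode == "bzip2":
        return bz2.decompress(data)
    raise ValueError(f"unsupported compression mode: {mode!r}")


def split_events(data: bytes, delimiter: bytes = b"\n", mode: str = "auto") -> list:
    """Split file content into events, skipping empty chunks (an assumption)."""
    content = decompress(data, mode)
    return [chunk for chunk in content.split(delimiter) if chunk]
```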

9

Finally, click Create labels. Optionally, you can set labels to be used for internal Onum routing of data. By default, data will be set as Unlabeled.

Learn more about labels in this article.

10

Click Create listener when you're done.
