Collect data from Amazon S3
Most recent version: v1.0.0
Overview
This article outlines a basic data flow from Amazon Simple Storage Service (S3) to the Onum Amazon S3 Listener.
This is a Pull Listener and therefore should not be used in environments with more than one cluster.
Prerequisites
Before you configure the Amazon S3 Listener and start sending data, consider the following requirements:
Your Amazon user needs at least permission to use the GetObject operation (S3) and the ReceiveMessage and DeleteMessageBatch operations (SQS queue) for this Listener to work.
Cross-Region Configurations: Ensure that your S3 bucket and SQS queue are in the same AWS Region, as S3 event notifications do not support cross-region targets.
Permissions: Confirm that the AWS Identity and Access Management (IAM) roles associated with your S3 bucket and SQS queue have the necessary permissions.
Object Key Name Filtering: If you use special characters in your prefix or suffix filters for event notifications, ensure they are URL-encoded.
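To illustrate why these three permissions are needed, the following is a minimal sketch of one polling cycle of a generic pull consumer. It is not Onum's implementation. The function name `poll_once` and its parameters are hypothetical, and `sqs` and `s3` are assumed to be boto3-style clients (for example, `boto3.client("sqs")` and `boto3.client("s3")`).

```python
import json
from urllib.parse import unquote_plus


def poll_once(sqs, s3, queue_url, expected_bucket=None):
    """Run one receive / fetch / delete cycle and return the fetched objects.

    `sqs` and `s3` are assumed to be boto3-style clients (hypothetical wiring).
    """
    # ReceiveMessage (SQS): pull up to 10 pending notifications.
    response = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    fetched, to_delete = [], []
    for message in response.get("Messages", []):
        body = json.loads(message["Body"])
        # Test events sent by S3 carry no "Records" key, so default to empty.
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            if expected_bucket and bucket != expected_bucket:
                continue  # Ignore notifications from other buckets.
            # GetObject (S3): download the newly created file.
            obj = s3.get_object(Bucket=bucket, Key=key)
            fetched.append((bucket, key, obj["Body"].read()))
        to_delete.append(
            {"Id": message["MessageId"], "ReceiptHandle": message["ReceiptHandle"]}
        )
    if to_delete:
        # DeleteMessageBatch (SQS): acknowledge the processed notifications.
        sqs.delete_message_batch(QueueUrl=queue_url, Entries=to_delete)
    return fetched
```

Each of the three client calls maps to one of the operations listed above, so a missing permission would surface as an access error at the corresponding step.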
Amazon S3 Setup
You need to configure your Amazon S3 bucket to send notifications to an Amazon Simple Queue Service (SQS) queue when new files are added.
Modify the SQS Queue Policy to Allow S3 to Send Messages
In the Amazon SQS console, select your queue.
Navigate to the Access Policy tab and choose Edit.
Replace the existing policy with the following, ensuring you update the placeholders with your specific details:
{
  "Version": "2012-10-17",
  "Id": "S3ToSQSPolicy",
  "Statement": [
    {
      "Sid": "AllowS3Bucket",
      "Effect": "Allow",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Action": "SQS:SendMessage",
      "Resource": "arn:aws:sqs:<region>:<account-id>:<queue-name>",
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": "arn:aws:s3:::<bucket-name>"
        },
        "StringEquals": {
          "aws:SourceAccount": "<account-id>"
        }
      }
    }
  ]
}
Save the changes. This policy grants your S3 bucket permission to send messages to your SQS queue.
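If you manage several queues, filling in the placeholders by hand can be error-prone. The following sketch generates the same policy from four values. The function name `build_queue_policy` and the sample values are hypothetical; the policy structure is the one shown above.

```python
import json


def build_queue_policy(region, account_id, queue_name, bucket_name):
    """Return the S3-to-SQS access policy with the placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Id": "S3ToSQSPolicy",
        "Statement": [
            {
                "Sid": "AllowS3Bucket",
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "SQS:SendMessage",
                "Resource": f"arn:aws:sqs:{region}:{account_id}:{queue_name}",
                "Condition": {
                    # Only this bucket, owned by this account, may send messages.
                    "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{bucket_name}"},
                    "StringEquals": {"aws:SourceAccount": account_id},
                },
            }
        ],
    }


if __name__ == "__main__":
    # Sample values only; replace them with your own details.
    policy = build_queue_policy(
        "eu-west-1", "123456789012", "onum-s3-events", "my-log-bucket"
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the Access Policy editor described in the steps above.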
Configure S3 Event Notifications
Open the Amazon S3 console and select the bucket you want to configure.
Go to the Properties tab and find the "Event notifications" section.
Click on Create event notification.
Provide a descriptive name for the event notification.
In the Event types section, select All object create events or specify particular events that should trigger notifications.
In the Destination section, choose SQS Queue and select the queue you configured earlier.
Save the configuration.
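The same notification can be expressed as a configuration document instead of console clicks. The sketch below builds that document; the function name `build_notification_config` and its parameters are hypothetical. The resulting dictionary follows the shape accepted by the boto3 S3 client method `put_bucket_notification_configuration`, which is mentioned only in a comment so that the example runs without AWS access.

```python
def build_notification_config(
    queue_arn, name, events=("s3:ObjectCreated:*",), prefix=None, suffix=None
):
    """Build an S3 notification configuration that targets one SQS queue."""
    queue_config = {"Id": name, "QueueArn": queue_arn, "Events": list(events)}
    # Optional object key name filters (prefix and/or suffix).
    rules = [
        {"Name": rule_name, "Value": value}
        for rule_name, value in (("prefix", prefix), ("suffix", suffix))
        if value
    ]
    if rules:
        queue_config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"QueueConfigurations": [queue_config]}


if __name__ == "__main__":
    config = build_notification_config(
        "arn:aws:sqs:eu-west-1:123456789012:onum-s3-events",  # sample ARN
        "onum-new-objects",
        suffix=".jsonl",
    )
    print(config)
    # With boto3 installed and credentials configured, this could be applied as:
    #   boto3.client("s3").put_bucket_notification_configuration(
    #       Bucket="my-log-bucket", NotificationConfiguration=config)
    # Note that this call replaces the bucket's existing notification settings.
```

The default event, `s3:ObjectCreated:*`, corresponds to the All object create events option in the console.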
Onum Setup
Log in to your Onum tenant and click Listeners > New listener.


Double-click the AWS S3 Listener.


Enter a Name for the new Listener. Optionally, add a Description and some Tags to identify the Listener.
In the Objects section, enter the Compression method and the Format of the ingested S3 files. Both are required.
Compression - This accepts the standard compression codecs (gzip, zlib, bzip2), none for no compression, and auto to autodetect the compression type from the file extension.
Format - This currently accepts JSON array (a big JSON array containing a JSON object for each event), JSON lines (a JSON object representing an event on each line), and auto to autodetect the format from the file extension (.json or .jsonl, respectively).
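To make the auto options more concrete, here is a minimal sketch of extension-based detection followed by decoding. It is not Onum's implementation. The format mapping (.json and .jsonl) comes from the description above, while the compression extension mapping (.gz, .bz2, .zz) and all function names are assumptions made for illustration.

```python
import bz2
import gzip
import json
import zlib

# Assumed mapping; the extensions Onum actually recognizes may differ.
COMPRESSION_BY_EXTENSION = {".gz": "gzip", ".bz2": "bzip2", ".zz": "zlib"}
FORMAT_BY_EXTENSION = {".json": "json_array", ".jsonl": "json_lines"}
DECOMPRESSORS = {
    "gzip": gzip.decompress,
    "bzip2": bz2.decompress,
    "zlib": zlib.decompress,
    "none": lambda data: data,
}


def detect(key):
    """Return (compression, format) inferred from an object key."""
    name = key.lower()
    compression = "none"
    for extension, codec in COMPRESSION_BY_EXTENSION.items():
        if name.endswith(extension):
            compression = codec
            name = name[: -len(extension)]  # Strip it to expose the format.
            break
    for extension, file_format in FORMAT_BY_EXTENSION.items():
        if name.endswith(extension):
            return compression, file_format
    raise ValueError(f"cannot detect the format of {key!r}")


def decode_events(key, data):
    """Decompress and parse an object's bytes into a list of events."""
    compression, file_format = detect(key)
    text = DECOMPRESSORS[compression](data).decode("utf-8")
    if file_format == "json_array":
        return json.loads(text)  # One big array, one object per event.
    # JSON lines: one object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

For example, an object named `events.jsonl.gz` would be treated as gzip-compressed JSON lines, and `events.json` as an uncompressed JSON array.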


Define the Bucket to listen from.
Region* - Find this in your Buckets area, next to the name.

Name - The AWS bucket your data is stored in. This is the bucket name found in your Buckets area. Fill this in to check that notifications come from that bucket, or leave it empty to skip the check.
Authentication Type* - Choose manual to enter your access key ID and secret access key manually in the parameters below, or auto to authenticate automatically. The default value is manual.
Access key ID* - Select the access key ID from your Secrets or click New secret to generate a new one.
The Access Key ID is found in the IAM Dashboard of the AWS Management Console.
In the left panel, click on Users.
Select your IAM user.
Under the Security Credentials tab, scroll to Access Keys, and you will find existing Access Key IDs (but not the secret access key).
Secret access key* - Select the secret access key from your Secrets or click New secret to generate a new one. Under Access keys, you can see your Access Key IDs, but AWS does not show the Secret Access Key after it has been created, so you must have saved it at creation time. If you don't have it saved, you need to create a new access key.


Proceed with caution when modifying the Bucket advanced options. Default values should be enough in most cases.
Amazon S3 provides different types of service endpoints based on the region and access type. To find the endpoint of your bucket:
Select your bucket.
Go to the Properties tab.
Under Bucket ARN & URL, find the S3 endpoint URL.
Amazon Service Endpoint - This is usually chosen automatically, so you should not normally need to fill it in. If you need to override the default endpoint, enter it here.


In the Queue section, choose the region your queue is created in from the dropdown provided.
Then, enter the URL of the existing Amazon SQS queue that receives the S3 notifications. To find it:
Go to the AWS Management Console.
In the Search Bar, type SQS and click on Simple Queue Service (SQS).
Click on Queues in the left panel.
Locate your queue from the list and click it.
The Queue URL will be displayed in the table under URL.

This is the correct URL format: https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>
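A quick check of the queue URL before saving it can catch copy-and-paste mistakes. The sketch below assumes the standard SQS queue URL layout, `https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>`, with a 12-digit account ID and a standard (non-FIFO) queue name. The function name `parse_queue_url` is hypothetical, and URLs with other domains, such as custom endpoints, would be rejected by this pattern.

```python
import re

# Assumed standard layout; custom or non-standard endpoints will not match.
QUEUE_URL_RE = re.compile(
    r"https://sqs\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<account_id>\d{12})/(?P<queue_name>[A-Za-z0-9_-]{1,80})"
)


def parse_queue_url(url):
    """Split a queue URL into region, account ID, and queue name."""
    match = QUEUE_URL_RE.fullmatch(url)
    if match is None:
        raise ValueError(f"not a standard SQS queue URL: {url!r}")
    return match.groupdict()


if __name__ == "__main__":
    # Sample URL only; use the one shown in your SQS console.
    print(parse_queue_url(
        "https://sqs.eu-west-1.amazonaws.com/123456789012/onum-s3-events"
    ))
```

The extracted region should match the region selected in the Queue section of the Listener.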
Choose your Authentication Type*
Choose manual to enter your access key ID and secret access key manually in the parameters below, or auto to authenticate automatically.
If you have configured your bucket and queue to require different Access Key IDs and Secret Access Keys, enter them here. If these are the same as your bucket, you don't need to repeat them here.


Proceed with caution when modifying the Queue advanced options. Default values should be enough in most cases.
Service endpoint - If you have a custom endpoint, enter it here. Otherwise, the default SQS regional service endpoint is used.
Maximum number of messages* - Set a limit for the maximum number of messages to receive in the notifications queue for each request. The minimum value is 1, and the maximum and default value is 10.
Visibility timeout* - Set how many seconds to leave a message as hidden in the queue after being delivered, before redelivering it to another consumer if not acknowledged. The minimum value is 30s, and the maximum value is 12h. The default value is 1h.
Wait time* - When the queue is empty, set how long to wait for messages before deeming the request as timed out. The minimum value is 5s, and the maximum and default value is 20s.
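The limits above can be captured in a small validation helper, which is useful if you generate Listener settings from scripts. This is a sketch only: the class name `QueueOptions` and its field names are hypothetical, while the ranges and defaults are the ones listed above. The `to_receive_kwargs` method shows how these settings would map onto the parameters of a boto3-style SQS `receive_message` call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueOptions:
    max_messages: int = 10            # allowed: 1 to 10
    visibility_timeout_s: int = 3600  # allowed: 30s to 12h, default 1h
    wait_time_s: int = 20             # allowed: 5s to 20s

    def __post_init__(self):
        # Each entry: (current value, minimum, maximum).
        checks = {
            "max_messages": (self.max_messages, 1, 10),
            "visibility_timeout_s": (self.visibility_timeout_s, 30, 12 * 3600),
            "wait_time_s": (self.wait_time_s, 5, 20),
        }
        for name, (value, low, high) in checks.items():
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")

    def to_receive_kwargs(self):
        """Map the options to SQS ReceiveMessage parameter names."""
        return {
            "MaxNumberOfMessages": self.max_messages,
            "VisibilityTimeout": self.visibility_timeout_s,
            "WaitTimeSeconds": self.wait_time_s,
        }
```

Calling `QueueOptions()` with no arguments yields the documented defaults, and any out-of-range value raises a `ValueError` that names the offending field.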


Proceed with caution when modifying the General advanced options. Default values should be enough in most cases.
Event batch size* - Enter a limit for the number of events allowed through per batch. The minimum value is 1, and the maximum and default value is 1000000.
Minimum retry time* - Set the minimum amount of time to wait before retrying. The default and minimum value is 1s, and the maximum value is 10m.
Maximum retry time* - Set the maximum amount of time to wait before retrying. The default value is 5m, and the maximum value is 10m. The minimum value is the one set in the parameter above.
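To see how the minimum and maximum retry times interact, consider the sketch below. It assumes an exponential backoff that starts at the minimum retry time and doubles until it reaches the maximum. The source does not state which strategy the Listener actually uses, so the doubling is purely an assumption, and the function name `retry_delays` is hypothetical. The bounds (1s to 10m, with the maximum no lower than the minimum) are the ones listed above.

```python
def retry_delays(min_retry_s=1.0, max_retry_s=300.0, attempts=10):
    """Yield the wait time in seconds before each retry attempt."""
    if not 1 <= min_retry_s <= 600:
        raise ValueError("minimum retry time must be between 1s and 10m")
    if not min_retry_s <= max_retry_s <= 600:
        raise ValueError("maximum retry time must be between the minimum and 10m")
    delay = min_retry_s
    for _ in range(attempts):
        yield delay
        # Assumed strategy: double the delay, capped at the maximum.
        delay = min(delay * 2, max_retry_s)


if __name__ == "__main__":
    # Defaults from the parameter list: 1s minimum, 5m (300s) maximum.
    print(list(retry_delays()))
```

Under this assumption, a lower maximum retry time shortens the longest pause between attempts, while a higher minimum retry time reduces how often a failing source is contacted.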


Finally, click Create labels. Optionally, you can set labels to be used for internal Onum routing of data. By default, data will be set as Unlabelled.
Click Create listener when you're done.