Collect data from Amazon S3
Most recent version: v1.2.0
The Amazon S3 Listener is a Pull Listener and therefore should not be used in environments with more than one cluster.
Overview
Amazon Simple Storage Service (S3) is a fully managed object storage service. It is typically used to store large files at a reasonable cost for long periods of time. In particular, it is commonly used as the storage layer of a data lake, holding files of user events in a given format, encoding, and compression.
Amazon S3 can also send notifications to an SQS queue when new files are added to a bucket. You can see a sample notification here.
By leveraging all the above, our S3 Listener is able to react to new files being added to the bucket, get the files, and ingest their events into Onum. All that is needed is an existing SQS queue, an existing S3 bucket, and having the bucket correctly configured to send notifications to the queue.
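The notification-driven flow described above can be sketched as follows. This is a minimal illustration, not Onum's implementation: it assumes the standard S3 event notification layout (a `Records` array with `eventSource`, `eventName`, and an `s3` object), and the function name `extract_new_objects` and the sample bucket and key are hypothetical. It only shows how a consumer might turn an SQS message body into the list of objects to fetch.

```python
import json
from urllib.parse import unquote_plus


def extract_new_objects(message_body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for each created object in an S3 notification."""
    payload = json.loads(message_body)
    objects = []
    # Messages without "Records" (for example, S3 test events) yield nothing.
    for record in payload.get("Records", []):
        # Only react to object-creation events coming from S3.
        if record.get("eventSource") != "aws:s3":
            continue
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification, so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical sample message body, shaped like an S3 "object created" event.
sample = json.dumps({
    "Records": [{
        "eventSource": "aws:s3",
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "example-bucket"},
            "object": {"key": "events/2024+01/batch%3D1.json.gz", "size": 1024},
        },
    }]
})
print(extract_new_objects(sample))
```

Each returned pair identifies one file that a consumer would then download with the GetObject operation before parsing its events.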
Prerequisites
Before configuring and starting to send data with the Amazon S3 Listener, you need to take into consideration the following requirements:
Your Amazon user needs at least permission to use the GetObject operation (S3) and the ReceiveMessage and DeleteMessageBatch operations (SQS) to make this Listener work.
Cross-Region Configurations: Ensure that your S3 bucket and SQS queue are in the same AWS Region, as S3 event notifications do not support cross-region targets.
Permissions: Confirm that the AWS Identity and Access Management (IAM) roles associated with your S3 bucket and SQS queue have the necessary permissions.
Object Key Name Filtering: If you use special characters in your prefix or suffix filters for event notifications, ensure they are URL-encoded.
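As a rough illustration of the permissions listed above, the sketch below builds a minimal identity policy for the user that the Listener authenticates as. The function name, statement IDs, and example ARNs are hypothetical. It also assumes that IAM authorizes DeleteMessageBatch through the `sqs:DeleteMessage` action, as AWS documents for batch operations; verify this against your own account before relying on it.

```python
import json


def listener_iam_policy(bucket_name: str, queue_arn: str) -> dict:
    """Build a least-privilege policy for reading objects and consuming notifications."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadIngestedObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # GetObject is granted on the objects, hence the trailing /*.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                "Sid": "ConsumeNotifications",
                "Effect": "Allow",
                # Assumption: batch deletes are authorized via sqs:DeleteMessage.
                "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
                "Resource": queue_arn,
            },
        ],
    }


policy = listener_iam_policy(
    "example-bucket", "arn:aws:sqs:eu-west-1:123456789012:example-queue"
)
print(json.dumps(policy, indent=2))
```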
Amazon S3 Setup
You need to configure your Amazon S3 bucket to send notifications to an Amazon Simple Queue Service (SQS) queue when new files are added.
Modify the SQS Queue Policy to Allow S3 to Send Messages
In the Amazon SQS console, select your queue.
Navigate to the Access Policy tab and choose Edit.
Replace the existing policy with the following, ensuring you update the placeholders with your specific details:
{
  "Version": "2012-10-17",
  "Id": "S3ToSQSPolicy",
  "Statement": [
    {
      "Sid": "AllowS3Bucket",
      "Effect": "Allow",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Action": "SQS:SendMessage",
      "Resource": "arn:aws:sqs:<region>:<account-id>:<queue-name>",
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": "arn:aws:s3:::<bucket-name>"
        },
        "StringEquals": {
          "aws:SourceAccount": "<account-id>"
        }
      }
    }
  ]
}
Save the changes. This policy grants your S3 bucket permission to send messages to your SQS queue.
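If you prefer to generate the policy rather than edit the placeholders by hand, a small helper along these lines may be useful. It is only a sketch: the function name `build_queue_policy` and the example values are hypothetical, and the output mirrors the policy shown above with the four placeholders filled in.

```python
import json


def build_queue_policy(region: str, account_id: str, queue_name: str, bucket_name: str) -> str:
    """Render the SQS access policy that lets one S3 bucket send messages to one queue."""
    policy = {
        "Version": "2012-10-17",
        "Id": "S3ToSQSPolicy",
        "Statement": [{
            "Sid": "AllowS3Bucket",
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "SQS:SendMessage",
            "Resource": f"arn:aws:sqs:{region}:{account_id}:{queue_name}",
            "Condition": {
                # Restrict the sender to the given bucket in the given account.
                "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{bucket_name}"},
                "StringEquals": {"aws:SourceAccount": account_id},
            },
        }],
    }
    return json.dumps(policy, indent=2)


# Hypothetical values for illustration only.
rendered = build_queue_policy("eu-west-1", "123456789012", "example-queue", "example-bucket")
print(rendered)
```

The resulting JSON can be pasted into the Access Policy editor of the queue.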
Configure S3 Event Notifications
Open the Amazon S3 console and select the bucket you want to configure.
Go to the Properties tab and find the "Event notifications" section.
Click on Create event notification.
Provide a descriptive name for the event notification.
In the Event types section, select All object create events or specify particular events that should trigger notifications.
In the Destination section, choose SQS Queue and select the queue you configured earlier.
Save the configuration.
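The console steps above correspond to a bucket notification configuration. The sketch below builds that structure as a plain Python dictionary; the function name and example values are hypothetical. It assumes the shape accepted by the S3 PutBucketNotificationConfiguration operation (for example through boto3's `put_bucket_notification_configuration`), which is not called here so that the example stays self-contained.

```python
def build_notification_configuration(queue_arn, name, prefix=None, suffix=None):
    """Describe an 'all object create events' notification targeting an SQS queue."""
    queue_config = {
        "Id": name,
        "QueueArn": queue_arn,
        # Equivalent to selecting "All object create events" in the console.
        "Events": ["s3:ObjectCreated:*"],
    }
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})
    if rules:
        # Optional object key name filtering by prefix and/or suffix.
        queue_config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"QueueConfigurations": [queue_config]}


# Hypothetical values for illustration only.
config = build_notification_configuration(
    "arn:aws:sqs:eu-west-1:123456789012:example-queue",
    "new-event-files",
    prefix="events/",
    suffix=".jsonl.gz",
)
print(config)
```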
Onum Setup
Log in to your Onum tenant and click Listeners > New listener.
Double-click the Amazon S3 Listener.
Enter a Name for the new Listener. Optionally, add a Description and some Tags to identify the Listener.
In the Objects section, select the Compression method and the Format of the ingested S3 files.
Compression - This accepts the standard compression codecs (gzip, zlib, bzip2), none for no compression, and auto to autodetect the compression type from the file extension.
Format - This currently accepts JSON array (a big JSON array containing a JSON object for each event), JSON lines (a JSON object representing an event on each line), CSV, and auto to autodetect the format from the file extension (.json or .jsonl, respectively).
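To make the auto options more concrete, here is a minimal sketch of extension-based detection and decoding. It is not the Listener's actual logic: the extension-to-codec mapping (`.gz`, `.bz2`, `.zz`) is an assumption, as are the function names and the internal labels `json-array` and `json-lines`. Only the `.json` and `.jsonl` format extensions come from the description above.

```python
import bz2
import gzip
import json
import zlib
from pathlib import PurePosixPath

# Assumed extension mapping; the real Listener may recognize other extensions.
COMPRESSION_BY_EXTENSION = {".gz": "gzip", ".bz2": "bzip2", ".zz": "zlib"}
FORMAT_BY_EXTENSION = {".json": "json-array", ".jsonl": "json-lines"}


def detect(key: str) -> tuple:
    """Guess (compression, format) from an object key such as 'a/b.jsonl.gz'."""
    suffixes = PurePosixPath(key).suffixes
    compression = "none"
    if suffixes and suffixes[-1] in COMPRESSION_BY_EXTENSION:
        compression = COMPRESSION_BY_EXTENSION[suffixes.pop()]
    fmt = FORMAT_BY_EXTENSION.get(suffixes[-1]) if suffixes else None
    return compression, fmt


def decode_events(data: bytes, compression: str, fmt: str) -> list:
    """Decompress the file contents and split them into individual events."""
    decompressors = {
        "gzip": gzip.decompress,
        "bzip2": bz2.decompress,
        "zlib": zlib.decompress,
        "none": lambda raw: raw,
    }
    text = decompressors[compression](data).decode("utf-8")
    if fmt == "json-array":
        return json.loads(text)  # one big array, one object per event
    if fmt == "json-lines":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt!r}")


compression, fmt = detect("events/batch.jsonl.gz")
events = decode_events(gzip.compress(b'{"a": 1}\n{"a": 2}\n'), compression, fmt)
print(compression, fmt, events)
```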
If you select CSV, more options appear:
Define the Bucket to listen from.
Region* - Find this in your Buckets area, next to the name.

Name - The AWS bucket your data is stored in. This is the bucket name found in your Buckets area. You can fill this if you want to check that notifications come from that bucket, or leave it empty to avoid such checks.
Authentication Type* - Choose manual to enter your access key ID and secret access key manually in the parameters below, or auto to authenticate automatically. The default value is manual.
Access key ID* - Select the access key ID from your Secrets or click New secret to generate a new one.
The Access Key ID is found in the IAM Dashboard of the AWS Management Console.
In the left panel, click on Users.
Select your IAM user.
Under the Security Credentials tab, scroll to Access Keys, and you will find existing Access Key IDs (but not the secret access key).
Secret access key* - Select the secret access key from your Secrets or click New secret to generate a new one. Under Access keys, AWS shows your Access Key IDs but never the Secret Access Key, so you must have saved it when the key was created. If you no longer have it, create a new access key.
Proceed with caution when modifying the Bucket advanced options. Default values should be enough in most cases.
Amazon S3 provides different types of service endpoints depending on the region and access type. To find the endpoint of your bucket:
Select your bucket.
Go to the Properties tab.
Under Bucket ARN & URL, find the S3 endpoint URL.
The Amazon Service Endpoint is usually chosen automatically, so you should not normally need to fill this in. However, if you need to override the default endpoint, you can do it here.
In the Queue section, choose the region your queue is created in from the dropdown provided.
Then, enter the URL of the existing Amazon SQS queue that receives the S3 notifications. To find it:
Go to the AWS Management Console.
In the Search Bar, type SQS and click on Simple Queue Service (SQS).
Click on Queues in the left panel.
Locate your queue from the list and click it.
The Queue URL will be displayed in the table under URL.

This is the correct URL format: https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>
Choose your Authentication Type*
Choose manual to enter your access key ID and secret access key manually in the parameters below, or auto to authenticate automatically.
If you have configured your bucket and queue to require different Access Key IDs and Secret Access Keys, enter them here. If these are the same as your bucket, you don't need to repeat them here.
Proceed with caution when modifying the Queue advanced options. Default values should be enough in most cases.
Service endpoint - If you have a custom endpoint, enter it here. If left empty, the default SQS regional service endpoint is used.
Maximum number of messages* - Set a limit for the maximum number of messages to receive in the notifications queue for each request. The minimum value is 1, and the maximum and default value is 10.
Visibility timeout* - Set how many seconds to leave a message as hidden in the queue after being delivered, before redelivering it to another consumer if not acknowledged. The minimum value is 30s, and the maximum value is 12h. The default value is 1h.
Wait time* - When the queue is empty, set how long to wait for messages before deeming the request as timed out. The minimum value is 5s, and the maximum and default value is 20s.
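The limits above can be captured in a small validated settings object. This is an illustrative sketch only: the class and field names are hypothetical, and the mapping to the SQS ReceiveMessage parameters (`MaxNumberOfMessages`, `VisibilityTimeout`, `WaitTimeSeconds`) is an assumption about how such options would typically be passed to SQS, not a description of the Listener's internals.

```python
from dataclasses import dataclass


def _check_range(name: str, value: int, low: int, high: int) -> None:
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass(frozen=True)
class QueueOptions:
    max_messages: int = 10            # 1 to 10, default 10
    visibility_timeout_s: int = 3600  # 30s to 12h, default 1h
    wait_time_s: int = 20             # 5s to 20s, default 20s

    def __post_init__(self) -> None:
        _check_range("max_messages", self.max_messages, 1, 10)
        _check_range("visibility_timeout_s", self.visibility_timeout_s, 30, 12 * 3600)
        _check_range("wait_time_s", self.wait_time_s, 5, 20)

    def to_receive_params(self) -> dict:
        """Assumed mapping onto the SQS ReceiveMessage request parameters."""
        return {
            "MaxNumberOfMessages": self.max_messages,
            "VisibilityTimeout": self.visibility_timeout_s,
            "WaitTimeSeconds": self.wait_time_s,
        }


print(QueueOptions().to_receive_params())
```

A long wait time keeps the number of empty responses low, while the visibility timeout should be long enough to download and ingest the largest expected file before the notification is redelivered.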
Proceed with caution when modifying the General advanced options. Default values should be enough in most cases.
Event batch size* - Enter a limit for the number of events allowed through per batch. The minimum value is 1, and the maximum and default value is 1000000.
Minimum retry time* - Set the minimum amount of time to wait before retrying. The default and minimum value is 1s, and the maximum value is 10m.
Maximum retry time* - Set the maximum amount of time to wait before retrying. The default value is 5m, and the maximum value is 10m. The minimum value is the one set in the parameter above.
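One common way to use a minimum and a maximum retry time is exponential backoff: start at the minimum and double the delay after each failed attempt, capped at the maximum. The document does not state which strategy the Listener uses, so the sketch below is an assumption for illustration, and the function name is hypothetical.

```python
def retry_delays(min_retry_s: float = 1, max_retry_s: float = 300, attempts: int = 5) -> list:
    """Return the wait before each retry, doubling from min_retry_s up to max_retry_s."""
    # Documented bounds: minimum at least 1s, maximum at most 10m, min <= max.
    if not 1 <= min_retry_s <= max_retry_s <= 600:
        raise ValueError("expected 1s <= minimum <= maximum <= 10m")
    delays = []
    delay = min_retry_s
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, max_retry_s)  # cap at the maximum retry time
    return delays


print(retry_delays(1, 10, 6))  # [1, 2, 4, 8, 10, 10]
```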
Finally, click Create labels. Optionally, you can set labels to be used for internal Onum routing of data. By default, data will be set as Unlabelled.
Learn more about labels in this article.
Click Create listener when you're done.