
Amazon S3

Most recent version: v1.0.0




See the changelog of this Data sink type here.

Overview

Amazon S3 is an object storage service that stores and protects any amount of data for a wide range of use cases, including data lakes, websites, cloud-native applications, backups, archives, machine learning, and analytics. Onum supports integration with Amazon S3.

Select Amazon S3 from the list of Data sink types and click Configuration to start.

Data sink configuration

Now you need to specify how and where to send the data, and how to establish a connection with Amazon S3.

Metadata

Enter the basic information for the new Data sink.

| Parameter | Description |
| --- | --- |
| Name* | Enter a name for the new Data sink. |
| Description | Optionally, enter a description for the Data sink. |
| Tags | Add tags to easily identify your Data sink. Hit the Enter key after you define each tag. |


Metrics display

Decide whether or not to include this Data sink info in the metrics and graphs of the Home area.
Configuration

Now, add the configuration to establish the connection.

AWS

Enter the specific configuration for AWS. You'll find this data in the General purpose buckets area of your Amazon S3 account.

| Parameter | Description |
| --- | --- |
| Bucket* | The AWS bucket your data is stored in. This is the bucket Name found in your General purpose buckets area. |
| Region* | Choose the region the cloud server is located in, also found in your General purpose buckets area, next to the bucket name. |

S3 object

S3 objects are files or data sets that are stored in a bucket. Each object is identified by a key that uses prefixes to simulate a folder structure. Click the bucket name to view its Objects and properties. Click an object to open it and see the following parameters.

| Parameter | Description |
| --- | --- |
| Storage class | The desired S3 storage class. See this in the main objects table, or by clicking the object and going to Storage Class. |
| Canned ACL | Choose the S3 Access Control List (ACL) to apply to the objects. |
| Global prefix | Add a static prefix for all the object keys. |
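S3 has no real folders: a key is a flat string, and the `/` separators in it are what the console and most tools render as a folder hierarchy. The following minimal sketch (a hypothetical helper, not part of Onum) shows how a global prefix and path segments combine into an object key:

```python
def build_object_key(prefix: str, *parts: str) -> str:
    """Join a global prefix and path segments into an S3 object key.

    The '/' separators are purely conventional; S3 stores the key as
    one flat string, but consoles display it as nested folders.
    """
    segments = [prefix.strip("/")] + [p.strip("/") for p in parts]
    return "/".join(s for s in segments if s)

# Listed under the logs/2024/06/ "folder" in the S3 console:
key = build_object_key("logs/", "2024", "06", "events.json")
```

A global prefix set in this Data sink plays the role of the first segment, so all objects written by the sink share one top-level "folder".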

Auth

Complete this section only if your bucket requires authorization.

| Parameter | Description |
| --- | --- |
| Access key ID* | Add the access key ID from your Secrets, or create one. You'll find existing Access Key IDs in the IAM Dashboard of the AWS Management Console: in the left panel, click Users, select your IAM user, and scroll to Access Keys under the Security Credentials tab (the secret access key is not shown there). |
| Secret access key* | Add the secret access key from your Secrets, or create one. AWS does not display the Secret Access Key after creation, so you must have it saved somewhere. If you don't have it saved, you need to create a new one. |

Advanced options

| Parameter | Description |
| --- | --- |
| Max object size / Input size | Enter the maximum size of each object (in MB) sent to the S3 bucket. Use Max object size if you select Raw as the format in the output configuration, or Input size if you select Parquet. Instead of partitioning by time, you can partition by message size: if you do not select a Partition by value, a new object is created upon reaching this limit. For both options, the minimum value is 1, the maximum value is 5243000, and the default value is 100. |
| Custom endpoint | If you have one, enter your custom endpoint. |

If your edge services are deployed on-premises, make sure to check your available disk space. This is because setting an Input size greater than the disk space available may lead to technical issues with workers or infrastructure.
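Size-based partitioning amounts to buffering events and starting a new object whenever the configured limit would be exceeded. The sketch below illustrates that rollover logic under stated assumptions (a hypothetical in-memory buffer, not Onum's internal implementation):

```python
class SizeRollover:
    """Illustrative size-based rollover: buffer events and cut a new
    S3 object whenever the configured maximum size (in MB) is reached.
    Hypothetical helper, not Onum's actual sink code."""

    def __init__(self, max_object_mb: int = 100):
        self.limit = max_object_mb * 1024 * 1024  # MB -> bytes
        self.buffer: list[bytes] = []
        self.size = 0
        self.flushed: list[bytes] = []  # stands in for uploaded objects

    def add(self, event: bytes) -> None:
        # adding this event would exceed the limit: close the current object
        if self.size + len(event) > self.limit and self.buffer:
            self.flush()
        self.buffer.append(event)
        self.size += len(event)

    def flush(self) -> None:
        # in the real sink, this is the point where an object is uploaded
        self.flushed.append(b"".join(self.buffer))
        self.buffer, self.size = [], 0
```

This is also why the on-premises disk-space warning above matters: an object's worth of data has to be staged somewhere until it reaches the configured size.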

Click Finish when complete.

Pipeline configuration

When it comes to using this Data sink in a Pipeline, you must configure the following output parameters. To do so, simply click the Data sink on the canvas and select Configuration.

Output configuration

Format

Choose whether the event Format is Raw or Parquet. Depending on the format selected, you'll be prompted to fill in the corresponding parameters:

**Raw**

| Parameter | Description |
| --- | --- |
| Event field* | The name of the input event field. |
| Framing method* | Defines how events are separated within an S3 object (further defined in the S3 object section of the Data sink). Choose between Newline (a newline character `\n` separates individual records in the output), Length (records are framed by a length field; the S3 framing method length is 10 bytes), and No framing (all events are concatenated into a single line that grows until the maximum size is reached). |
| Compress data? | Choose between true/false to enable/disable compression. |
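The three framing methods can be illustrated with a short sketch. This is an approximation of the behavior described above, not Onum's actual code; in particular, reading the 10-byte "framing method length" as a fixed-width decimal length field preceding each record is an assumption:

```python
def frame_events(events: list[bytes], method: str) -> bytes:
    """Illustrative sketch of how framing separates events in one S3 object."""
    if method == "newline":
        # one record per line, separated by '\n'
        return b"\n".join(events) + b"\n"
    if method == "length":
        # assumption: each record preceded by a fixed 10-byte length field
        return b"".join(b"%010d" % len(e) + e for e in events)
    if method == "no_framing":
        # everything concatenated into a single unbroken line
        return b"".join(events)
    raise ValueError(f"unknown framing method: {method}")
```

Whatever writes the objects and whatever reads them back must agree on the framing method, since the object itself carries no other record boundaries.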

**Parquet**

| Parameter | Description |
| --- | --- |
| Event fields* | Separate the raw event into fields. Give each field a name and add as many fields as required by clicking Add element. |

Key format

Choose the format for the name of the objects:

| Parameter | Description |
| --- | --- |
| Prefix | The prefix used to organize your S3 data. |
| Partition by | The frequency with which to generate a new S3 object, e.g. every year, month, day, hour, or minute. If left blank, the Max object size / Input size entered in the Data sink configuration is used instead. |
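A time-based Partition by value effectively maps each event's timestamp to a key prefix, so a new object is started whenever the period rolls over. The sketch below shows one plausible mapping (illustrative naming only; Onum's exact key scheme may differ):

```python
from datetime import datetime, timezone

def partitioned_key(prefix: str, ts: datetime, partition_by: str) -> str:
    """Map a timestamp to a time-partitioned S3 key prefix.

    Hypothetical layout: each finer granularity nests inside the
    coarser one, e.g. 'day' yields prefix/YYYY/MM/DD/.
    """
    formats = {
        "year": "%Y",
        "month": "%Y/%m",
        "day": "%Y/%m/%d",
        "hour": "%Y/%m/%d/%H",
        "minute": "%Y/%m/%d/%H/%M",
    }
    return f"{prefix}/{ts.strftime(formats[partition_by])}/"

ts = datetime(2024, 6, 1, 12, 30, tzinfo=timezone.utc)
# partitioned_key("logs", ts, "day") -> "logs/2024/06/01/"
```

Finer granularities produce more, smaller objects; coarser ones produce fewer, larger objects that may instead hit the size limit first.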

Click Save to save your configuration.
