# Rehydrate data from Amazon S3

## Overview

This document describes a method for rehydrating data from Amazon S3 using Falcon Onum.

The process is useful for customers who need to re-ingest data into Falcon NG-SIEM from an S3 source. Many customers reduce or modify the original data using Onum but keep a copy of the original events in S3 to meet legal retention requirements.

The process involves creating several artifacts:

* **S3 Bucket** (typically created by the customer)
* **SQS Queue** - This is necessary because the Onum Listener uses SQS queues to detect events in the S3 bucket
* [**Amazon S3** Listener](https://app.gitbook.com/s/kxZeV4nlXcIAjMGZxzLI/the-workspace/listeners/listener-integrations/collect-data-from-aws-products/collect-data-from-amazon-s3) in Onum
* [**Pipeline**](https://app.gitbook.com/s/kxZeV4nlXcIAjMGZxzLI/the-workspace/pipelines) in Onum
* [**Falcon NG-SIEM** Data Sink](https://app.gitbook.com/s/kxZeV4nlXcIAjMGZxzLI/the-workspace/data-sinks/data-sink-integrations/send-data-to-crowdstrike-products/send-data-to-falcon-next-gen-siem) in Onum
* **Data Connector** in Falcon NG-SIEM

## **Limitations**

At the time of writing, the following limitations apply:

* The [**Amazon S3** Listener](https://app.gitbook.com/s/kxZeV4nlXcIAjMGZxzLI/the-workspace/listeners/listener-integrations/collect-data-from-aws-products/collect-data-from-amazon-s3) in Onum only accepts events stored in CSV or JSON format.
* The [**Amazon S3** Listener](https://app.gitbook.com/s/kxZeV4nlXcIAjMGZxzLI/the-workspace/listeners/listener-integrations/collect-data-from-aws-products/collect-data-from-amazon-s3) in Onum cannot be used in environments with more than one deployed cluster.

## **Prerequisites**

Before configuring the **Amazon S3** Listener and starting to send data, review the following requirements:

* Your AWS user needs at least the `GetObject` permission on the S3 bucket and the `ReceiveMessage` and `DeleteMessageBatch` permissions on the SQS queue for this Listener to work.
* **Cross-Region Configurations**: Ensure that your S3 bucket and SQS queue are in the same AWS Region, as S3 event notifications do not support cross-region targets.
* **Permissions**: Confirm that the AWS Identity and Access Management (IAM) roles associated with your S3 bucket and SQS queue have the necessary permissions.
* **Object Key Name Filtering**: If you use special characters in your prefix or suffix filters for event notifications, ensure they are URL-encoded.
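
A minimal IAM policy covering the permissions listed above might look like the following sketch. The resource ARNs are placeholders you must replace with your own bucket and queue; note that the `DeleteMessageBatch` API call is authorized by the `sqs:DeleteMessage` IAM action.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadS3Objects",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<bucket-name>/*"
    },
    {
      "Sid": "ConsumeSQSMessages",
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage"
      ],
      "Resource": "arn:aws:sqs:<region>:<account-id>:<queue-name>"
    }
  ]
}
```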

## How it works

The approach used is as follows:

* From the S3 console, the user adds a tag to the objects they want to rehydrate. The tag value is irrelevant; the **Amazon S3** Listener only reacts to the event that the S3 bucket generates when a tag is added to an object.
* The tag creation generates an event in the S3 bucket that is sent to the SQS queue.
* The **Amazon S3** Listener detects the event and accesses the associated S3 bucket to read the content that has received the tag.
* Onum processes the events through the Pipeline and sends them to the configured **Falcon NG-SIEM** Data Sink.
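
The notification delivered through SQS in the flow above is a standard S3 event record. As an illustration only (not Onum's internal implementation), the following sketch shows how the bucket and object key can be extracted from an `ObjectTagging:Put` event body; the bucket and key names are hypothetical.

```python
import json

# Example body of an SQS message produced by an "Object tags added"
# notification, abbreviated to the fields used below (names are hypothetical).
message_body = json.dumps({
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectTagging:Put",
            "s3": {
                "bucket": {"name": "my-archive-bucket"},
                "object": {"key": "logs/2024/01/events.json"},
            },
        }
    ]
})

def extract_tagged_objects(body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for tag-added events in an SQS message body."""
    records = json.loads(body).get("Records", [])
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in records
        if r.get("eventName", "").startswith("ObjectTagging")
    ]

print(extract_tagged_objects(message_body))
# → [('my-archive-bucket', 'logs/2024/01/events.json')]
```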

## **Amazon S3 Setup** <a href="#onumrehydratedatafroms3-amazons3setup" id="onumrehydratedatafroms3-amazons3setup"></a>

You need to configure your Amazon S3 bucket to send notifications to an Amazon Simple Queue Service (SQS) queue when new files are added.

### **Create an Amazon SQS Queue**

{% stepper %}
{% step %}
Sign in to the AWS Management Console and open the Amazon SQS console.
{% endstep %}

{% step %}
Choose **Create Queue** and configure the queue settings as needed.
{% endstep %}

{% step %}
After creating the queue, note its Amazon Resource Name (ARN), which follows this format: `arn:aws:sqs:<region>:<account-id>:<queue-name>`.
{% endstep %}
{% endstepper %}

### **Modify the SQS Queue Policy to Allow S3 to Send Messages**

{% stepper %}
{% step %}
In the Amazon SQS console, select your queue.
{% endstep %}

{% step %}
Navigate to the **Access Policy** tab and choose **Edit**.
{% endstep %}

{% step %}
Replace the existing policy with the following, ensuring you update the placeholders with your specific details:

```json
{
  "Version": "2012-10-17",
  "Id": "S3ToSQSPolicy",
  "Statement": [
    {
      "Sid": "AllowS3Bucket",
      "Effect": "Allow",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Action": "SQS:SendMessage",
      "Resource": "arn:aws:sqs:<region>:<account-id>:<queue-name>",
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": "arn:aws:s3:::<bucket-name>"
        },
        "StringEquals": {
          "aws:SourceAccount": "<account-id>"
        }
      }
    }
  ]
}
```

{% endstep %}

{% step %}
Save the changes. This policy grants your S3 bucket permission to send messages to your SQS queue.
{% endstep %}
{% endstepper %}

### **Configure S3 Event Notifications**

{% stepper %}
{% step %}
Open the Amazon S3 console and select the bucket you want to configure.
{% endstep %}

{% step %}
Go to the **Properties** tab and find the **Event notifications** section.
{% endstep %}

{% step %}
Click on **Create event notification**.
{% endstep %}

{% step %}
Provide a descriptive name for the event notification.
{% endstep %}

{% step %}
In the **Event types** section, select **Object tagging > Object tags added**.
{% endstep %}

{% step %}
In the **Destination** section, choose **SQS Queue** and select the queue you configured earlier.
{% endstep %}

{% step %}
Save the configuration.
{% endstep %}
{% endstepper %}
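
The console steps above produce a bucket notification configuration equivalent to the following sketch, which could also be applied with the `aws s3api put-bucket-notification-configuration` CLI command. The `Id` is an arbitrary example name, and the queue ARN is a placeholder.

```json
{
  "QueueConfigurations": [
    {
      "Id": "rehydrate-on-tag",
      "QueueArn": "arn:aws:sqs:<region>:<account-id>:<queue-name>",
      "Events": ["s3:ObjectTagging:Put"]
    }
  ]
}
```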

## **Falcon NG-SIEM S3 Setup** <a href="#onumrehydratedatafroms3-amazons3setup" id="onumrehydratedatafroms3-amazons3setup"></a>

In Falcon NG-SIEM, configure a Data Connector so that the events are received correctly. Note the API URL and API Secret that are generated; you will need them later.

## **Onum Setup** <a href="#onumrehydratedatafroms3-amazons3setup" id="onumrehydratedatafroms3-amazons3setup"></a>

We will create an **Amazon S3** Listener, a Pipeline, and a **Falcon NG-SIEM** Data Sink.

### Amazon S3 Listener

Follow the steps in [this article](https://app.gitbook.com/s/kxZeV4nlXcIAjMGZxzLI/the-workspace/listeners/listener-integrations/collect-data-from-aws-products/collect-data-from-amazon-s3).

### Falcon NG-SIEM Data Sink

Follow the steps in [this article](https://app.gitbook.com/s/kxZeV4nlXcIAjMGZxzLI/the-workspace/data-sinks/data-sink-integrations/send-data-to-crowdstrike-products/send-data-to-falcon-next-gen-siem). Use the API URL and API Secret generated in Falcon NG-SIEM to fill the **Instance URL** and **Token** fields.

### Pipeline

In this example, the events from the S3 bucket will not be modified; therefore, the Pipeline simply collects the events from the Listener and sends them in their original format to the Data Sink.

Create a new Pipeline and give it a name. Drag and drop the `All_Data` label from your new Listener and the Data Sink created in the previous step, then connect them. Click the Data Sink to configure the message to be sent.

Click **Publish** and you'll be done.

