# Group By

{% hint style="info" %}
See the changelog of this Action type [here](/actions/group-by.md).
{% endhint %}

## Overview

The **Group By** Action summarizes data by performing aggregations using keys and temporal keys (minute, hour, or day).

<figure><img src="/files/sqpkarmbFb3ozGQmLDmV" alt=""><figcaption></figcaption></figure>
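As a rough illustration of how a key and a temporal key can combine into a single group key, here is a minimal Python sketch. It assumes fixed, non-overlapping windows aligned to the Unix epoch and an event field named `Timestamp`; neither detail is confirmed by this page, and `window_start` and `group_key` are hypothetical helper names.

```python
from datetime import datetime, timezone


def window_start(timestamp, window_seconds):
    """Truncate an ISO 8601 UTC timestamp to the start of its window.

    Assumption: windows are fixed-size and aligned to the Unix epoch.
    """
    dt = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    epoch = int(dt.timestamp())
    start = epoch - (epoch % window_seconds)
    return datetime.fromtimestamp(start, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def group_key(event, fields, window_seconds):
    """Build a group key from the chosen fields plus the window start (hypothetical helper)."""
    return tuple(event[f] for f in fields) + (window_start(event["Timestamp"], window_seconds),)


event = {"IP_Address": "192.168.1.1", "Timestamp": "2025-01-09T08:07:30Z"}

minute_key = group_key(event, ["IP_Address"], 60)     # temporal key: minute
hour_key = group_key(event, ["IP_Address"], 3600)     # temporal key: hour
day_key = group_key(event, ["IP_Address"], 86400)     # temporal key: day
```

Events that share the same field values and fall into the same window produce the same key, so they end up in the same group.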

To configure this Action, you must first link it to a Listener. Go to [Building a Pipeline](/the-workspace/pipelines/building-a-pipeline.md) to learn how to link.

{% hint style="warning" %}
Note that **Group By** operations run independently and in parallel on each worker. This means their results are based only on the events handled by that specific worker, and thus **Group By** operations do not reflect global results by default.
{% endhint %}
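The effect described in the warning above can be simulated with a short Python sketch. The number of workers and the round-robin distribution of events are assumptions made for illustration only; this page does not describe how events are actually assigned to workers.

```python
from collections import Counter

# Hypothetical events; the field name is borrowed from the example on this page.
events = [
    {"IP_Address": "192.168.1.1"},
    {"IP_Address": "192.168.1.2"},
    {"IP_Address": "192.168.1.1"},
    {"IP_Address": "192.168.1.1"},
]

NUM_WORKERS = 2  # assumption: two workers

# Assumption: events are spread round-robin across workers.
per_worker = [Counter() for _ in range(NUM_WORKERS)]
for i, event in enumerate(events):
    per_worker[i % NUM_WORKERS][event["IP_Address"]] += 1

# Each worker only knows about the events it handled.
partial_results = [dict(counter) for counter in per_worker]

# A global view requires merging the partial results afterwards.
global_counts = sum(per_worker, Counter())

print(partial_results)
print(dict(global_counts))
```

In this simulation, `192.168.1.1` is counted twice by one worker and once by the other, so no single worker reports the global total of three.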

{% hint style="info" %}
**AI Action Assistant**

This Action has an AI-powered chat feature that can help you configure its parameters. Read more about it in [this article](/the-workspace/pipelines/building-a-pipeline/ai-assistant/ai-action-assistant.md).
{% endhint %}

## Ports <a href="#ports" id="ports"></a>

These are the input and output ports of this Action:

<details>

<summary>Input ports</summary>

* **Default port** - All the events to be processed by this Action enter through this port.

</details>

<details>

<summary>Output ports</summary>

* **Default port** - Events are sent through this port if no error occurs while processing them.
* **Error port** - Events are sent through this port if an error occurs while processing them.

</details>

## Configuration

{% stepper %}
{% step %}
Find **Group By** in the **Actions** tab (under the **Aggregation** group) and drag it onto the canvas. Link it to the required [Listener](https://docs.onum.com/the-workspace/listeners) and [Data sink](https://docs.onum.com/the-workspace/pipelines/data-sinks).
{% endstep %}

{% step %}
To open the configuration, click the Action in the canvas and select **Configuration**.
{% endstep %}

{% step %}
Enter the required parameters:

**Grouping configuration**

<table><thead><tr><th width="168">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong>Fields to group</strong><mark style="color:red;"><strong>*</strong></mark></td><td>Lists the fields from the linked Listener or Action for you to choose from. Choose one or more fields to group by.</td></tr><tr><td><strong>Grouping time</strong><mark style="color:red;"><strong>*</strong></mark></td><td>Having defined which fields to group by, choose or create a <strong>Grouping time</strong>. You can write the amount and unit (seconds, minutes, hours, days), or select a common amount.</td></tr></tbody></table>

**Aggregations**

<table><thead><tr><th width="168">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong>Aggregations</strong><mark style="color:red;"><strong>*</strong></mark></td><td><p>Add one or more aggregations to your grouping using the following operations:</p><ul><li><code>average</code> - calculates the average of the values of each grouping.</li><li><code>collect</code> - collects all the values from rows in each group into a single collection (like an array or list).</li><li><code>collectNotNull</code> - collects all the values from rows in each group into a single collection (like an array or list), excluding <em>null</em> values.</li><li><code>collectDistinct</code> - collects all unique (non-duplicate) values of an expression within each group into a single collection (like an array or list).</li><li><code>collectDistinctNotNull</code> - collects all unique (non-duplicate) values of an expression within each group into a single collection (like an array or list), excluding <em>null</em> values.</li><li><code>count</code> - calculates the total occurrences for each grouping.</li><li><code>countNotNull</code> - calculates the total occurrences for each grouping, excluding <em>null</em> values.</li><li><code>first</code> - finds the first value found for each grouping. The first value will be the first in the workers' queue.</li><li><code>firstNotNull</code> - finds the first non-null value found for each grouping. The first value will be the first in the workers' queue.</li><li><code>last</code> - finds the last value found for each grouping. The last value will be the last in the workers' queue.</li><li><code>lastNotNull</code> - finds the last non-null value found for each grouping. The last value will be the last in the workers' queue.</li><li><code>max</code> - finds the highest value found.</li><li><code>min</code> - finds the lowest value found.</li><li><code>sum</code> - calculates the total of the values for each grouping.</li></ul><p>To add another aggregation, use the <strong>Add item</strong> option.</p><p>You can also use the arrow keys on your keyboard to navigate up and down the list.</p></td></tr><tr><td><strong>Conditions</strong></td><td><p>For advanced configuration, you can also group by conditionals.</p><p>Use the <strong>Add Condition</strong> option to add conditions to your Aggregation.</p></td></tr></tbody></table>
{% endstep %}

{% step %}
Click **Save** to complete.
{% endstep %}
{% endstepper %}
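To make the aggregation operations listed above more concrete, the following Python sketch applies an approximation of each one to the values of a single hypothetical group. It is not the Action's implementation. In particular, it assumes that `average`, `max`, `min`, and `sum` skip *null* values and that the distinct variants keep first-seen order; this page does not state either behavior.

```python
# Hypothetical values of one field within one group, in arrival order.
values = [None, 3, 1, 3, 2, None]


def not_null(vs):
    """Drop null (None) values."""
    return [v for v in vs if v is not None]


def distinct(vs):
    """Keep the first occurrence of each value (ordering is an assumption)."""
    seen = []
    for v in vs:
        if v not in seen:
            seen.append(v)
    return seen


# Assumption: average, max, min, and sum ignore nulls.
aggregations = {
    "average": lambda vs: sum(not_null(vs)) / len(not_null(vs)),
    "collect": lambda vs: list(vs),
    "collectNotNull": not_null,
    "collectDistinct": distinct,
    "collectDistinctNotNull": lambda vs: distinct(not_null(vs)),
    "count": len,
    "countNotNull": lambda vs: len(not_null(vs)),
    "first": lambda vs: vs[0],
    "firstNotNull": lambda vs: not_null(vs)[0],
    "last": lambda vs: vs[-1],
    "lastNotNull": lambda vs: not_null(vs)[-1],
    "max": lambda vs: max(not_null(vs)),
    "min": lambda vs: min(not_null(vs)),
    "sum": lambda vs: sum(not_null(vs)),
}

results = {name: fn(values) for name, fn in aggregations.items()}
for name, value in results.items():
    print(f"{name}: {value}")
```

With these sample values, `first` and `last` both return a null, while `firstNotNull` and `lastNotNull` return `3` and `2`. This is the practical difference between each operation and its `NotNull` variant.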

## Example

In this example, we will use the **Group By** Action to summarize a large amount of data, grouping by IP address every 5 minutes and aggregating the number of requests by type per IP address.

{% stepper %}
{% step %}

### Raw data

Consider events with the following fields:

* `IP_Address`
* `Request_Type`
* `Timestamp`

```json
[
  {"IP_Address": "192.168.1.1", "Request_Type": "GET", "Timestamp": "2025-01-09T08:00:00Z"},
  {"IP_Address": "192.168.1.2", "Request_Type": "POST", "Timestamp": "2025-01-09T08:05:00Z"},
  {"IP_Address": "192.168.1.1", "Request_Type": "POST", "Timestamp": "2025-01-09T08:10:00Z"},
  {"IP_Address": "192.168.1.3", "Request_Type": "GET", "Timestamp": "2025-01-09T08:15:00Z"},
  {"IP_Address": "192.168.1.2", "Request_Type": "GET", "Timestamp": "2025-01-09T08:20:00Z"},
  {"IP_Address": "192.168.1.1", "Request_Type": "GET", "Timestamp": "2025-01-09T08:25:00Z"},
  {"IP_Address": "192.168.1.3", "Request_Type": "POST", "Timestamp": "2025-01-09T08:30:00Z"}
]
```

{% endstep %}

{% step %}

### Group by

We add the **Group By** Action to the canvas and link it to the incoming data.

**Group** the logs by `IP_Address` over a period of five minutes by selecting that field in **Fields to group** and *five minutes* as the **Grouping time**.

<figure><img src="/files/jKHrEYrM3VP7quqh5WNR" alt=""><figcaption></figcaption></figure>
{% endstep %}

{% step %}

### Aggregate

**Aggregate** the number of requests per IP address, broken down by request type (e.g., `GET` vs `POST`).

* **Operation**: `count`
* **Field**: `Request_Type`
* **Output field**: `count`

<figure><img src="/files/tkzvzBTeQgNwLt7VlXkI" alt=""><figcaption></figcaption></figure>
{% endstep %}

{% step %}

### Output

The **Group By** Action will emit the following results via the default *output* port:

```json
{
  "aggregated_requests": [
    {
      "IP_Address": "192.168.1.1",
      "GET_Count": 2,
      "POST_Count": 1,
      "Total_Requests": 3
    },
    {
      "IP_Address": "192.168.1.2",
      "GET_Count": 1,
      "POST_Count": 1,
      "Total_Requests": 2
    },
    {
      "IP_Address": "192.168.1.3",
      "GET_Count": 1,
      "POST_Count": 1,
      "Total_Requests": 2
    }
  ]
}
```

You now have one event per grouping and aggregation match.
{% endstep %}
{% endstepper %}
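For readers who want to check the numbers, the counts in the example output can be reproduced offline with a few lines of Python. This is only a sketch of the arithmetic: it counts over the whole sample rather than per five-minute window, and it builds the `GET_Count`, `POST_Count`, and `Total_Requests` fields by hand, since this page does not spell out how the Action derives them.

```python
import json
from collections import defaultdict

# Raw data from the example (timestamps omitted; this sketch ignores the grouping time).
events = [
    {"IP_Address": "192.168.1.1", "Request_Type": "GET"},
    {"IP_Address": "192.168.1.2", "Request_Type": "POST"},
    {"IP_Address": "192.168.1.1", "Request_Type": "POST"},
    {"IP_Address": "192.168.1.3", "Request_Type": "GET"},
    {"IP_Address": "192.168.1.2", "Request_Type": "GET"},
    {"IP_Address": "192.168.1.1", "Request_Type": "GET"},
    {"IP_Address": "192.168.1.3", "Request_Type": "POST"},
]

# One counter record per IP address.
counts = defaultdict(lambda: {"GET_Count": 0, "POST_Count": 0, "Total_Requests": 0})
for event in events:
    group = counts[event["IP_Address"]]
    group[f'{event["Request_Type"]}_Count'] += 1
    group["Total_Requests"] += 1

result = {
    "aggregated_requests": [
        {"IP_Address": ip, **group} for ip, group in sorted(counts.items())
    ]
}

print(json.dumps(result, indent=2))
```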

