# Pull data from HTTP endpoints

{% hint style="info" %}
See the changelog of this Listener type [here](/listeners/http-pull-listener.md).
{% endhint %}

{% hint style="warning" %}
Note that this Listener is only available in certain Tenants. [Get in touch with us](/support/support.md) if you don't see it and want to access it.
{% endhint %}

## Overview

Onum supports integration with HTTP Pull. Select **HTTP Pull** from the list of Listener types and click **Configuration** to start.

## Prerequisites

{% hint style="warning" %}
To use this Listener, you must enable the following environment variable in your distributor using Docker Compose: `HTTP_PULL_LISTENER_ENABLED`
{% endhint %}

## HTTP Pull configuration

{% stepper %}
{% step %}
Log in to your Onum tenant and click **Listeners > New listener**.
{% endstep %}

{% step %}
Double-click the **HTTP Pull** Listener.
{% endstep %}

{% step %}
Enter a **Name** for the new Listener. Optionally, add a **Description** and some **Tags** to identify the Listener.
{% endstep %}

{% step %}

### Stopped&#x20;

{% hint style="info" %}
If you do not see this feature, it has not been made available in your Tenant yet.
{% endhint %}

The **Stopped** toggle allows you to set the Listener as inactive in order to stop ingesting data from endpoints whilst you do not need it.
{% endstep %}

{% step %}
Now you need to specify the **Parameters.**

* Enter the **name** of the parameter to search for in the YAML below. It is referenced later as `${parameters.name}`, e.g. `${parameters.domain}`.
* Enter the **value** to substitute wherever that parameter name is found, e.g. `domain.com`.

  With the name set as `domain` and the value set as `mydomain`, the expression to use in the YAML is `${parameters.domain}`, which is automatically replaced by `mydomain`. Add as many name/value pairs as required.

**YAML Sample:**

<pre class="language-yaml"><code class="lang-yaml"><strong>  url: "https://${parameters.domain}/api/v2/events/dataexport/alerts/"
</strong>  headers:
    - name: Accept
      value: application/json
    - name: Netskope-Api-Token
      value: "${secrets.netskopeApiToken}"
nextRequest:
  method: GET
  url: "https://${parameters.domain}/api/v2/events/dataexport/alerts/"
  headers:
    - name: Accept
      value: application/json
    - name: Netskope-Api-Token
      value: "${secrets.netskopeApiToken}"
</code></pre>

{% endstep %}

{% step %}
Next, configure your **Secrets**.

* Enter the **name** of the secret to search for in the YAML below. It is referenced later as `${secrets.name}`, e.g. `${secrets.netskopeApiToken}`.
* Select the [Secret](#secrets) containing the connection credentials if you have added them previously, or select **New Secret** to add it. Its value is substituted wherever the secret name is found in the YAML. Add as many as required.

**YAML Sample:**

```yaml
  method: GET
  url: "https://${parameters.domain}/api/v2/events/data"
  headers:
    - name: Accept
      value: application/json
    - name: Netskope-Api-Token
      value: "${secrets.netskopeApiToken}"
nextRequest:
  method: GET
  url: "https://${parameters.domain}/api/v2/events/data"
  headers:
    - name: Accept
      value: application/json
    - name: Netskope-Api-Token
      value: "${secrets.netskopeApiToken}"
```

{% endstep %}

{% step %}
Toggle on to configure the HTTP request as **YAML**, then paste your YAML here.

<figure><picture><source srcset="/files/xeSeQ6kJQwHluAofWY7a" media="(prefers-color-scheme: dark)"><img src="/files/hVpbUz8HMPNt4vMqq4ZC" alt=""></picture><figcaption></figcaption></figure>

The system supports interpolated variables throughout the HTTP request building process using the syntax: `${prefix.name}`

Each building block may:

* Use variables depending on their role (e.g., parameters, secrets, pagination state).
* Expose variables for later phases (e.g., pagination counters, temporal window bounds).

{% hint style="warning" %}
Not all variable types are available in every phase. Each block has access to a specific subset of variables.
{% endhint %}

Variables can be defined in the configuration or generated dynamically during execution. Each variable has a prefix that determines its source and scope.

These are the supported prefixes:

* `parameters` - User-defined values configured manually. Available in all phases.
* `secrets` - Sensitive values such as credentials or tokens. Available in all phases.
* `temporalWindow` - Automatically generated from the Temporal Window block. Available in the Enumeration and Collection phases.
* `pagination` - Values produced by the pagination mechanism (e.g., offset, cursor). Available in the Enumeration and Collection phases.
* `inputs` - Values derived from the output of the Enumeration phase. Available only in the Collection phase.
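To make the `${prefix.name}` substitution concrete, here is a minimal Python sketch of how such interpolation could work. It is an illustration only, not Onum's implementation: the `interpolate` function, the regular expression, and the choice to leave unknown variables untouched are all assumptions.

```python
import re

# Matches ${prefix.name}; this pattern is an illustrative assumption.
_VAR = re.compile(r"\$\{([A-Za-z]+)\.([A-Za-z0-9_]+)\}")


def interpolate(template: str, scopes: dict) -> str:
    """Replace each ${prefix.name} with scopes[prefix][name]."""

    def lookup(match: re.Match) -> str:
        prefix, name = match.group(1), match.group(2)
        if prefix not in scopes or name not in scopes[prefix]:
            # Assumption: variables that are not available stay as written.
            return match.group(0)
        return scopes[prefix][name]

    return _VAR.sub(lookup, template)


# Hypothetical values standing in for the Parameters and Secrets settings.
scopes = {
    "parameters": {"domain": "mydomain"},
    "secrets": {"netskopeApiToken": "example-token"},
}
url = interpolate("https://${parameters.domain}/api/v2/events", scopes)
print(url)
```

Each prefix acts as a separate namespace, which is why a block that has no access to, say, `inputs` cannot resolve `${inputs.xxx}`.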

If you do not have a YAML to paste, see how to manually configure the various components of a YAML in the following sections.
{% endstep %}
{% endstepper %}

## Deconstructing a YAML

Here we will learn what each parameter of the YAML means, and how they correspond to the settings in the HTTP Pull Listener.

The YAML is used for pulling alerts via an API and typically uses:

* A **Temporal Window** to enable the use of a time-based query window for filtering results.
* **Authentication** using a token to authenticate the connection.
* The first phase **(Enumeration)** enables an initial listing phase to get identifiers (e.g., alert IDs), paginating through the results.
* The second phase **(Collection)** then fetches full alert details using the alert IDs from the enumeration phase.
* Standard JSON response mapping is used to output the results.

{% hint style="info" %}
Only the Collection phase is mandatory; the rest of the fields are optional.
{% endhint %}

Let's take a closer look at each phase below.

***

### Temporal window

A temporal window is a defined time range used to filter or limit data retrieval in queries or API requests. It specifies the start and end time for the data you want to collect or analyze. The example below uses a 5-minute window with an offset of 10, timestamps in RFC3339 format, and the UTC time zone.

<table><thead><tr><th width="179.99609375">Parameter</th><th>Description</th><th data-hidden></th></tr></thead><tbody><tr><td><strong>Duration</strong><mark style="color:red;"><strong>*</strong></mark></td><td>Enter how long the window remains open, e.g. <code>5m</code>.</td><td></td></tr><tr><td><strong>Offset</strong><mark style="color:red;"><strong>*</strong></mark></td><td>How far back from the current time the window starts.</td><td></td></tr><tr><td><strong>Time Zone</strong><mark style="color:red;"><strong>*</strong></mark></td><td>This value is usually automatically set to your current time zone. If not, select it here.</td><td></td></tr><tr><td><strong>Format</strong><mark style="color:red;"><strong>*</strong></mark></td><td>Choose between <em>Epoch</em> or <em>RFC3339</em> for the timestamp format.</td><td></td></tr></tbody></table>

<details>

<summary>Temporal Window example</summary>

```yaml
withTemporalWindow: true
temporalWindow:
  duration: 5m
  offset: 10
  tz: UTC
  format: RFC3339
```

In Onum, toggle **ON** the Temporal Window selector and enter the information in the corresponding fields:

* **Duration**<mark style="color:red;">**\***</mark>**&#x20;-** 5m
* **Offset**<mark style="color:red;">**\***</mark>**&#x20;-** 10
* **TZ**<mark style="color:red;">**\***</mark>**&#x20;-** this is set automatically according to your current time zone.
* **Format**<mark style="color:red;">**\***</mark>**&#x20;-** RFC3339

So if the current UTC time is 12:00, the window starts 10 minutes back (11:50) and stays open for 5 minutes, giving a range of 11:50 - 11:55.
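The arithmetic behind that range can be sketched in a few lines of Python. This is only an illustration of the calculation; `temporal_window` is a hypothetical helper, and treating the offset of `10` as minutes is an assumption taken from the 12:00 example.

```python
from datetime import datetime, timedelta, timezone


def temporal_window(now: datetime, duration: timedelta, offset: timedelta):
    """Return (start, end): the window opens `offset` before `now` and lasts `duration`."""
    start = now - offset
    end = start + duration
    return start, end


# Current time 12:00 UTC, duration 5m, offset assumed to be 10 minutes.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
start, end = temporal_window(now, timedelta(minutes=5), timedelta(minutes=10))

# isoformat() on a UTC-aware datetime yields an RFC3339-compatible string.
print(start.isoformat(), end.isoformat())
```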

</details>

***

### Authentication phase

If your connection requires authentication, enter the credentials here.&#x20;

<table><thead><tr><th width="179.74609375">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong>Authentication Type</strong><mark style="color:red;"><strong>*</strong></mark></td><td>Choose the authentication type and enter the details.</td></tr></tbody></table>

#### Authentication credentials

The options provided vary depending on the type chosen to authenticate your API. This must be the authentication type configured on the API side, so that the API can recognize the request.

Choose between the options below.

<details>

<summary>Basic</summary>

* **Username**<mark style="color:red;">**\***</mark>**&#x20;-** the user sending the request.
* **Password**<mark style="color:red;">**\***</mark>**&#x20;-** the password, e.g. `${secrets.password}`

```yaml
withAuthentication: true
authentication:
  type: basic
  basic:
    username: testuser
    password: testpass
```

</details>

<details>

<summary>API Key</summary>

Enter the following:

* **API Key -** API keys are usually found in developer portals, cloud dashboards, or authentication settings. Store the key as a secret and reference it, e.g. `${secrets.api_key}`
* **Auth injection:**
  * **In**<mark style="color:red;">**\***</mark>**&#x20;-** Choose where the API key is sent: Header or Query.
  * **Name**<mark style="color:red;">**\***</mark>**&#x20;-** The header name or query parameter name in which the API key is sent.
  * **Prefix -** Enter a prefix if required.
  * **Suffix -** Enter a suffix if required.

```yaml
withAuthentication: true
authentication:
  type: apiKey
  apiKey:
    apiKey: test-api-key
    authInjection:
      name: X-API-Key
      in: header
      prefix: "Bearer"
```

</details>

<details>

<summary>Token</summary>

**Token Retrieve Based Authentication**

* **Request -**&#x20;
  * **Method**<mark style="color:red;">**\***</mark>**&#x20;-** Choose between *GET* or *POST*
  * **URL**<mark style="color:red;">**\***</mark>**-** Enter the URL to send the request to.
* **Headers -** Add as many headers as required.
  * **Name**
  * **Value**
* **Query Params -** Add as many query parameters as required.
  * **Name**
  * **Value**
* **Token Path**<mark style="color:red;">**\***</mark>**&#x20;-** Enter the path used to extract the authentication token from the response, e.g. `.access_token`.
* **Auth injection:**
  * **In**<mark style="color:red;">**\***</mark>**&#x20;-** Choose where the token is sent: Header or Query.
  * **Name**<mark style="color:red;">**\***</mark>**&#x20;-** The header name or query parameter name in which the token is sent, e.g. `Authorization`.
  * **Prefix -** Enter a prefix if required.
  * **Suffix -** Enter a suffix if required.

```yaml
withAuthentication: true
authentication:
  type: token
  token:
    request:
      method: POST
      url: ${parameters.domain}/oauth2/token
      headers:
        - name: Content-Type
          value: application/x-www-form-urlencoded
      bodyType: urlEncoded
      bodyParams:
        - name: grant_type
          value: client_credentials
        - name: client_id
          value: '${secrets.client_id}'
        - name: client_secret
          value: '${secrets.client_secret}'
    tokenPath: ".access_token"
    authInjection:
      in: header
      name: Authorization
      prefix: 'Bearer '
      suffix: ''
```

#### Example

* **Type** - Token. Token authentication is a method of authenticating API requests by using a secure token, usually passed in an HTTP header.
* **Request**
  * **method** - `POST` Sends a POST request to obtain an access token.
  * **url** - `${parameters.domain}/oauth2/token` The OAuth token endpoint. `${parameters.domain}` is a placeholder for the value entered in the **Parameters** section.
  * **headers** - these headers are key-value pairs that provide additional information to the server when making a request.
    * **name** - `Content-Type`
    * **value** - `application/x-www-form-urlencoded` Indicates that the request body is formatted as URL-encoded key-value pairs (standard for OAuth token requests).
  * **Body type** - `urlEncoded` Specifies that the request body format is URL-encoded (like `key=value&key2=value2`).
    * **Body params**
      * **name** - `grant_type` Required by OAuth 2.0 to specify the type of grant being requested.
      * **value** - `client_credentials` Used for server-to-server authentication without a user.
      * **name** - `client_id`
      * **value** - `${secrets.client_id}` This is a dynamic variable pulled from the value entered in the **Secrets** setting.
      * **name** - `client_secret`
      * **value** - `${secrets.client_secret}` This is a dynamic variable pulled from the value entered in the **Secrets** setting.
* **Token path -** Extracts the access token from the JSON response of the authentication request. It is a JSONPath-like expression used to locate the token in the response body.

<figure><picture><source srcset="/files/dg1rmgGRtVZLZHN6ejE9" media="(prefers-color-scheme: dark)"><img src="/files/fZbLacJCZYxyQCYoHh0z" alt=""></picture><figcaption></figcaption></figure>

Toggle **ON** the Authentication option.

* **Auth injection** - This part defines how and where to inject the authentication token (typically an access token) into the requests after it has been retrieved, for example, from an OAuth token endpoint.
  * **in** - `header` The token is injected into the HTTP header of the request. This is the most common method for passing authentication tokens.
  * **name** - `Authorization` The name of the header that will contain the token. Most APIs expect this to be `Authorization`.
  * **prefix** - The text added before the token value. `Bearer` followed by a space is the standard prefix for OAuth 2.0 tokens.
  * **suffix** - `''` Text added after the token value. In this case it is empty, so nothing is appended.
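The two steps described here (locating the token with the token path, then injecting it) can be sketched in Python as follows. This is a simplified illustration, not the Listener's code: `extract_token` and `inject` are hypothetical helpers, and the token path is treated as a plain dotted path rather than a full selector language.

```python
def extract_token(response: dict, token_path: str) -> str:
    """Walk a dotted path such as '.access_token' through a parsed JSON body."""
    value = response
    for key in token_path.strip(".").split("."):
        value = value[key]
    return str(value)


def inject(token: str, where: str, name: str, prefix: str = "", suffix: str = ""):
    """Return (headers, query) carrying prefix + token + suffix in the chosen place."""
    value = f"{prefix}{token}{suffix}"
    headers, query = {}, {}
    if where == "header":
        headers[name] = value
    elif where == "query":
        query[name] = value
    else:
        raise ValueError(f"unsupported injection target: {where}")
    return headers, query


# Hypothetical response from the token endpoint.
token_response = {"access_token": "abc123", "expires_in": 3600}
token = extract_token(token_response, ".access_token")
headers, query = inject(token, "header", "Authorization", prefix="Bearer ")
print(headers)
```

Note the trailing space in the prefix: without it the header would read `Bearerabc123`.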

</details>

<details>

<summary>HMAC</summary>

Signs the queries using a secret key that is used by the server to authenticate and validate integrity.

**Token Retrieve Based Authentication**

**Request**

* **Generate ID** - Toggle **ON** to generate.
* **Generate Timestamp**
  * **Timezone**<mark style="color:red;">**\***</mark>**&#x20;-** this field is filled in automatically using your current time zone.
  * **Format**<mark style="color:red;">**\***</mark>**&#x20;-** the format for the timestamp syntax (Seconds, Epoch, Epoch Timestamp, RFC1123, RFC1123Z, RFC3339 or custom). Selecting **custom** opens the Go time format option, where you can write your custom syntax e.g. `2 Jan 2006 15:04:05`
* **Generate content hash**&#x20;
  * **Content hash**
    * **Hashing algorithm**<mark style="color:red;">**\***</mark> - select the [hash](/the-workspace/pipelines/actions/transformation/field-transformation/field-transformation-operations/hashing.md) operation to carry out on the content.
    * **Encoding**<mark style="color:red;">**\***</mark> - choose the encoding method.
  * **Hashing**
    * **Hashing algorithm**<mark style="color:red;">**\***</mark> - select the [hash](/the-workspace/pipelines/actions/transformation/field-transformation/field-transformation-operations/hashing.md) operation to carry out on the content.
    * **Encoding**<mark style="color:red;">**\***</mark> - choose the encoding method.
    * **Secret key**<mark style="color:red;">**\***</mark> - the secret key used to sign the data, e.g. `${secrets.secretKey}`
    * **Data to sign**<mark style="color:red;">**\***</mark> - how to build the string that will be signed, e.g. `"${request.method}\n${request.contentHash}\napplication/json\n${request.relativeUrl}\n${request.timestamp}"`
* **Headers** to be added to the request (name & value).

```yaml
withAuthentication: true
authentication:
  type: hmac
  hmac:
    request:
      generateTimestamp: true
      timestamp:
        tz: UTC
        format: EpochMillis
    hash:
      secretKey: ${secrets.apiSecret}
      algorithm: hmac_sha256
      encoding: hex
      dataToSign: "${secrets.apiKey}${request.body}${request.timestamp}"
    headers:
      x-logtrust-apikey: ${secrets.apiKey}
      x-logtrust-timestamp: ${request.timestamp}
      x-logtrust-sign: ${hmac.hash}
```

#### Example: Authenticate HTTP requests to Microsoft Azure using the HMAC-SHA256 scheme.

[Learn how to calculate the HMAC for this API here.](https://learn.microsoft.com/es-es/azure/azure-app-configuration/rest-api-authentication-hmac)

```yaml
withAuthentication: true
authentication:
  type: hmac
  hmac:
    request:
      generateTimestamp: true
      timestamp:
        tz: UTC
        format: RFC1123
      generateContentHash: true
      contentHash:
        algorithm: sha256
        encoding: base64
    hash:
      algorithm: hmac_sha256
      encoding: base64
      secretKey: ${secrets.secretKey}
      dataToSign: "${request.method}\n${request.relativeUrl}\n${request.timestamp};${request.host};${request.contentHash}"
    headers:
      - name: x-ms-date
        value: ${request.timestamp}
      - name: x-ms-content-sha256
        value: ${request.contentHash}
      - name: Authorization
        value: "HMAC-SHA256 Credential=${secrets.accessKeyId}&SignedHeaders=x-ms-date;host;x-ms-content-sha256&Signature=${hmac.hash}"
```

* **Type** - HMAC.

**Request Parameters**&#x20;

* **Generate Timestamp**
  * **Timezone -** UTC
  * **Format -** RFC1123
* **Generate Content Hash**
  * **Algorithm** - sha256
  * **Encoding** - base64

<figure><picture><source srcset="/files/xyggE3Ap14odtn5IGx0Z" media="(prefers-color-scheme: dark)"><img src="/files/Qdvty66DWGGOIm862Y7q" alt=""></picture><figcaption></figcaption></figure>

**Hash**

Base64-encoded HMACSHA256 of the *String-To-Sign*.&#x20;

* **Algorithm** - hmac\_sha256
* **Encoding** - base64
* **Secret** **Key** - `${secrets.secretKey}` This variable is retrieved from the **secrets** parameter.
* **Data To Sign** -  A canonical representation of the request with the format **HTTP\_METHOD** + '\n' + **path\_and\_query** + '\n' + **signed\_headers\_values** `${request.method}\n${request.relativeUrl}\n${request.timestamp};${request.host};${request.contentHash}`

<figure><picture><source srcset="/files/wCjFOba8WogvTqdz5NQa" media="(prefers-color-scheme: dark)"><img src="/files/EUb8sT1y75FkrI5Uznij" alt=""></picture><figcaption></figcaption></figure>

**Headers**

* **Name** - `x-ms-date` can be used when the agent cannot directly access the `Date` request header or when a proxy modifies it. If both `x-ms-date` and `Date` are provided, `x-ms-date` takes precedence.
* **Value** - `${request.timestamp}`
* **Name** - `x-ms-content-sha256` Base64-encoded SHA256 hash of the request body. It must be provided even if there is no body.
* **Value** - `${request.contentHash}`
* **Name** - `Authorization` Required by the HMAC-SHA256 scheme.
* **Value** - `HMAC-SHA256 Credential=${secrets.accessKeyId}&SignedHeaders=x-ms-date;host;x-ms-content-sha256&Signature=${hmac.hash}`

<figure><picture><source srcset="/files/5k0I1ZNDFKIpYKsTJIRJ" media="(prefers-color-scheme: dark)"><img src="/files/cQ4Vc7hu5BmE7FGHP4w3" alt=""></picture><figcaption></figcaption></figure>
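As a rough cross-check of the configuration above, the following Python sketch computes the same three headers with the standard library. It is an illustration under assumptions: `signed_headers` is a hypothetical helper, the key, credential, path, and host are made-up values, and the secret key is used as raw bytes (whether the Listener decodes the key first is not stated here).

```python
import base64
import hashlib
import hmac


def content_hash(body: bytes) -> str:
    """Base64-encoded SHA-256 of the request body; an empty body still has a hash."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def signed_headers(secret_key: bytes, access_key_id: str, method: str,
                   relative_url: str, timestamp: str, host: str,
                   body: bytes = b"") -> dict:
    """Build the x-ms-date, x-ms-content-sha256 and Authorization headers."""
    body_hash = content_hash(body)
    # Same layout as dataToSign: METHOD \n path_and_query \n date;host;content hash
    data_to_sign = f"{method}\n{relative_url}\n{timestamp};{host};{body_hash}"
    digest = hmac.new(secret_key, data_to_sign.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return {
        "x-ms-date": timestamp,
        "x-ms-content-sha256": body_hash,
        "Authorization": (
            f"HMAC-SHA256 Credential={access_key_id}"
            "&SignedHeaders=x-ms-date;host;x-ms-content-sha256"
            f"&Signature={signature}"
        ),
    }


headers = signed_headers(
    secret_key=b"example-secret",        # hypothetical key, used as raw bytes
    access_key_id="example-id",          # hypothetical credential
    method="GET",
    relative_url="/kv?api-version=1.0",  # hypothetical path and query
    timestamp="Tue, 21 Oct 2025 07:28:00 GMT",
    host="example.azconfig.io",          # hypothetical host
)
print(headers["Authorization"])
```

Because the timestamp, host, and content hash are all part of the signed string, changing any of them after signing invalidates the signature.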

#### Example 2: API HMAC Authentication for Oracle

[See here for how to calculate the API HMAC in Oracle](https://docs.oracle.com/en/cloud/saas/marketing/crowdtwist-develop/Developers/HMACAuthentication.html).

* **Type** - HMAC.

**Request Parameters**&#x20;

* **Generate ID**&#x20;
  * **Type -** uuid
* **Generate Timestamp**
  * **Timezone -** UTC
  * **Format -** Epoch
* **Generate Content Hash**
  * **Algorithm** - sha1
  * **Encoding** - `base64` - The binary hash result will be encoded in **Base64** for transmission.

**Hash**

Base64-encoded HMACSHA256 of the *String-To-Sign*.&#x20;

* **Algorithm** - `hmac_sha256`
* **Encoding** - `base64`&#x20;
* **Secret** **Key** - `${secrets.secretKey}` This variable is retrieved from the **secrets** parameter.
* **Data To Sign** - `${request.method}\n${request.contentHash}\napplication/json${request.timestamp}\n${request.relativeUrl}` This is the canonical **string-to-sign**:
  * `${request.method}` - HTTP method (e.g., `GET`, `POST`)
  * `${request.contentHash}` - Base64 SHA-1 hash of the request body
  * `"application/json"` - Hardcoded content type
  * `${request.timestamp}` - Epoch UTC timestamp
  * `${request.relativeUrl}` - The relative path and query string

    The `\n` means each element is separated by a **newline**.

**Headers**

* **Name** - `x-ct-authorization`
* **Value** - `CTApiV2Auth ${parameters.publicKey}:${hmac.hash}`
  * `CTApiV2Auth` - Authentication scheme name.
  * `${parameters.publicKey}` - Public key or access ID.
  * `${hmac.hash}` - The generated HMAC-SHA256 signature from the **hash** section.
* **Name** - `x-ct-timestamp`
* **Value** - `${request.timestamp}` The same Epoch UTC timestamp generated earlier.

```yaml
withAuthentication: true
authentication:
  type: hmac
  hmac:
    request:
      generateId: true
      idType: uuid
      generateTimestamp: true
      timestamp:
        tz: UTC
        format: Epoch
      generateContentHash: true
      contentHash:
        algorithm: sha1
        encoding: base64
    hash:
      algorithm: hmac_sha256
      encoding: base64
      secretKey: ${secrets.secretKey}
      dataToSign: "${request.method}\n${request.contentHash}\napplication/json${request.timestamp}\n${request.relativeUrl}"
    headers:
      - name: x-ct-authorization
        value: CTApiV2Auth ${parameters.publicKey}:${hmac.hash}
      - name: x-ct-timestamp
        value: ${request.timestamp}
```

<figure><picture><source srcset="/files/mfy4K8una3vP0aMeTPth" media="(prefers-color-scheme: dark)"><img src="/files/kNNxrwiVy2eiGSUUAZzB" alt=""></picture><figcaption></figcaption></figure>

</details>

<details>

<summary>Akamai EdgeGrid</summary>

Authenticates using Akamai EdgeGrid endpoints.

**Akamai EdgeGrid Authentication**

**Request**

* **Client Token**<mark style="color:red;">**\***</mark>&#x20;
* **Access Token**<mark style="color:red;">**\***</mark>&#x20;
* **Client Secret&#x20;**<mark style="color:red;">**\***</mark>&#x20;

To obtain these credentials:

1. Log in to the [Akamai Control Center](https://control.akamai.com/).
2. Navigate to **Identity & Access Management > API Users**.
3. Either select an existing API user or create a new one.
4. Click **Create API Client** or view existing credentials.
5. In the credentials view, you'll find the **Client Token** along with the **Client Secret** and **Access Token**.

**Advanced Configuration**

* **Maximum body size in bytes or empty for whole body -** This configuration parameter defines the maximum allowed size (in bytes) for request bodies sent to the Akamai authentication endpoint. When set to a specific byte value, it limits the amount of data that will be processed, preventing potential resource exhaustion from oversized payloads. When left empty, the system processes the entire body regardless of size.
* **Headers added to the signature -** HTTP header fields that are incorporated into the cryptographic signature calculation for request verification. These headers become part of the signed content that Akamai uses to validate the authenticity and integrity of the request.

```yaml
withAuthentication: true
authentication:
  type: akamai
  akamai:
    clientSecret: ${secrets.clientSecret}
    accessToken: ${secrets.accessToken}
    clientToken: ${secrets.clientToken}
```

</details>

<details>

<summary>Bloodhound</summary>

Authenticates using [BloodHound API.](https://bloodhound.specterops.io/integrations/bloodhound-api/working-with-api)

* **Token ID**<mark style="color:red;">**\***</mark>&#x20;
  * A unique identifier (usually a UUID/GUID) visible in the interface even after token creation e.g. `tkn_12a34567-89b0-12c3-d456-789012ef3456`
* **Token Key**<mark style="color:red;">**\***</mark>&#x20;
  * Log in to the BloodHound CE or Enterprise web interface
  * Navigate to **Settings** or **User Profile** (typically accessible from the top-right user menu)
  * Select **API Tokens** or **Access Tokens**
  * Either view existing tokens or create a new one by clicking **Generate Token**
  * Provide a name/description for the token and set appropriate permissions
  * After creation, the Token Key will be displayed once (copy it immediately as it won't be shown again)

```yaml
withAuthentication: true
authentication:
  type: bloodHound
  bloodHound:
    tokenId: ${secrets.tokenId}
    tokenKey: ${secrets.tokenKey}
```

</details>

***

### Retry

Toggle **ON** to allow for retries and to configure the specifics.

<table><thead><tr><th width="179.99609375">Parameter</th><th>Description</th><th data-hidden></th></tr></thead><tbody><tr><td><strong>Retry Type</strong><mark style="color:red;"><strong>*</strong></mark></td><td><ul><li><p><strong>Fixed</strong> - Retries the failed operation after a constant, fixed interval every time e.g. the same amount of time between each retry attempt</p><ul><li><strong>Interval</strong><mark style="color:red;"><strong>*</strong></mark> - enter the amount of time to wait e.g. 5s.</li></ul></li><li><p><strong>Exponential</strong> - Retries the failed operation after increasingly longer intervals to avoid overwhelming the service. The delay grows with each retry attempt. </p><ul><li><strong>Initial delay</strong><mark style="color:red;"><strong>*</strong></mark> - The starting delay before the first retry attempt to ensure there’s at least some delay before retrying to avoid immediate re-hits. For example, an initial delay of <code>2s</code> equals a retry pattern of <code>2s</code>, <code>4s</code>, <code>8s</code>, <code>16s</code>, etc.</li><li><strong>Maximum delay</strong><mark style="color:red;"><strong>*</strong></mark> - The maximum wait time allowed between retries to prevent the retry delay from growing indefinitely. For example, an initial delay of <code>2s</code> and a maximum delay of <code>10s</code> equals a delay progression of <code>2s</code>, <code>4s</code>, <code>8s</code>, <code>10s</code>, <code>10s</code>, etc.</li><li><strong>Increasing factor</strong><mark style="color:red;"><strong>*</strong></mark> - The multiplier used to calculate the next delay interval, determining how quickly the delay grows after each failed attempt.</li></ul></li></ul></td><td></td></tr><tr><td><strong>Retry after response header</strong></td><td><p>Used to define how long to wait before making another request e.g. <code>HTTP 429 Too Many Requests</code> or <code>HTTP 503 Service Unavailable</code>. 
</p><ul><li><strong>Header</strong> - The name of the response header to read, e.g. <code>Retry-After</code>.</li><li><p><strong>Format</strong> - The format of the header value (Seconds, Epoch, Epoch Timestamp, RFC1123, RFC1123Z, RFC3339). </p><ul><li>e.g. wait 120 seconds <code>Retry-After: 120</code></li><li>e.g. RFC1123 date <code>Retry-After: Tue, 21 Oct 2025 07:28:00 GMT</code></li></ul></li></ul></td><td></td></tr></tbody></table>
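The exponential delay progression described in the table can be reproduced with a short Python sketch. `retry_delays` is a hypothetical helper written for illustration; it assumes the increasing factor is applied after every attempt and that the result is capped at the maximum delay.

```python
def retry_delays(initial, factor, maximum, attempts):
    """Return the wait before each retry: initial, initial*factor, ... capped at maximum."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, maximum))
        delay *= factor
    return delays


# Initial delay 2s, increasing factor 2, maximum delay 10s.
print(retry_delays(2, 2, 10, 5))  # [2, 4, 8, 10, 10]
```

A **Fixed** retry is the special case where the factor is 1, so every delay equals the interval.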

### Retry on errors processing body

Toggle **ON** to allow retry on body failures. When a response cannot be parsed, it will be retried the number of times specified in the **maximum number of retries** field.

### Throttling

Use throttling to intentionally limit the rate at which the HTTP requests are sent to the API or service.

**Throttling Type**<mark style="color:red;">**\***</mark>

{% tabs %}
{% tab title="Client" %}
The client itself controls and limits the rate at which it sends requests.&#x20;

<table><thead><tr><th width="160.05078125">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong>Client type</strong><mark style="color:red;"><strong>*</strong></mark></td><td><p>How to manage the rate of requests. </p><ul><li><p><strong>Rate -</strong> the client is restricted by the data transfer rate or request rate over time. </p><ul><li><strong>Maximum requests</strong><mark style="color:red;"><strong>*</strong></mark> - The maximum number of requests (or amount of data) to make within a specified time interval.</li><li><strong>Call interval</strong><mark style="color:red;"><strong>*</strong></mark> - The sliding or fixed window of time used to calculate the rate.</li><li><strong>Number of burst requests</strong><mark style="color:red;"><strong>*</strong></mark> - The number of requests that can temporarily exceed the normal rate before throttling kicks in, allowing short bursts of traffic to accommodate sudden spikes without immediate blocking. For example, if the maximum rate is 10 requests/sec and the burst is 5, the client could make up to 15 requests instantly, after which throttling slows it down.</li></ul></li><li><p><strong>Fixed delay</strong> - The client waits a fixed time after each request before making the next one. Instead of limiting by rate (requests per second) or volume, it just inserts a pause/delay between requests.</p><ul><li><strong>Call interval</strong><mark style="color:red;"><strong>*</strong></mark> - The time to wait between consecutive requests.</li></ul></li></ul></td></tr></tbody></table>

**Example**

```yaml
withThrottling: true
throttling:
  type: client
  client:
    type: rate
    rate:
      maxRequests: 42
      interval: 1s
      burst: 123
```

{% endtab %}

{% tab title="Server" %}
The server controls the rate at which it sends data.

<table><thead><tr><th width="160.0859375">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong>Wait response header</strong><mark style="color:red;"><strong>*</strong></mark></td><td><p>These headers indicate how long to wait before retrying and how many requests remain.</p><ul><li><strong>Header Type</strong><mark style="color:red;"><strong>*</strong></mark><strong> -</strong> Enter the header that tells the client what to do, e.g. <code>wait</code>, <code>Retry-After</code>, etc.</li><li><strong>Format</strong> - The format of the header value (Seconds, Epoch, Epoch Timestamp, RFC1123, RFC1123Z, RFC3339). e.g. wait 120 seconds <code>Retry-After: 120</code> e.g. RFC1123 date <code>Retry-After: Tue, 21 Oct 2025 07:28:00 GMT</code></li></ul></td></tr><tr><td><strong>Reset response header</strong></td><td><p>Indicates when a rate limit or throttle window resets, allowing the client to resume normal activity (e.g., making more requests or pulling more data). </p><ul><li><strong>Header Type</strong><mark style="color:red;"><strong>*</strong></mark><strong> -</strong> Enter the header that tells the client what to do, e.g. <code>wait</code>, <code>Retry-After</code>, etc.</li><li><strong>Format</strong> - The format of the header value (Seconds, Epoch, Epoch Timestamp, RFC1123, RFC1123Z, RFC3339). </li></ul></td></tr><tr><td><strong>Remaining response header</strong><mark style="color:red;"><strong>*</strong></mark></td><td>How many requests or units of usage the puller can still make within the current time window before hitting the limit and being throttled.</td></tr></tbody></table>

**Example**

```yaml
withThrottling: true
throttling:
  type: server
  server:
    waitResponseHeader:
      name: Retry-After
      format: seconds
```

{% endtab %}
{% endtabs %}

***

### Enumeration phase

The enumeration phase is an optional step in data collection or API integration workflows, where the system first retrieves a list of available items (IDs, resource names, keys, etc.) before fetching detailed data about each one.

This phase identifies the available endpoints, methods, parameters, and resources exposed by the API. It performs the initial data discovery that feeds the Collection phase and makes the results available to that phase via variable interpolation (inputs.\*).

This phase can use:

* `${parameters.xxx}`
* `${secrets.xxx}`
* `${temporalWindow.xxx}` (if configured)
* `${pagination.xxx}` Pagination variables

<table><thead><tr><th width="179.99609375">Parameter</th><th>Description</th><th data-hidden></th></tr></thead><tbody><tr><td><strong>Pagination Type</strong><mark style="color:red;"><strong>*</strong></mark></td><td><p>Select one from the drop-down. <strong>Pagination type</strong> is the method used to split and deliver large datasets in smaller, manageable parts (pages), and how those pages can be navigated during discovery. </p><p>Each pagination method manages its own state and exposes specific variables that can be interpolated in request definitions (e.g., URL, headers, query params, body).</p><p><strong>None</strong></p><ul><li><p>Description: No pagination; only a single request is issued.</p><ul><li>Repeat until: <em><strong>No repeat</strong></em> to ignore, or <em><strong>No data</strong></em> to repeat the request until no data is returned. </li></ul></li></ul><p><strong>PageNumber/PageSize</strong></p><ul><li>Description: Pages are indexed using a page number and fixed size.</li><li><p>Configuration: </p><ul><li>pageSize: page size</li></ul></li><li><p>Exposed Variables:</p><ul><li>${pagination.pageNumber}</li><li>${pagination.pageSize}</li></ul></li></ul><p><strong>Offset/Limit</strong></p><ul><li>Description: Uses offset and limit to fetch pages of data.</li><li><p>Configuration: </p><ul><li>Limit: max quantity of records per request</li></ul></li><li><p>Exposed Variables:</p><ul><li>${pagination.offset}</li><li>${pagination.limit}</li></ul></li></ul><p><strong>From/To</strong></p><ul><li>Description: Performs pagination by increasing a window using from and to values.</li><li>Configuration: limit: max quantity of records per request</li><li><p>Exposed Variables:</p><ul><li>${pagination.from}</li><li>${pagination.to}</li></ul></li></ul><p><strong>Web Linking (RFC 5988)</strong></p><ul><li>Description: Parses the Link header to find the rel="next" URL.</li><li>Exposed Variables: None</li></ul><p><strong>Next Link at Response 
Header</strong></p><ul><li>Description: Follows a link found in a response header.</li><li><p>Configuration: </p><ul><li>headerName: header name that contains the next link</li></ul></li><li>Exposed Variables: None</li></ul><p><strong>Next Link at Response Body</strong> </p><ul><li>Description: Follows a link found in the response body.</li><li><p>Configuration: </p><ul><li>nextLinkSelector: path to next link sent in response payload</li></ul></li><li>Exposed Variables: None</li></ul><p><strong>Cursor</strong></p><ul><li>Description: Extracts a cursor value from each response to request the next page.</li><li><p>Configuration: </p><ul><li>cursorSelector: path to the cursor sent in response payload</li></ul></li><li><p>Exposed Variables:</p><ul><li>${pagination.cursor}</li></ul></li></ul></td><td></td></tr></tbody></table>
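
As a rough illustration of the **Web Linking** method, the sketch below shows one way a `rel="next"` URL can be read from a `Link` response header. The function name `next_link` and the sample header are hypothetical, and the parsing is deliberately simplified (it does not handle commas inside quoted parameter values); it is not the Listener's actual implementation.

```python
import re
from typing import Optional


def next_link(link_header: str) -> Optional[str]:
    """Return the URL tagged rel="next" in a Link header, or None."""
    # Each entry looks like: <https://host/path?page=2>; rel="next"
    for match in re.finditer(r'<([^>]+)>([^,<]*)', link_header):
        url, params = match.group(1), match.group(2)
        rel = re.search(r'rel="?([^";]+)"?', params)
        # A rel value may hold several space-separated relation types.
        if rel and "next" in rel.group(1).split():
            return url
    return None


# Hypothetical header value, for illustration only.
header = (
    '<https://api.example.com/items?page=2>; rel="next", '
    '<https://api.example.com/items?page=9>; rel="last"'
)
next_url = next_link(header)
```

When no entry carries `rel="next"`, the function returns `None`, which is the natural point to stop paginating.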

**Output**

<table><thead><tr><th width="179.74609375">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong>Select</strong><mark style="color:red;"><strong>*</strong></mark></td><td>A JSON selector expression to pick a part of the response, e.g. <code>'.data'</code>.</td></tr><tr><td><strong>Filter</strong></td><td>A JSON expression to filter the selected elements. Example: <code>'.films | index("Tangled")'</code>.</td></tr><tr><td><strong>Map</strong></td><td>A JSON expression to transform each selected element into a new event.<br>Example: <code>'{characterName: .name}'</code>.</td></tr><tr><td><strong>Output Mode</strong><mark style="color:red;"><strong>*</strong></mark></td><td><p>Choose between: </p><ul><li><strong>Element</strong>: emits each transformed element individually as an event.</li><li><strong>Collection</strong>: emits all transformed items together as a single event containing an array/collection.</li></ul></td></tr></tbody></table>
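
To make the interplay of **Select**, **Filter**, **Map** and **Output Mode** more concrete, here is a minimal sketch in which the JSON expressions are replaced by plain Python callables. The function `build_events`, the sample response and the assumed order (filter before map) are illustrative assumptions, not the Listener's internal logic.

```python
from typing import Any, Callable, Iterable, List


def build_events(
    response: dict,
    select: Callable[[dict], Iterable[Any]],
    keep: Callable[[Any], bool],
    transform: Callable[[Any], Any],
    output_mode: str,
) -> List[Any]:
    """Apply select, then filter, then map, and emit per the output mode."""
    items = [transform(item) for item in select(response) if keep(item)]
    if output_mode == "element":
        return items    # one event per transformed item
    if output_mode == "collection":
        return [items]  # a single event holding the whole array
    raise ValueError(f"unknown output mode: {output_mode}")


# Hypothetical response body, for illustration only.
response = {
    "data": [
        {"name": "Rapunzel", "films": ["Tangled"]},
        {"name": "Elsa", "films": ["Frozen"]},
    ]
}

events = build_events(
    response,
    select=lambda r: r["data"],                        # like '.data'
    keep=lambda c: "Tangled" in c["films"],            # like '.films | index("Tangled")'
    transform=lambda c: {"characterName": c["name"]},  # like '{characterName: .name}'
    output_mode="element",
)
```

With `output_mode="collection"`, the same call would return one event that wraps the whole list instead of one event per item.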

<details>

<summary>Enumeration example</summary>

<pre class="language-yaml"><code class="lang-yaml"><strong>enumerationPhase:
</strong>  paginationType: offsetLimit
  limit: 100
  request:
    responseType: json
    method: GET
    url: ${parameters.domain}/alerts/queries/alerts/v2
    queryParams:
      - name: offset
        value: ${pagination.offset}
      - name: limit
        value: ${pagination.limit}
      - name: filter
        value: created_timestamp:>'${temporalWindow.from}'+created_timestamp:&#x3C;'${temporalWindow.to}'
  output:
    select: ".resources"
    map: "."
    outputMode: collection
</code></pre>

* **Pagination type** - `offsetLimit` (Offset/Limit). Uses classic pagination with **`offset`** and **`limit`** to page through results in batches (pages): `limit` determines the page size and `offset` determines where to start.
* **Limit** - Retrieves up to **100 records per request.** This value is used in the `limit` query parameter to control batch size.
* **Request** - Describes the API request that will be sent during enumeration.
  * **Response type** - Specifies the expected response format. Here, the system expects a `JSON` response.
  * **Method** - The HTTP method to use for this request. `GET` is used to retrieve data from the server.&#x20;
  * **URL** - `${parameters.domain}` is a placeholder variable that will be replaced by the domain value you entered in the **Parameters** section.

<figure><picture><source srcset="/files/VVp7QUozQiLemteNXgm0" media="(prefers-color-scheme: dark)"><img src="/files/mcOPS8TgMfzm2YDEiqYy" alt=""></picture><figcaption></figcaption></figure>

**Query params** - These are query string parameters appended to the URL.

* `${pagination.offset}` controls where to start in the dataset. Used for pagination.
* `${pagination.limit}` is replaced with the limit value you entered for the number of records to retrieve per request (100).
* `filter` restricts the results to alerts created within a specific time window. `${temporalWindow.from}` and `${temporalWindow.to}` are dynamically filled in with RFC3339 or epoch timestamps, depending on what you have configured.

<figure><picture><source srcset="/files/nhyUseZPAA4Xm70h0Iri" media="(prefers-color-scheme: dark)"><img src="/files/wJBWvDBBRkDjuVFU9ICb" alt=""></picture><figcaption></figcaption></figure>

**output** - Describes how to extract and interpret the results from the JSON response.

* **select** - `.resources` Looks for a field named `resources` in the response JSON. This is where the array of items lives.
* **map** - `.` Each item under `.resources` is returned as-is. No transformation or remapping.
* **outputMode** - `collection` The result is treated as a collection (array) of individual items. Used when you expect multiple items and want to pass them along for further processing.

</details>
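
The offset/limit loop described in the example above can be sketched as follows. This is a simplified model under stated assumptions: the helper `pull_offset_limit`, the in-memory `fake_fetch` stand-in, the zero-based starting offset and the "short page means done" stop condition are all hypothetical choices, not the Listener's documented behavior.

```python
from typing import Callable, Iterator, List


def pull_offset_limit(
    fetch: Callable[[int, int], List[dict]], limit: int
) -> Iterator[dict]:
    """Request pages of `limit` records, advancing the offset each time."""
    offset = 0
    while True:
        page = fetch(offset, limit)
        yield from page
        if len(page) < limit:  # a short (or empty) page means no more data
            break
        offset += limit


# Hypothetical in-memory dataset standing in for the remote API.
DATA = [{"id": i} for i in range(250)]
calls: List[int] = []


def fake_fetch(offset: int, limit: int) -> List[dict]:
    calls.append(offset)  # record the offsets that were requested
    return DATA[offset:offset + limit]


items = list(pull_offset_limit(fake_fetch, limit=100))
```

Here the offsets requested are 0, 100 and 200; the third page holds only 50 records, so the loop stops.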

***

### Collection phase

The collection phase in an HTTP Puller is the part of the process where the system actively pulls or retrieves data from an external API using HTTP requests.

The collection phase is mandatory. This is where the final data retrieval happens (either directly or using IDs/resources generated by an enumeration phase).

The **collection phase** involves gathering actual data from an API after the enumeration phase has mapped out endpoints, parameters, and authentication methods. It supports dynamic variable resolution via the variable resolver and can use data exported from the Enumeration Phase, such as:

* `${parameters.xxx}`
* `${secrets.xxx}`
* `${temporalWindow.xxx}`
* `${inputs.xxx}` (from Enumeration Phase)
* `${pagination.xxx}`
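
As a minimal sketch of what variable resolution amounts to, the snippet below substitutes `${namespace.name}` placeholders from a nested dictionary. The `resolve` function, the context values and the sample URL are hypothetical and only illustrate the idea; the actual variable resolver may behave differently (for example, regarding formatting or missing variables).

```python
import re


def resolve(template: str, context: dict) -> str:
    """Replace ${namespace.name} placeholders with values from context."""

    def lookup(match) -> str:
        namespace, name = match.group(1), match.group(2)
        return str(context[namespace][name])

    return re.sub(r"\$\{(\w+)\.(\w+)\}", lookup, template)


# Hypothetical values, for illustration only.
context = {
    "parameters": {"domain": "api.example.com"},
    "pagination": {"offset": 0, "limit": 100},
}
url = resolve(
    "https://${parameters.domain}/alerts"
    "?offset=${pagination.offset}&limit=${pagination.limit}",
    context,
)
```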

**Inputs**

In collection phases, you can define variables to be used elsewhere in the configuration (for example, in URLs, query parameters, or request bodies). Each variable definition has the following fields:

<table><thead><tr><th width="211.69140625">Parameter</th><th>Description</th><th data-hidden></th></tr></thead><tbody><tr><td><strong>Name</strong></td><td>The variable name (used later as <code>${inputs.name}</code> in the configuration).</td><td></td></tr><tr><td><strong>Source</strong></td><td>Usually <code>input</code>, indicating the value comes from the enumeration phase’s output.</td><td></td></tr><tr><td><strong>Expression</strong></td><td>A JSON expression applied to the input to extract or transform the needed value.</td><td></td></tr><tr><td><strong>Format</strong></td><td>Controls how the variable is converted to a string (see Variable Formatting below), e.g. <code>json</code>.</td><td></td></tr></tbody></table>

### Retry

Toggle **ON** to allow for retries and to configure the specifics.

<table><thead><tr><th width="179.99609375">Parameter</th><th>Description</th><th data-hidden></th></tr></thead><tbody><tr><td><strong>Retry Type</strong><mark style="color:red;"><strong>*</strong></mark></td><td><ul><li><p><strong>Fixed</strong> - Retries the failed operation after a constant, fixed interval every time e.g. the same amount of time between each retry attempt</p><ul><li><strong>Interval</strong><mark style="color:red;"><strong>*</strong></mark> - enter the amount of time to wait e.g. 5s.</li></ul></li><li><p><strong>Exponential</strong> - Retries the failed operation after increasingly longer intervals to avoid overwhelming the service. The delay grows with each retry attempt. </p><ul><li><strong>Initial delay</strong><mark style="color:red;"><strong>*</strong></mark> - The starting delay before the first retry attempt to ensure there’s at least some delay before retrying to avoid immediate re-hits. For example, an initial delay of <code>2s</code> equals a retry pattern of <code>2s</code>, <code>4s</code>, <code>8s</code>, <code>16s</code>, etc.</li><li><strong>Maximum delay</strong><mark style="color:red;"><strong>*</strong></mark> - The maximum wait time allowed between retries to prevent the retry delay from growing indefinitely. For example, an initial delay of <code>2s</code> and a maximum delay of <code>10s</code> equals a delay progression of <code>2s</code>, <code>4s</code>, <code>8s</code>, <code>10s</code>, <code>10s</code>, etc.</li><li><strong>Increasing factor</strong><mark style="color:red;"><strong>*</strong></mark> - The multiplier used to calculate the next delay interval, determining how quickly the delay grows after each failed attempt.</li></ul></li></ul></td><td></td></tr><tr><td><strong>Retry after response header</strong></td><td><p>Used to define how long to wait before making another request e.g. <code>HTTP 429 Too Many Requests</code> or <code>HTTP 503 Service Unavailable</code>. 
</p><ul><li><strong>Header</strong> - The name of the response header that carries the wait time, e.g. <code>Retry-After</code>.</li><li><p><strong>Format</strong> - The format of the header value (Seconds, Epoch, Epoch Timestamp, RFC1123, RFC1123Z, RFC3339). </p><ul><li>e.g. wait 120 seconds: <code>Retry-After: 120</code></li><li>e.g. RFC1123 date: <code>Retry-After: Tue, 21 Oct 2025 07:28:00 GMT</code></li></ul></li></ul></td><td></td></tr></tbody></table>

<figure><picture><source srcset="/files/umlISBttbp5kO2CN7mde" media="(prefers-color-scheme: dark)"><img src="/files/H3RmmIP6VKfL9CWLs2uL" alt=""></picture><figcaption></figcaption></figure>
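
The exponential delay progression described in the table can be reproduced with a few lines of arithmetic. The sketch below assumes an increasing factor of 2 to match the examples given; the function name `backoff_delays` is hypothetical.

```python
from typing import List


def backoff_delays(
    initial: float, factor: float, maximum: float, attempts: int
) -> List[float]:
    """Return the wait time before each retry, capped at `maximum`."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, maximum))
        delay *= factor  # grow the delay for the next attempt
    return delays


# Initial delay 2s, factor 2, maximum delay 10s, five retries.
print(backoff_delays(2, 2, 10, 5))  # [2, 4, 8, 10, 10]
```

A **Fixed** retry is the special case where the factor is 1, so every delay equals the configured interval.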

### Throttling

Use throttling to intentionally limit the rate at which the HTTP requests are sent to the API or service.

**Throttling Type**<mark style="color:red;">**\***</mark>

{% tabs %}
{% tab title="Client" %}
The client itself controls and limits the rate at which it sends requests.&#x20;

<table><thead><tr><th width="160.23046875">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong>Client type</strong><mark style="color:red;"><strong>*</strong></mark></td><td><p>How to manage the rate of requests. </p><ul><li><p><strong>Rate</strong> - The client is restricted by the data transfer rate or request rate over time. </p><ul><li><strong>Maximum requests</strong><mark style="color:red;"><strong>*</strong></mark> - The maximum number of requests (or amount of data) to make within a specified time interval.</li><li><strong>Call interval</strong><mark style="color:red;"><strong>*</strong></mark> - The sliding or fixed window of time used to calculate the rate.</li><li><strong>Number of burst requests</strong><mark style="color:red;"><strong>*</strong></mark> - The number of requests that can temporarily exceed the normal rate before throttling kicks in. This allows short bursts of traffic to accommodate sudden spikes without immediate blocking. For example, if the maximum rate is 10 requests/sec and the burst is 5, the client could make up to 15 requests instantly, after which throttling slows it down.</li></ul></li><li><p><strong>Fixed delay</strong> - The client waits a fixed time after each request before sending the next one. Instead of limiting by rate (requests per second) or volume, it just inserts a pause/delay between requests.</p><ul><li><strong>Call interval</strong><mark style="color:red;"><strong>*</strong></mark> - The time to wait between consecutive requests.</li></ul></li></ul></td></tr></tbody></table>

<figure><picture><source srcset="/files/4gKo5JzoYkBoQVxYwGN8" media="(prefers-color-scheme: dark)"><img src="/files/Ol541NKsnTB51rTDw5r3" alt=""></picture><figcaption></figcaption></figure>
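
One common way to model a rate limit with a burst allowance is a token bucket. The sketch below is only an illustration of that general technique: the `TokenBucket` class and the mapping of **Maximum requests**, **Call interval** and **Number of burst requests** onto `rate` and `capacity` are assumptions, and the Listener may implement client throttling differently.

```python
class TokenBucket:
    """Rate limiter sketch: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if a request may be sent at time `now` (seconds)."""
        # Refill according to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Assumed mapping: 10 requests/sec plus a burst of 5 gives a capacity of 15.
bucket = TokenBucket(rate=10, capacity=15)
burst = sum(bucket.allow(0.0) for _ in range(20))  # 20 attempts at t=0s
later = sum(bucket.allow(1.0) for _ in range(20))  # 20 attempts at t=1s
```

In this model, 15 of the first 20 attempts go through immediately; one second later only 10 more are allowed, which corresponds to the steady rate.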
{% endtab %}

{% tab title="Server" %}
The server controls the rate at which it sends data.

<table><thead><tr><th width="160.63671875">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong>Wait response header</strong><mark style="color:red;"><strong>*</strong></mark></td><td><p>This header indicates how long to wait before sending the next request.</p><ul><li><strong>Header Type</strong><mark style="color:red;"><strong>*</strong></mark><strong> -</strong> Enter the name of the header to read, e.g. <code>wait</code>, <code>Retry-After</code>, etc.</li><li><strong>Format</strong> - The format of the header value (Seconds, Epoch, Epoch Timestamp, RFC1123, RFC1123Z, RFC3339). For example, wait 120 seconds: <code>Retry-After: 120</code>, or an RFC1123 date: <code>Retry-After: Tue, 21 Oct 2025 07:28:00 GMT</code></li></ul></td></tr><tr><td><strong>Reset response header</strong></td><td><p>Indicates when a rate limit or throttle window resets, allowing the client to resume normal activity (e.g., making more requests or pulling more data). </p><ul><li><strong>Header Type</strong><mark style="color:red;"><strong>*</strong></mark><strong> -</strong> Enter the name of the header to read, e.g. <code>wait</code>, <code>Retry-After</code>, etc.</li><li><strong>Format</strong> - The format of the header value (Seconds, Epoch, Epoch Timestamp, RFC1123, RFC1123Z, RFC3339). </li></ul></td></tr><tr><td><strong>Remaining response header</strong><mark style="color:red;"><strong>*</strong></mark></td><td>The header that reports how many requests or units of usage the puller can still make within the current time window before hitting the limit and being throttled.</td></tr></tbody></table>

<figure><picture><source srcset="/files/2ujcGZjqKgT7Hho6xbHy" media="(prefers-color-scheme: dark)"><img src="/files/51IYbK93JWdzLQQpfjan" alt=""></picture><figcaption></figcaption></figure>
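
To illustrate how the same wait instruction can arrive in different formats, the sketch below converts a header value into a number of seconds to wait. The function `wait_seconds` and its lowercase format labels are hypothetical names chosen for this example; they only loosely mirror the **Format** options above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def wait_seconds(header_value: str, fmt: str, now: datetime) -> float:
    """Convert a wait/reset header value into seconds to wait from `now`."""
    if fmt == "seconds":
        return float(header_value)  # value is already a duration
    if fmt == "epoch":
        target = datetime.fromtimestamp(float(header_value), tz=timezone.utc)
    elif fmt == "rfc1123":
        target = parsedate_to_datetime(header_value)
    elif fmt == "rfc3339":
        target = datetime.fromisoformat(header_value.replace("Z", "+00:00"))
    else:
        raise ValueError(f"unsupported format: {fmt}")
    # Never return a negative wait if the target time has already passed.
    return max(0.0, (target - now).total_seconds())


# Hypothetical "current time", two minutes before the target below.
now = datetime(2025, 10, 21, 7, 26, 0, tzinfo=timezone.utc)
from_seconds = wait_seconds("120", "seconds", now)
from_date = wait_seconds("Tue, 21 Oct 2025 07:28:00 GMT", "rfc1123", now)
```

Both calls describe the same two-minute wait, once as a duration and once as an absolute date.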
{% endtab %}
{% endtabs %}

<table><thead><tr><th width="211.69140625">Parameter</th><th>Description</th><th data-hidden></th></tr></thead><tbody><tr><td><strong>Pagination Type</strong><mark style="color:red;"><strong>*</strong></mark></td><td>Choose how the API organizes and delivers large sets of data across multiple pages—and how that affects the process of systematically collecting or extracting all available records.</td><td></td></tr></tbody></table>

#### Output

<table><thead><tr><th width="179.74609375">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong>Select</strong><mark style="color:red;"><strong>*</strong></mark></td><td>A JSON selector expression to pick a part of the response, e.g. <code>'.data'</code>.</td></tr><tr><td><strong>Filter</strong></td><td>A JSON expression to filter the selected elements. Example: <code>'.films | index("Tangled")'</code>.</td></tr><tr><td><strong>Map</strong></td><td>A JSON expression to transform each selected element into a new event.<br>Example: <code>'{characterName: .name}'</code>.</td></tr><tr><td><strong>Output Mode</strong><mark style="color:red;"><strong>*</strong></mark></td><td><p>Choose between: </p><ul><li><strong>Element</strong>: emits each transformed element individually as an event.</li><li><strong>Collection</strong>: emits all transformed items together as a single event containing an array/collection.</li></ul></td></tr></tbody></table>

<details>

<summary>Collection example</summary>

Let's say you want to collect the following SIEM Integration events from Sophos.

<pre class="language-yaml"><code class="lang-yaml">collectionPhase:
  paginationType: cursor
  cursorSelector: ".next_cursor"
<strong>  initialRequest:
</strong>    method: GET
    url: "${inputs.dataRegionURL}/siem/v1/events"
    headers:
      - name: Accept
        value: application/json
      - name: Accept-Encoding
        value: gzip, deflate
      - name: X-Tenant-ID
        value: "${inputs.tenantId}"
    queryParams:
      - name: from_date
        value: "${temporalWindow.from}"
    bodyParams: []
  nextRequest:
    method: GET
    url: "${inputs.dataRegionURL}/siem/v1/events"
    headers:
      - name: Accept
        value: application/json
    queryParams:
      - name: cursor
        value: "${pagination.cursor}"
    bodyParams: []
  output:
    select: ".result"
    filter: "."
    map: "."
    outputMode: element
</code></pre>

* Pagination type - `cursor`. If you select the cursor type, you retrieve the data in chunks (pages) using a cursor token, which points to the position in the dataset where the next page of results should start.
  * Cursor selector - The cursor selector tells the HTTP Puller where to find the cursor value in the API response so it can be saved and used in the next request e.g. `.next_cursor`

* Initial request - Fetches the first set of results; the response includes the cursor token (e.g. a timestamp or ID).
  * method - `GET` to fetch the results.
  * url - The URL is composed of various elements:
    * `${inputs.dataRegionURL}` - this variable is resolved from the **Inputs** defined for the collection phase, that is, from the output of the enumeration phase.
    * `/siem/v1/` - API base path, indicating that you're calling version 1 of the SIEM API.
    * `events` - the specific endpoint being accessed, which returns event data.

* headers - these headers are key-value pairs that provide additional information to the server when making a request.
  * **name** - `Accept`
  * **value** - `application/json` tells the server that the client expects the response to be in JSON format, a standard HTTP header used for content negotiation.

* Next request - Sends the cursor token back to the server using a parameter (e.g., `?cursor=abc123`) to get the next page of results. The server returns the next chunk of data and a new cursor.

  This repeats until no more data is returned or the server returns a `has_more: false` flag.

* **Output**
  * select - `.result` Selects the part of the response to extract. This is a JSONPath-like expression that tells the puller where to find the list or array of items in the response.
  * map - `.` Maps each selected item as-is, keeping each object unchanged. It passes through each item without transforming it. If you needed to restructure or extract specific fields from each item, you would replace `.` with a field mapping (e.g., `.id`, `{ "id": .id, "name": .username }`, etc.).
  * output mode - `element` Controls the output format. Each item from the `select` result is emitted individually. This is useful for event stream processing, where each object (e.g., an alert or event) is treated as a separate record. The other available value is `collection`, which emits all items as a single array.
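
The cursor flow described in this example can be sketched as a simple loop. The helper `pull_with_cursor` and the in-memory `PAGES` dictionary are hypothetical stand-ins for the real requests, and the `has_more` stop condition is an assumption about the API response; the sketch only shows the control flow.

```python
from typing import Callable, Iterator, Optional


def pull_with_cursor(fetch: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield items page by page; fetch(None) plays the initial request."""
    cursor = None
    while True:
        page = fetch(cursor)
        yield from page.get("result", [])  # like select: ".result"
        cursor = page.get("next_cursor")   # like cursorSelector: ".next_cursor"
        if not cursor or not page.get("has_more", True):
            break


# Hypothetical in-memory API standing in for the real endpoint.
PAGES = {
    None: {"result": [{"id": 1}, {"id": 2}], "next_cursor": "abc123", "has_more": True},
    "abc123": {"result": [{"id": 3}], "next_cursor": None, "has_more": False},
}

events = list(pull_with_cursor(lambda cursor: PAGES[cursor]))
```

Each yielded item corresponds to one event in `element` output mode.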

</details>

## Ports

The HTTP Pull Listener has two output ports:

* **Default port** - Events are sent through this port if no error occurs while processing them.
* **Error port** - Events are sent through this port if an error occurs while processing them.

{% hint style="warning" %}
The error message is provided in a free-text format and may change over time. Please consider this if performing any post-processing based on the message content.
{% endhint %}

## Examples

### 1. Basic GET Puller

Here's a simple example of using the HTTP Puller collector with parameters for a basic GET request. No authentication, no pagination, just pulling JSON data from an API endpoint. Keep **Config as YAML, Temporal window, Authentication** and **Enumeration phase** as `OFF`.

* **Collection phase**&#x20;
  * Pagination type - `none` Indicates that you only need one request to retrieve all data at once.
    * Repeat until: ***No repeat*** to ignore, or ***No data*** to repeat the request until no data is returned.&#x20;
  * **Request**
    * **Response type -** `json` Tells the puller to expect a JSON response.
    * **Method:** `GET` Performs a basic HTTP GET request.
    * **URL**: Constructed from `parameters.domain` and `parameters.path`: `https://${parameters.domain}${parameters.path}`
  * **Headers**: Set standard headers and include the API key.
  * **Output:**
    * **Select:** `.logs` Tells the system where to find the list of log entries in the response.
    * **Output mode:** `element` each object inside `.logs` will be extracted as a separate output element e.g.&#x20;

      ```json
      {
        "logs": [
          { "timestamp": "2024-12-01T12:00:00Z", "event": "user_login" },
          { "timestamp": "2024-12-01T12:05:00Z", "event": "file_upload" }
        ]
      }
      ```

<figure><picture><source srcset="/files/p5d3pCkD9OXDNiE6WZOi" media="(prefers-color-scheme: dark)"><img src="/files/2gj2TXEMxlWj24kwLMI3" alt=""></picture><figcaption></figcaption></figure>

### 2. Make an HTTP request using offset and limit pagination

Rather than retrieving all results in a single response, this example uses offset/limit pagination to fetch data in pages.

* **Pagination type** - `offset/Limit` We control how many records are returned at a time (`limit`) and choose where to start each request (`offset` or `skip` parameter)
* **Zero Index** - `false`
* **Limit**<mark style="color:red;">**\***</mark> - `50`
* **Request -** The request to be repeated, with `offset` and `limit` automatically incremented per iteration.
* **Response type**<mark style="color:red;">**\***</mark> - `Json`
* **Method**<mark style="color:red;">**\***</mark> - `GET`
* **URL**<mark style="color:red;">**\***</mark> - `https://example.com/items`
* **Query params** The API supports pagination through query parameters:
  * Name - `skip`&#x20;
  * Value - `${pagination.offset}` the number of records to skip before returning results
  * Name - `limit`&#x20;
  * Value - `${pagination.limit}` uses the **limit** entered (`50`) as the maximum number of records to return in one request.

```yaml
collectionPhase:
  paginationType: "offsetLimit"
  limit: 50
  isZeroIndex: false
  request:
    method: "GET"
    url: "https://example.com/items"
    queryParams:
      - name: skip
        value: "${pagination.offset}"
      - name: limit
        value: "${pagination.limit}"
```

<figure><picture><source srcset="/files/NztuKmkU6hAqGs9KBobO" media="(prefers-color-scheme: dark)"><img src="/files/ykmZkptNPmK3lBLfJlhr" alt=""></picture><figcaption></figcaption></figure>

### 3. Enumeration + Collection with `responseBodyLink`

This example defines a data extraction workflow that:

1. Enumerates through a paginated API endpoint using `responseBodyLink`.
2. Filters and transforms specific data from the paginated results.
3. Collects further data based on the enumerated output using individual requests.

It also uses a **temporal window** to scope or schedule the data extraction process.

```yaml
# Temporal window (optional)
# Generated variables: ${temporalWindow.from}, ${temporalWindow.to}
temporalWindow:
  duration: 5m
  offset: 10m
  tz: UTC
  format: RFC3339
enumerationPhase:
  paginationType: 
    responseBodyLink:
      nextLinkSelector: ".info.nextPage"
      request:
        method: "GET"
        url: "https://api.cyberintel.dev/iocs"
        headers:
        - name: accept
          value: "application/json"
        bodyExpression:
          expression: "(.data | length) == 50"
  output:
    select: '.data'
    filter: '.threatType == "Ransomware"'
    map: '._id'
    outputMode: "element"
collectionPhase:
  variables:
    - name: id
      source: input
      expression: "."
  paginationType: none
  repeatUntilNoData: true
  request:
    method: "GET"
    url: "https://api.cyberintel.dev/iocs/${id}"
    headers:
      - name: accept
        value: "application/json"
  output:
    select: ".data"
    filter: ""
    map: "{iocName: .name}"
    outputMode: "element"
```

#### **Enumeration**

The **enumeration** defines how to gather data in a paginated manner from the Cyber Threat Intelligence API using the `responseBodyLink` pagination strategy.

* **Pagination Type -** The type is `Next Link At Response Body`.
* **Selector -** The next page link is found using the JSON path `".info.nextPage"`. This means the response is expected to contain a field `info.nextPage` with the URL of the next page of results.

For example, the response might look like:

```json
{
  "info": {
    "nextPage": "https://api.cyberintel.dev/iocs?page=2"
  },
  "data": [ ... ]
}
```

* **Response type -** `JSON`
* **Method -** `GET`.  The HTTP method is **GET** to fetch the data.
* **URL** - The initial URL for the request is `"https://api.cyberintel.dev/iocs"`, where the IOCs are listed.
* **headers -** The `Accept` header specifies that the response should be in **JSON** format.

**Output**

* **Select -** The `.data` array from the response is selected for further processing. This array contains the actual IOC data.
* **Filter -** The `filter` expression `'.threatType == "Ransomware"'` selects only those IOCs where the `threatType` is `"Ransomware"`. This is how we focus on ransomware-related indicators.
* **Map -** The `map` expression `'._id'` extracts the `._id` field from each IOC that passed the filter. This results in a list of **IOC IDs** that match the ransomware threat type.
* **Output Mode -** `element` indicates that each IOC ID (element) is treated as an individual item, rather than as a group or array.

**Result:** After processing the pages, we will have a list of **ransomware IOC IDs**.

**Collection**

Once the enumeration process gathers a list of **IOC IDs** related to ransomware, the **collection** section is responsible for retrieving more detailed information for each of those IOCs.

**variables -** This section defines variables used in the collection step.

* **Name** - `id`: The variable `id` represents each individual IOC ID from the enumeration output.
* **Source -** The `source: input` means that the IDs come from the output of the previous enumeration step.
* **Expression -** `expression: "."` simply takes each item from the input (the IOC IDs).

**HTTP Request for Detailed IOC Information**

* **Pagination type:** The type is `"none"`, indicating that no pagination is applied; a single request is issued for each IOC ID.
* **Response type** - `JSON`.
* **Method:** The HTTP method is **GET**, to fetch detailed information about each IOC.
* **Url:** The URL for each IOC is dynamic, with the IOC ID substituted in the URL (`${id}`). For example, if `id = "a1b2"`, the URL would be `https://api.cyberintel.dev/iocs/a1b2`.
* **Headers:** The `Accept: "application/json"` header ensures the response is in JSON format.

**Output Selection and Mapping**

* **Select:** This selects the `.data` field from the response, which contains the detailed information for the IOC.
* **Filter:** No additional filtering is applied.
* **Map:** The map expression `"{iocName: .name}"` creates a new object with the `iocName` key, mapping it to the `.name` of the IOC from the response.
* **Output Mode:** `outputMode: "element"` means each IOC’s name will be treated as an individual output item.

**Result:** Each IOC name (or other information, if mapped) is emitted as an individual event.

<figure><picture><source srcset="/files/3ctTiMNf8ZT4CHjt8pTo" media="(prefers-color-scheme: dark)"><img src="/files/pn10GJujXJ3nutGPimtU" alt=""></picture><figcaption></figcaption></figure>
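
The way the two phases chain together can be sketched in a few lines. Everything below is a simplified, hypothetical model: the sample pages, the IOC names and the helper functions `enumerate_ids` and `collect` are invented for illustration, and no real HTTP requests are made.

```python
from typing import Callable, Dict, List

# Hypothetical enumeration pages, already fetched in nextPage order.
ENUM_PAGES: List[dict] = [
    {"info": {"nextPage": "https://api.cyberintel.dev/iocs?page=2"},
     "data": [{"_id": "a1b2", "threatType": "Ransomware"},
              {"_id": "c3d4", "threatType": "Phishing"}]},
    {"info": {"nextPage": None},
     "data": [{"_id": "e5f6", "threatType": "Ransomware"}]},
]

# Hypothetical detail responses keyed by request URL.
DETAILS: Dict[str, dict] = {
    "https://api.cyberintel.dev/iocs/a1b2": {"data": {"name": "sample-ioc-1"}},
    "https://api.cyberintel.dev/iocs/e5f6": {"data": {"name": "sample-ioc-2"}},
}


def enumerate_ids(pages: List[dict]) -> List[str]:
    """Enumeration: select '.data', keep ransomware IOCs, map to '._id'."""
    ids = []
    for page in pages:
        for ioc in page["data"]:
            if ioc["threatType"] == "Ransomware":
                ids.append(ioc["_id"])
    return ids


def collect(ids: List[str], get: Callable[[str], dict]) -> List[dict]:
    """Collection: one request per ID, mapped like '{iocName: .name}'."""
    events = []
    for ioc_id in ids:
        body = get(f"https://api.cyberintel.dev/iocs/{ioc_id}")
        events.append({"iocName": body["data"]["name"]})
    return events


ids = enumerate_ids(ENUM_PAGES)
events = collect(ids, DETAILS.__getitem__)
```

Only the two ransomware IDs reach the collection step, so only two detail requests are issued.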

### 4. Enumeration (collection output) + Collection (POST with `bodyRaw`)

**Temporal window** defines a 5-minute slice of time, offset 10 minutes ago.

**Enumeration** step:

* Makes a paginated GET to `/posts`.
* Extracts IDs from posts within the time window.
* Produces a **collection of IDs**.

**Collection** step:

* Uses those IDs in a POST request.
* Filters, maps, and outputs enriched objects (`id, title, status`).
* Emits each result as an individual event.

<pre class="language-yaml"><code class="lang-yaml"># Temporal window (optional)
temporalWindow:
  duration: 5m
<strong>  offset: 10m
</strong>  tz: UTC
  format: RFC3339
enumerationPhase:
  httpRequest:
    type: "page"
    page:
      pageSize: 50
      request:
        method: "GET"
        url: "https://api.fake-rest.refine.dev/posts"
        headers:
          Accept: "application/json"
        queryParams:
          from: "${temporalWindow.from}"
          to: "${temporalWindow.to}"
          _page: "${pagination.pageNumber}"
          _per_page: "${pagination.pageSize}"
  output:
    select: '.'
    # filter: '.language == 3'
    map: '{id: .id}'
    outputMode: "collection"
collectionPhase:
  variables:
    - name: ids
      source: input
      expression: "."
      format: "json"
  httpRequest:
    type: "none"
    none:
      request:
        method: "POST"
        url: "https://api.fake-rest.refine.dev/posts"
        headers:
          Accept: "application/json"
        bodyType: "raw"
        bodyRaw: |
          {
            "ids": ${inputs.ids}
          }
  output:
    select: "."
    filter: ".id > 10"
    map: "{id: .id, title: .title, status: .status}"
    outputMode: "element"
</code></pre>

* **Duration -** `5m` window size is 5 minutes.
* **Offset -** `10m`  shifts the window back 10 minutes from “now”. So if current UTC is `12:00`, the range would be `11:45 – 11:50`.
* **Time zone -** `UTC`&#x20;
* **Format -** `RFC3339` output format for timestamps (e.g., `2025-08-20T12:00:00Z`).

The variables `${temporalWindow.from}` and `${temporalWindow.to}` get auto-populated with these calculated times.
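
The window arithmetic can be checked with a short calculation. The sketch below assumes, as in the example above, that the window ends `offset` before the current time and starts `duration` before that; the function name `temporal_window` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def temporal_window(
    now: datetime, duration: timedelta, offset: timedelta
) -> Tuple[str, str]:
    """Return (from, to) as RFC3339 UTC strings for the given settings."""
    end = now - offset      # shift the window back from "now"
    start = end - duration  # the window is `duration` long
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


now = datetime(2025, 8, 20, 12, 0, 0, tzinfo=timezone.utc)
window = temporal_window(now, timedelta(minutes=5), timedelta(minutes=10))
print(window)  # ('2025-08-20T11:45:00Z', '2025-08-20T11:50:00Z')
```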

**Enumeration**

* **Pagination type** - `page number/page size`
* **Page size:** `50`  fetch 50 records per request.
* **Request**&#x20;
  * **Response type** - `JSON`
  * **Method** - `GET` &#x20;
  * **URL** - `https://api.fake-rest.refine.dev/posts`
  * **Query Params**&#x20;

    **1. from: "${temporalWindow\.from}"**

    * Inserts the **start timestamp** of the time window. `${temporalWindow.from}` is automatically computed based on your `temporalWindow` configuration, e.g.\
      if `now = 12:00 UTC`, `offset = 10m`, and `duration = 5m`, then `temporalWindow.from = 11:45 UTC` (start). In the request, this becomes something like:

    ```
    ?from=2025-08-20T11:45:00Z
    ```

    **2. to: "${temporalWindow\.to}"** Inserts the **end timestamp** of the time window e.g.

    `temporalWindow.to = 11:50 UTC` (end). In the request, this becomes:

    ```
    &to=2025-08-20T11:50:00Z
    ```

    So together, `from` and `to` tell the API:

    > “Only give me records between 11:45 and 11:50 UTC.”

    **3. \_page: "${pagination.pageNumber}"** This is a built-in pagination variable.

    `${pagination.pageNumber}` auto-increments as the system makes repeated requests to fetch all pages e.g. First request `_page=1` Second request  `_page=2` etc.

    This ensures you don’t just get the first batch, but all results page by page.

    **4. \_per\_page: "${pagination.pageSize}"** Controls how many records to fetch per page.

    This pulls from your earlier configuration

    ```yaml
    page:
      pageSize: 50
    ```

    So each request includes:

    ```
    &_per_page=50
    ```
* **Select** - `'.'` selects the entire JSON response.
* **Filter** - commented out in this example; if enabled, it would keep only records where `.language == 3`.
* **Map** - extracts only `{id: .id}` for each record.
* **Output Mode -** `collection` outputs an **array of items** (instead of single elements).

```json
[
  {"id": 1},
  {"id": 2},
  {"id": 3}
]
```

<figure><picture><source srcset="/files/AYyWn8kqqRFXCrPh5YyE" media="(prefers-color-scheme: dark)"><img src="/files/gRmlh3ykH0izyCfdrFiR" alt=""></picture><figcaption></figcaption></figure>

**Collection (POST with BodyRaw)**

* **Pagination Type** - `None`
* **Selector** - `"."`  take the full collection.
* **Response Type** - `json` keep it as JSON (array of IDs).
* **Method -** `POST` to send data.
* **URL** - `https://api.fake-rest.refine.dev/posts`
* **Body Type:** `raw`  freeform JSON payload.
* **Body Content -** sends the IDs collected in the enumeration: `"ids": ${inputs.ids}`
* **Select:** `"."` take the full response.
* **Filter -** `".id > 10"` only keep posts with ID greater than 10.
* **Map** - reduce each record to `{id, title, status}`.
* **Output Mode -** `element`  output individual objects, one at a time.
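
To see why the `ids` variable is formatted as `json` before being placed in the raw body, consider this minimal sketch. The sample IDs and the simple string replacement are illustrative assumptions; they only mimic what the substitution of `${inputs.ids}` might produce.

```python
import json

# Hypothetical enumeration output (collection mode): one array of objects.
ids = [{"id": 1}, {"id": 2}, {"id": 3}]

# Raw body template, as in the bodyRaw field of the example.
body_template = '{\n  "ids": ${inputs.ids}\n}'

# Serializing with JSON keeps the resulting body valid JSON.
body = body_template.replace("${inputs.ids}", json.dumps(ids))
payload = json.loads(body)  # parses cleanly

# A plain Python string conversion would use single quotes instead.
not_json = body_template.replace("${inputs.ids}", str(ids))
```

Parsing `not_json` with `json.loads` would fail, which is the kind of problem a JSON format setting avoids.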


