Pull data from HTTP endpoints

Most recent version: v0.0.6

See the changelog of this Listener type here.

Note that this Listener is only available in certain Tenants. Get in touch with us if you don't see it and want to access it.

Overview

Onum supports integration with HTTP Pull. Select HTTP Pull from the list of Listener types and click Configuration to start.

Prerequisites

To use this Listener, you must enable the following environment variable in your distributor using docker compose: HTTP_PULL_LISTENER_ENABLED

HTTP Pull configuration

Double-click the HTTP Pull Listener.

Enter a Name for the new Listener. Optionally, add a Description and some Tags to identify the Listener.

Now you need to specify the Parameters.

Enter the name of the parameter to search for in the YAML below, used later as ${parameters.name}, e.g. ${parameters.domain}.
Enter the value or variable to fill in when the given parameter name is found, e.g. domain.com.
With the name set as domain and the value set as mydomain, the expression to use in the YAML is ${parameters.domain}, which is automatically replaced by mydomain. Add as many name/value pairs as required.

YAML Sample:

  url: "https://${parameters.domain}/api/v2/events/dataexport/alerts/"
  headers:
    - name: Accept
      value: application/json
    - name: Netskope-Api-Token
      value: "${secrets.netskopeApiToken}"
nextRequest:
  method: GET
  url: "https://${parameters.domain}/api/v2/events/dataexport/alerts/"
  headers:
    - name: Accept
      value: application/json
    - name: Netskope-Api-Token
      value: "${secrets.netskopeApiToken}"

Next, configure your Secrets.

Enter the name of the secret to search for in the YAML below, used later as ${secrets.name}, e.g. ${secrets.netskopeApiToken}.
Select the Secret containing the connection credentials if you have added them previously, or select New Secret to add it. This will add this value as a variable when the field name is found in the YAML. Add as many as required.

YAML Sample:

  method: GET
  url: "https://${parameters.domain}/api/v2/events/data
  headers:
    - name: Accept
      value: application/json
    - name: Netskope-Api-Token
      value: "${secrets.netskopeApiToken}"
nextRequest:
  method: GET
  url: "https://${parameters.domain}/api/v2/events/data
  headers:
    - name: Accept
      value: application/json
    - name: Netskope-Api-Token
      value: "${secrets.netskopeApiToken}"

Toggle this option ON to configure the HTTP request as YAML, and paste the YAML here.

The system supports interpolated variables throughout the HTTP request building process using the syntax: ${prefix.name}

Each building block may:

Use variables depending on their role (e.g., parameters, secrets, pagination state).
Expose variables for later phases (e.g., pagination counters, temporal window bounds).

Not all variable types are available in every phase. Each block has access to a specific subset of variables.

Variables can be defined in the configuration or generated dynamically during execution. Each variable has a prefix that determines its source and scope.

These are the supported prefixes:

parameters - User-defined values configured manually. Available in all phases.
secrets - Sensitive values such as credentials or tokens. Available in all phases.
temporalWindow - Automatically generated from the Temporal Window block. Available in the Enumeration and Collection phases.
pagination - Values produced by the pagination mechanism (e.g., offset, cursor). Available in the Enumeration and Collection phases.
inputs - Values derived from the output of the Enumeration phase. Available only in the Collection phase.
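
As a rough illustration of how ${prefix.name} interpolation with per-phase scoping could behave, here is a minimal Python sketch. It is not Onum's implementation: the function name interpolate, the phase labels used as dictionary keys, and the sample URL are all assumptions made for this example. Only the prefixes and their availability come from the list above.

```python
import re

# Prefixes each phase may read, following the list above.
# The phase labels ("enumeration", "collection") are illustrative keys.
ALLOWED_PREFIXES = {
    "enumeration": {"parameters", "secrets", "temporalWindow", "pagination"},
    "collection": {"parameters", "secrets", "temporalWindow", "pagination", "inputs"},
}

PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def interpolate(template, phase, variables):
    """Replace every ${prefix.name} in template with its value."""
    allowed = ALLOWED_PREFIXES[phase]

    def substitute(match):
        prefix, name = match.group(1), match.group(2)
        if prefix not in allowed:
            raise KeyError(f"'{prefix}' is not available in the {phase} phase")
        return str(variables[prefix][name])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical values, for illustration only.
variables = {
    "parameters": {"domain": "mydomain"},
    "pagination": {"offset": 0, "limit": 100},
}
url = interpolate(
    "https://${parameters.domain}/alerts?offset=${pagination.offset}&limit=${pagination.limit}",
    "enumeration",
    variables,
)
print(url)  # https://mydomain/alerts?offset=0&limit=100
```

In this sketch, referencing ${inputs.xxx} from the enumeration phase raises an error, which mirrors the rule that inputs are available only in the Collection phase.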

If you do not have a YAML to paste, see how to manually configure the various components of a YAML in the following sections.

Deconstructing a YAML

Here we will learn what each parameter of the YAML means, and how they correspond to the settings in the HTTP Pull Listener.

The YAML in this walkthrough is used for pulling alerts via an API and typically uses:

A Temporal Window to enable a time-based query window for filtering results.
Authentication, using a token to authenticate the connection.
A first phase (Enumeration) that lists identifiers (e.g., alert IDs), paginating through the results.
A second phase (Collection) that fetches the full alert details using the alert IDs from the Enumeration phase.
Standard JSON response mapping to output the results.

Only the Collection phase is mandatory; the rest are optional.

Let's take a closer look at each phase below.

Temporal window

A temporal window is a defined time range used to filter or limit data retrieval in queries or API requests. It specifies the start and end time for the data you want to collect or analyze. The example in this section uses a temporal window of 5 minutes with an offset of 10, in the UTC time zone, with timestamps in RFC3339 format.

Parameter - Description
Duration* - The length of time that the window remains open.
Offset* - How far back from the current time the window starts.
Time Zone* - This value is usually set automatically to your current time zone. If not, select it here.
Format* - Choose between Epoch or RFC3339 for the timestamp format.

Temporal Window example

withTemporalWindow: true
temporalWindow:
  duration: 5m
  offset: 10
  tz: UTC
  format: RFC3339

In Onum, toggle ON the Temporal Window selector and enter the information in the corresponding fields:

Duration* - 5m
Offset* - 10
TZ* - This is set automatically according to your current time zone.
Format* - RFC3339

So if the current UTC is 12:00, the range would be 11:50 - 11:55.
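
The arithmetic behind that range can be sketched in a few lines of Python. This is only an illustration: the helper name temporal_window is made up, and the offset is assumed to be in minutes, as the 11:50 - 11:55 example implies.

```python
from datetime import datetime, timedelta, timezone


def temporal_window(now, offset, duration):
    """The window opens `offset` before `now` and stays open for `duration`."""
    start = now - offset
    end = start + duration
    return start, end


# Current time 12:00 UTC, offset 10 (assumed minutes), duration 5m.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
start, end = temporal_window(now, timedelta(minutes=10), timedelta(minutes=5))

# RFC3339 rendering of the bounds.
print(start.isoformat(), end.isoformat())
# Epoch rendering of the same bounds.
print(int(start.timestamp()), int(end.timestamp()))
```

The two bounds correspond to what the YAML later references as ${temporalWindow.from} and ${temporalWindow.to}.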

Authentication phase

If your connection requires authentication, enter the credentials here.

Parameter - Description
Authentication Type* - Choose the authentication type and enter the details.
Authentication credentials - The options provided vary depending on the type chosen to authenticate your API. This must match the authentication type configured on the API side, so that the API can recognize the request.

Choose between the options below.

Basic

Username* - the user sending the request.
Password* - the password, e.g. ${secrets.password}

withAuthentication: true
authentication:
  type: basic
  basic:
    username: testuser
    password: testpass

API Key

Enter the following:

API Key - API keys are usually stored in developer portals, cloud dashboards, or authentication settings. Set it as a secret, e.g. ${secrets.api_key}
Auth injection:
- In* - Where the API key is sent: Header or Query.
- Name* - The header name or parameter name where the API key will be sent.
- Prefix - Enter a prefix if required.
- Suffix - Enter a suffix if required.

withAuthentication: true
authentication:
  type: apiKey
  apiKey:
    apiKey: test-api-key
    authInjection:
      name: X-API-Key
      in: header
      prefix: "Bearer"

Token

Token Retrieve Based Authentication

Request
- Method* - Choose between GET or POST.
- URL* - Enter the URL to send the request to.
Headers - Add as many headers as required.
- Name
- Value
Query Params - Add as many query parameters as required.
- Name
- Value
Token Path* - Enter the path used to extract the authentication token from the response, e.g. .access_token
Auth injection:
- In* - Where the token is sent: Header or Query.
- Name* - The header name or query parameter name where the token will be sent.
- Prefix - Enter a prefix if required.
- Suffix - Enter a suffix if required.

withAuthentication: true
authentication:
  type: token
  token:
    request:
      method: POST
      url: ${parameters.domain}/oauth2/token
      headers:
        - name: Content-Type
          value: application/x-www-form-urlencoded
      bodyType: urlEncoded
      bodyParams:
        - name: grant_type
          value: client_credentials
        - name: client_id
          value: '${secrets.client_id}'
        - name: client_secret
          value: '${secrets.client_secret}'
    tokenPath: ".access_token"
    authInjection:
      in: header
      name: Authorization
      prefix: 'Bearer '
      suffix: ''

Example

Type - Token. Token authentication is a method of authenticating API requests by using a secure token, usually passed in an HTTP header.
Request
- method - POST. Sends a POST request to obtain an access token.
- url - ${parameters.domain}/oauth2/token. The OAuth token endpoint. ${parameters.domain} is a placeholder for the value entered in the Parameters section.
- headers - key-value pairs that provide additional information to the server when making a request.
  - name - Content-Type
  - value - application/x-www-form-urlencoded. Indicates that the request body is formatted as URL-encoded key-value pairs (standard for OAuth token requests).
- Body type - urlEncoded. Specifies that the request body format is URL-encoded (like key=value&key2=value2).
  - Body params
    name - grant_type. Required by OAuth 2.0 to specify the type of grant being requested.
    value - client_credentials. Used for server-to-server authentication without a user.
    name - client_id
    value - ${secrets.client_id}. A dynamic variable pulled from the value entered in the Secrets setting.
    name - client_secret
    value - ${secrets.client_secret}. A dynamic variable pulled from the value entered in the Secrets setting.
- Token path - .access_token. Extracts the access token from the JSON response of the authentication request. It is a JSONPath-like expression used to locate the token in the response body.

Toggle ON the Authentication option.

Auth injection - Defines how and where to inject the authentication token (typically an access token) into the requests after it has been retrieved, for example, from an OAuth token endpoint.
- in - header. The token is injected into the HTTP header of the request. This is the most common method for passing authentication tokens.
- name - Authorization. The name of the header that will contain the token. Most APIs expect this to be Authorization.
- prefix - 'Bearer '. The text added before the token value. Bearer is the standard prefix for OAuth 2.0 tokens.
- suffix - ''. The text added after the token value. In this case it is empty, so nothing is appended.
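
To make the token path and auth injection steps concrete, the following Python sketch extracts a token from a sample JSON response and injects it into a request header. It is a simplified illustration, not the Listener's code: extract_token and inject_auth are hypothetical helpers, the response body is invented, and the path handling only supports simple dot-separated keys such as .access_token.

```python
import json


def extract_token(response_body, token_path):
    """Follow a dot-separated path such as '.access_token' into a JSON body."""
    value = json.loads(response_body)
    for key in token_path.strip(".").split("."):
        value = value[key]
    return value


def inject_auth(headers, query, token, injection):
    """Place prefix + token + suffix in a header or in a query parameter."""
    rendered = f"{injection.get('prefix', '')}{token}{injection.get('suffix', '')}"
    target = headers if injection["in"] == "header" else query
    target[injection["name"]] = rendered


# Invented token response, for illustration only.
body = '{"access_token": "abc123", "expires_in": 3600}'
token = extract_token(body, ".access_token")

headers, query = {"Accept": "application/json"}, {}
inject_auth(
    headers,
    query,
    token,
    {"in": "header", "name": "Authorization", "prefix": "Bearer ", "suffix": ""},
)
print(headers["Authorization"])  # Bearer abc123
```

Note the trailing space in the 'Bearer ' prefix: the prefix is concatenated directly with the token, so the separator has to be part of the prefix itself.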

HMAC

Signs the queries using a secret key that is used by the server to authenticate and validate integrity.

Token Retrieve Based Authentication

Request

Generate ID - Toggle ON to generate.
Generate Timestamp
- Timezone* - This field is filled automatically using your current time zone.
- Format* - The format for the timestamp syntax (Seconds, Epoch, Epoch Timestamp, RFC1123, RFC1123Z, RFC3339 or custom). Selecting custom opens the Go time format option, where you can write your custom syntax, e.g. 2 Jan 2006 15:04:05
Generate content hash
- Content hash
  - Hashing algorithm* - Select the hash operation to carry out on the content.
  - Encoding* - Choose the encoding method.
- Hashing
  - Hashing algorithm* - Select the hash operation used to sign the data.
  - Encoding* - Choose the encoding method.
  - Secret key* - The secret key used to sign the data, e.g. ${secrets.secretKey}
  - Data to sign* - How to build the string that will be signed, e.g. "${request.method}\n${request.contentHash}\napplication/json\n${request.relativeUrl}\n${request.timestamp}"
Headers - The headers to be added to the request (name and value).

withAuthentication: true
authentication:
  type: hmac
  hmac:
    request:
      generateTimestamp: true
      timestamp:
        tz: UTC
        format: EpochMillis
    hash:
      secretKey: ${secrets.apiSecret}
      algorithm: hmac_sha256
      encoding: hex
      dataToSign: "${secrets.apiKey}${request.body}${request.timestamp}"
    headers:
      x-logtrust-apikey: ${secrets.apiKey}
      x-logtrust-timestamp: ${request.timestamp}
      x-logtrust-sign: ${hmac.hash}

Example 1: Authenticate HTTP requests to Microsoft Azure using the HMAC-SHA256 scheme.

Learn how to calculate the HMAC for this API here.

withAuthentication: true
authentication:
  type: hmac
  hmac:
    request:
      generateTimestamp: true
      timestamp:
        tz: UTC
        format: RFC1123
      generateContentHash: true
      contentHash:
        algorithm: sha256
        encoding: base64
    hash:
      algorithm: hmac_sha256
      encoding: base64
      secretKey: ${secrets.secretKey}
      dataToSign: "${request.method}\n${request.relativeUrl}\n${request.timestamp};${request.host};${request.contentHash}"
    headers:
      - name: x-ms-date
        value: ${request.timestamp}
      - name: x-ms-content-sha256
        value: ${request.contentHash}
      - name: Authorization
        value: "HMAC-SHA256 Credential=${secrets.accessKeyId}&SignedHeaders=x-ms-date;host;x-ms-content-sha256&Signature=${hmac.hash}"

Type - HMAC.

Request Parameters

Generate Timestamp
- Timezone - UTC
- Format - RFC1123
Generate Content Hash
- Algorithm - sha256
- Encoding - base64

Hash

Base64-encoded HMAC-SHA256 of the String-To-Sign.

Algorithm - hmac_sha256
Encoding - base64
Secret Key - ${secrets.secretKey}. This variable is retrieved from the Secrets setting.
Data To Sign - A canonical representation of the request with the format HTTP_METHOD + '\n' + path_and_query + '\n' + signed_headers_values: ${request.method}\n${request.relativeUrl}\n${request.timestamp};${request.host};${request.contentHash}

Headers

Name - x-ms-date. Can be used when the agent cannot directly access the Date request header or when a proxy modifies it. If both x-ms-date and Date are provided, x-ms-date takes precedence.
Value - ${request.timestamp}
Name - x-ms-content-sha256. Base64-encoded SHA256 hash of the request body. It must be provided even if there is no body.
Value - ${request.contentHash}
Name - Authorization. Required by the HMAC-SHA256 scheme.
Value - HMAC-SHA256 Credential=${secrets.accessKeyId}&SignedHeaders=x-ms-date;host;x-ms-content-sha256&Signature=${hmac.hash}
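
The signing steps in this example (content hash, string-to-sign, HMAC, headers) can be sketched with Python's standard library. This is an illustrative approximation only: the host, relative URL, access key ID and secret below are invented, and the secret is used as raw bytes, which may differ from how a given API expects the key to be decoded.

```python
import base64
import hashlib
import hmac


def content_hash(body: bytes) -> str:
    """Base64-encoded SHA-256 of the request body (empty body included)."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode()


def sign(secret_key: bytes, data_to_sign: str) -> str:
    """Base64-encoded HMAC-SHA256 of the string-to-sign."""
    digest = hmac.new(secret_key, data_to_sign.encode(), hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


# Hypothetical request values, for illustration only.
method = "GET"
relative_url = "/kv?api-version=1.0"
timestamp = "Tue, 21 Oct 2025 07:28:00 GMT"  # RFC1123 timestamp
host = "example.azconfig.io"
body_hash = content_hash(b"")

# Same layout as the dataToSign template in the YAML above.
data_to_sign = f"{method}\n{relative_url}\n{timestamp};{host};{body_hash}"
signature = sign(b"demo-secret", data_to_sign)

headers = {
    "x-ms-date": timestamp,
    "x-ms-content-sha256": body_hash,
    "Authorization": (
        "HMAC-SHA256 Credential=demo-access-key-id"
        "&SignedHeaders=x-ms-date;host;x-ms-content-sha256"
        f"&Signature={signature}"
    ),
}
print(headers["Authorization"])
```

Because the timestamp and content hash are part of the signed string, the values sent in x-ms-date and x-ms-content-sha256 must be exactly the ones used when signing.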

Example 2: API HMAC Authentication for Oracle

See here for how to calculate the API HMAC in Oracle.

Type - HMAC.

Request Parameters

Generate ID
- Type - uuid
Generate Timestamp
- Timezone - UTC
- Format - Epoch
Generate Content Hash
- Algorithm - sha1
- Encoding - base64 - The binary hash result will be encoded in Base64 for transmission.

Hash

Base64-encoded HMACSHA256 of the String-To-Sign.

Algorithm - hmac_sha256
Encoding - base64
Secret Key - ${secrets.secretKey}. This variable is retrieved from the Secrets setting.
Data To Sign - ${request.method}\n${request.contentHash}\napplication/json${request.timestamp}\n${request.relativeUrl}. This is the canonical string-to-sign:
- ${request.method} - HTTP method (e.g., GET, POST)
- ${request.contentHash} - Base64 SHA-1 hash of the request body
- "application/json" - Hardcoded content type
- ${request.timestamp} - Epoch UTC timestamp
- ${request.relativeUrl} - The relative path and query string
  Each \n inserts a newline between the elements on either side of it.

Headers

Name - x-ct-authorization
Value - CTApiV2Auth ${parameters.publicKey}:${hmac.hash}
- CTApiV2Auth - Authentication scheme name.
- ${parameters.publicKey} - Public key or access ID.
- ${hmac.hash} - The generated HMAC-SHA256 signature from the hash section.
Name - x-ct-timestamp
Value - ${request.timestamp}. The same Epoch UTC timestamp generated earlier.

withAuthentication: true
authentication:
  type: hmac
  hmac:
    request:
      generateId: true
      idType: uuid
      generateTimestamp: true
      timestamp:
        tz: UTC
        format: Epoch
      generateContentHash: true
      contentHash:
        algorithm: sha1
        encoding: base64
    hash:
      algorithm: hmac_sha256
      encoding: base64
      secretKey: ${secrets.secretKey}
      dataToSign: "${request.method}\n${request.contentHash}\napplication/json${request.timestamp}\n${request.relativeUrl}"
    headers:
      - name: x-ct-authorization
        value: CTApiV2Auth ${parameters.publicKey}:${hmac.hash}
      - name: x-ct-timestamp
        value: ${request.timestamp}

Akamai EdgeGrid

Authenticates using Akamai EdgeGrid endpoints.

Akamai EdgeGrid Authentication

Request

Client Token*
Access Token*
Client Secret*

To find these credentials:

Log in to the Akamai Control Center.
Navigate to Identity & Access Management > API Users.
Either select an existing API user or create a new one.
Click Create API Client or view existing credentials.
In the credentials view, you'll find the Client Token along with the Client Secret and Access Token.

Advanced Configuration

Maximum body size in bytes or empty for whole body - Defines the maximum allowed size (in bytes) for request bodies sent to the Akamai authentication endpoint. When set to a specific byte value, it limits the amount of data that will be processed, preventing potential resource exhaustion from oversized payloads. When left empty, the system processes the entire body regardless of size.
Headers added to the signature - HTTP header fields that are incorporated into the cryptographic signature calculation for request verification. These headers become part of the signed content that Akamai uses to validate the authenticity and integrity of the request.

withAuthentication: true
authentication:
  type: akamai
  akamai:
    clientSecret: ${secrets.clientSecret}
    accessToken: ${secrets.accessToken}
    clientToken: ${secrets.clientToken}

BloodHound

Authenticates using the BloodHound API.

Token ID*
- A unique identifier (usually a UUID/GUID) visible in the interface even after token creation e.g. tkn_12a34567-89b0-12c3-d456-789012ef3456
Token Key*
- Log in to the BloodHound CE or Enterprise web interface
- Navigate to Settings or User Profile (typically accessible from the top-right user menu)
- Select API Tokens or Access Tokens
- Either view existing tokens or create a new one by clicking Generate Token
- Provide a name/description for the token and set appropriate permissions
- After creation, the Token Key will be displayed once (copy it immediately as it won't be shown again)

withAuthentication: true
authentication:
  type: bloodHound
  bloodHound:
    tokenId: ${secrets.tokenId}
    tokenKey: ${secrets.tokenKey}

Retry

Toggle ON to allow for retries and to configure the specifics.

Parameter - Description
Retry Type* - Choose between:
Fixed - Retries the failed operation after a constant, fixed interval, i.e. the same amount of time between each retry attempt.
- Interval* - Enter the amount of time to wait, e.g. 5s.
Exponential - Retries the failed operation after increasingly longer intervals to avoid overwhelming the service. The delay grows with each retry attempt.
- Initial delay* - The delay before the first retry attempt, which avoids immediately re-hitting the service. For example, with an increasing factor of 2, an initial delay of 2s gives a retry pattern of 2s, 4s, 8s, 16s, etc.
- Maximum delay* - The maximum wait time allowed between retries, which prevents the retry delay from growing indefinitely. For example, an initial delay of 2s and a maximum delay of 10s give a delay progression of 2s, 4s, 8s, 10s, 10s, etc.
- Increasing factor* - The multiplier used to calculate the next delay interval, determining how quickly the delay grows after each failed attempt.
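
The delay progressions quoted above can be reproduced with a short Python sketch. The function retry_delays is a hypothetical helper written for this illustration; it shows one common way to combine an initial delay, a multiplier and a cap, and is not taken from the Listener's code.

```python
def retry_delays(initial, maximum, factor, attempts):
    """Return the wait (in seconds) before each retry attempt."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, maximum))  # never exceed the maximum delay
        delay *= factor                     # grow the delay for the next attempt
    return delays


# Exponential: initial delay 2s, maximum delay 10s, increasing factor 2.
print(retry_delays(2, 10, 2, 5))  # [2, 4, 8, 10, 10]

# Fixed: the same interval every time, here 5s.
print(retry_delays(5, 5, 1, 3))  # [5, 5, 5]
```

A fixed retry is simply the special case where the factor is 1 and the maximum equals the interval.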

Retry after response header

Used to define how long to wait before making another request when the server responds with, for example, HTTP 429 Too Many Requests or HTTP 503 Service Unavailable.

Header - The name of the response header that carries the wait time, e.g. Retry-After.
Format - The format of the header value (Seconds, Epoch, Epoch Timestamp, RFC1123, RFC1123Z, RFC3339).
- e.g. wait 120 seconds: Retry-After: 120
- e.g. an RFC1123 date: Retry-After: Tue, 21 Oct 2025 07:28:00 GMT
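
As a sketch of how a retry-after header value could be turned into a wait time for some of these formats, consider the Python below. The function name wait_seconds and the lowercase format labels are assumptions for this example, and only four of the listed formats are covered.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def wait_seconds(header_value, fmt, now):
    """Seconds to wait before the next request, never negative."""
    if fmt == "seconds":
        return max(0.0, float(header_value))
    if fmt == "epoch":
        target = datetime.fromtimestamp(float(header_value), tz=timezone.utc)
    elif fmt == "rfc1123":
        target = parsedate_to_datetime(header_value)
    elif fmt == "rfc3339":
        target = datetime.fromisoformat(header_value.replace("Z", "+00:00"))
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return max(0.0, (target - now).total_seconds())


now = datetime(2025, 10, 21, 7, 26, 0, tzinfo=timezone.utc)
print(wait_seconds("120", "seconds", now))                            # 120.0
print(wait_seconds("Tue, 21 Oct 2025 07:28:00 GMT", "rfc1123", now))  # 120.0
```

A value in seconds is relative to the moment the response arrives, while the date-based formats are absolute, so the current time has to be subtracted to obtain the wait.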

Retry on errors processing body

Toggle ON to allow retry on body failures. When a response cannot be parsed, it will be retried the number of times specified in the maximum number of retries field.

Throttling

Use throttling to intentionally limit the rate at which the HTTP requests are sent to the API or service.

Throttling Type*

The client itself controls and limits the rate at which it sends requests.

Parameter - Description
Client type* - How to manage the rate of requests.
Rate - The client is restricted by the data transfer rate or request rate over time.
- Maximum requests* - The maximum number of requests (or amount of data) to make within a specified time interval.
- Call interval* - The sliding or fixed window of time used to calculate the rate.
- Number of burst requests* - The number of requests that can temporarily exceed the normal rate before throttling kicks in. This allows short bursts of traffic over the limit, accommodating sudden spikes without immediate blocking. For example, if the maximum rate is 10 requests/sec and the burst is 5, the client could make up to 15 requests instantly, after which throttling slows it down.
Fixed delay - The client waits a fixed time after each request before making the next one. Instead of limiting by rate (requests per second) or volume, it inserts a pause between requests.
- Call interval* - The amount of time used as the delay between requests.

Enumeration phase

The enumeration phase is an optional step in data collection or API integration workflows, where the system first retrieves a list of available items (IDs, resource names, keys, etc.) before fetching detailed data about each one.

Use this phase to identify the available endpoints, methods, parameters, and resources exposed by the API. It performs initial data discovery to feed the Collection phase and makes the results available to it via variable interpolation (inputs.*).

Can use:

${parameters.xxx}
${secrets.xxx}
${temporalWindow.xxx} (if configured)
${pagination.xxx} Pagination variables

Parameter - Description
Pagination Type* - Select one from the drop-down. Pagination type is the method used to split and deliver large datasets in smaller, manageable parts (pages), and how those pages can be navigated during discovery.

Each pagination method manages its own state and exposes specific variables that can be interpolated in request definitions (e.g., URL, headers, query params, body).

None

Description: No pagination; only a single request is issued.
- Repeat until: Choose No repeat to issue the request only once, or No data to repeat the request until no data is returned.

PageNumber/PageSize

Description: Pages are indexed using a page number and fixed size.
Configuration:
- pageSize: page size
Exposed Variables:
- ${pagination.pageNumber}
- ${pagination.pageSize}

Offset/Limit

Description: Uses offset and limit to fetch pages of data.
Configuration:
- Limit: max quantity of records per request
Exposed Variables:
- ${pagination.offset}
- ${pagination.limit}

From/To

Description: Performs pagination by increasing a window using from and to values.
Configuration:
- limit: max quantity of records per request
Exposed Variables:
- ${pagination.from}
- ${pagination.to}

Web Linking (RFC 5988)

Description: Parses the Link header to find the rel="next" URL.
Exposed Variables: None

Next Link at Response Header

Description: Follows a link found in a response header.
Configuration:
- headerName: header name that contains the next link
Exposed Variables: None

Next Link at Response Body

Description: Follows a link found in the response body.
Configuration:
- nextLinkSelector: path to next link sent in response payload
Exposed Variables: None

Cursor

Description: Extracts a cursor value from each response to request the next page.
Configuration:
- cursorSelector: path to the cursor sent in response payload
Exposed Variables:
- ${pagination.cursor}

Output

Parameter - Description
Select* - A JSON selector expression to pick a part of the response, e.g. '.data'.
Filter - A JSON expression to filter the selected elements. Example: '.films | index("Tangled")'.
Map - A JSON expression to transform each selected element into a new event. Example: '{characterName: .name}'.
Output Mode* - Choose between:
Element: emits each transformed element individually as an event.
Collection: emits all transformed items as a single array/collection in one event.
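
The select, filter, map and output mode steps can be pictured as a small pipeline. The Python sketch below stands in for the JSON expressions with plain functions, so it only approximates the behavior described above; build_events, the helper functions and the sample response are all invented for this example.

```python
def build_events(response, select, keep, transform, output_mode):
    """Apply select, then filter, then map, and shape the result by output mode."""
    selected = select(response)
    items = selected if isinstance(selected, list) else [selected]
    transformed = [transform(item) for item in items if keep(item)]
    if output_mode == "element":
        return transformed    # one event per transformed element
    return [transformed]      # a single event holding the whole array


def select_data(response):      # plays the role of '.data'
    return response["data"]


def has_tangled(item):          # plays the role of '.films | index("Tangled")'
    return "Tangled" in item["films"]


def to_character(item):         # plays the role of '{characterName: .name}'
    return {"characterName": item["name"]}


# Invented response, for illustration only.
response = {
    "data": [
        {"name": "Rapunzel", "films": ["Tangled"]},
        {"name": "Elsa", "films": ["Frozen"]},
    ]
}

print(build_events(response, select_data, has_tangled, to_character, "element"))
# [{'characterName': 'Rapunzel'}]
print(build_events(response, select_data, has_tangled, to_character, "collection"))
# [[{'characterName': 'Rapunzel'}]]
```

The only difference between the two modes in this sketch is the shape of what is emitted: several events in element mode, one array-valued event in collection mode.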

Enumeration example

enumerationPhase:
  paginationType: offsetLimit
  limit: 100
  request:
    responseType: json
    method: GET
    url: ${parameters.domain}/alerts/queries/alerts/v2
    queryParams:
      - name: offset
        value: ${pagination.offset}
      - name: limit
        value: ${pagination.limit}
      - name: filter
        value: created_timestamp:>'${temporalWindow.from}'+created_timestamp:<'${temporalWindow.to}'
  output:
    select: ".resources"
    map: "."
    outputMode: collection

Pagination type - offset/Limit. Uses classic pagination with offset and limit to page through results, fetching data in batches (pages): limit determines the page size, and offset determines where to start.
Limit - 100. Retrieves up to 100 records per request. This value is used in the limit query parameter to control batch size.
Request - Describes the API request that will be sent during enumeration.
- Response type - json. Specifies the expected response format. Here, the system expects a JSON response.
- Method - GET. The HTTP method to use for this request. GET is used to retrieve data from the server.
- URL - ${parameters.domain}/alerts/queries/alerts/v2. ${parameters.domain} is a placeholder variable that will be replaced by the domain value you entered in the Parameters section.

Query params - These are query string parameters appended to the URL.

offset - ${pagination.offset} controls where to start in the dataset. Used for pagination.
limit - ${pagination.limit} is replaced with the limit value you entered for the number of records to retrieve per request (100).
filter - Filters data to only return alerts created within a specific time window. ${temporalWindow.from} and ${temporalWindow.to} are dynamically filled in with RFC3339 or epoch timestamps, depending on what you have configured.

output - Describes how to extract and interpret the results from the JSON response.

select - .resources. Looks for a field named resources in the response JSON. This is where the array of items lives.
map - . (a single dot). Each item under .resources is returned as-is, with no transformation or remapping.
outputMode - collection. The result is emitted as a single collection (array) of items, so that it can be passed along for further processing.
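
To show how the offset and limit values evolve across requests, here is a minimal Python simulation of offset/limit pagination. The fake_api function stands in for the remote endpoint and returns made-up IDs under a resources field. The stop condition (an empty page) is an assumption for this sketch, not a statement about how the Listener decides to stop.

```python
def fake_api(offset, limit, total=250):
    """Stand-in for the remote API: returns up to `limit` IDs from `offset`."""
    ids = list(range(total))
    return {"resources": ids[offset:offset + limit]}


def enumerate_all(limit):
    """Page through the fake API until an empty page is returned."""
    offset, collected, requests_made = 0, [], 0
    while True:
        page = fake_api(offset, limit)["resources"]  # like select: ".resources"
        requests_made += 1
        if not page:
            break
        collected.extend(page)
        offset += limit  # the next request starts where this page ended
    return collected, requests_made


items, calls = enumerate_all(100)
print(len(items), calls)  # 250 4
```

With 250 records and a limit of 100, the offsets requested are 0, 100, 200 and 300; the last request returns no data and ends the loop.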

Collection phase

The collection phase in an HTTP Puller is the part of the process where the system actively pulls or retrieves data from an external API using HTTP requests.

The collection phase is mandatory. This is where the final data retrieval happens (either directly or using IDs/resources generated by an enumeration phase).

The collection phase involves gathering actual data from an API after the enumeration phase has mapped out endpoints, parameters, and authentication methods. It supports dynamic variable resolution via the variable resolver and can use data exported from the Enumeration Phase, such as:

${parameters.xxx}
${secrets.xxx}
${temporalWindow.xxx}
${inputs.xxx} (from Enumeration Phase)
${pagination.xxx}*

Inputs

In collection phases, you can define variables to be used elsewhere in the configuration (for example, in URLs, query parameters, or request bodies). Each variable definition has the following fields:

Parameter - Description
Name - The variable name (used later as ${inputs.name} in the configuration).
Source - Usually "input", indicating the value comes from the Enumeration phase's output.
Expression - A JSON expression applied to the input to extract or transform the needed value.
Format - Controls how the variable is converted to a string (see Variable Formatting below), e.g. json.
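
The relationship between an enumeration output item and the ${inputs.xxx} variables can be sketched as follows. This Python example is purely illustrative: resolve_inputs is a hypothetical helper, the expressions are plain functions rather than JSON expressions, and the item fields (id, apiHost, tags) are invented.

```python
import json


def resolve_inputs(enumeration_item, definitions):
    """Build the inputs.* variables for one item from the enumeration output."""
    inputs = {}
    for definition in definitions:
        value = definition["expression"](enumeration_item)
        if definition.get("format") == "json":
            inputs[definition["name"]] = json.dumps(value)  # JSON string form
        else:
            inputs[definition["name"]] = str(value)
    return inputs


# Invented enumeration output item, for illustration only.
item = {"id": "t-1", "apiHost": "https://api-eu.example.com", "tags": ["a", "b"]}

definitions = [
    {"name": "tenantId", "expression": lambda i: i["id"]},
    {"name": "dataRegionURL", "expression": lambda i: i["apiHost"]},
    {"name": "tags", "expression": lambda i: i["tags"], "format": "json"},
]

inputs = resolve_inputs(item, definitions)
url = "${inputs.dataRegionURL}/siem/v1/events".replace(
    "${inputs.dataRegionURL}", inputs["dataRegionURL"]
)
print(url)             # https://api-eu.example.com/siem/v1/events
print(inputs["tags"])  # ["a", "b"]
```

Each definition pairs a name with an expression, so a single enumeration item can supply several variables to the collection request.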

Retry

Toggle ON to allow for retries and to configure the specifics.

Parameter - Description
Retry Type* - Choose between:
Fixed - Retries the failed operation after a constant, fixed interval, i.e. the same amount of time between each retry attempt.
- Interval* - Enter the amount of time to wait, e.g. 5s.
Exponential - Retries the failed operation after increasingly longer intervals to avoid overwhelming the service. The delay grows with each retry attempt.
- Initial delay* - The delay before the first retry attempt, which avoids immediately re-hitting the service. For example, with an increasing factor of 2, an initial delay of 2s gives a retry pattern of 2s, 4s, 8s, 16s, etc.
- Maximum delay* - The maximum wait time allowed between retries, which prevents the retry delay from growing indefinitely. For example, an initial delay of 2s and a maximum delay of 10s give a delay progression of 2s, 4s, 8s, 10s, 10s, etc.
- Increasing factor* - The multiplier used to calculate the next delay interval, determining how quickly the delay grows after each failed attempt.

Retry after response header

Used to define how long to wait before making another request when the server responds with, for example, HTTP 429 Too Many Requests or HTTP 503 Service Unavailable.

Header - The name of the response header that carries the wait time, e.g. Retry-After.
Format - The format of the header value (Seconds, Epoch, Epoch Timestamp, RFC1123, RFC1123Z, RFC3339).
- e.g. wait 120 seconds: Retry-After: 120
- e.g. an RFC1123 date: Retry-After: Tue, 21 Oct 2025 07:28:00 GMT

Throttling

Use throttling to intentionally limit the rate at which the HTTP requests are sent to the API or service.

Throttling Type*

The client itself controls and limits the rate at which it sends requests.

Parameter - Description
Client type* - How to manage the rate of requests.
Rate - The client is restricted by the data transfer rate or request rate over time.
- Maximum requests* - The maximum number of requests (or amount of data) to make within a specified time interval.
- Call interval* - The sliding or fixed window of time used to calculate the rate.
- Number of burst requests* - The number of requests that can temporarily exceed the normal rate before throttling kicks in. This allows short bursts of traffic over the limit, accommodating sudden spikes without immediate blocking. For example, if the maximum rate is 10 requests/sec and the burst is 5, the client could make up to 15 requests instantly, after which throttling slows it down.
Fixed delay - The client waits a fixed time after each request before making the next one. Instead of limiting by rate (requests per second) or volume, it inserts a pause between requests.
- Call interval* - The amount of time used as the delay between requests.

Parameter - Description
Pagination Type* - Choose how the API organizes and delivers large sets of data across multiple pages, and how that affects the process of systematically collecting or extracting all available records.

Output

Parameter - Description
Select* - A JSON selector expression to pick a part of the response, e.g. '.data'.
Filter - A JSON expression to filter the selected elements. Example: '.films | index("Tangled")'.
Map - A JSON expression to transform each selected element into a new event. Example: '{characterName: .name}'.
Output Mode* - Choose between:
Element: emits each transformed element individually as an event.
Collection: emits all transformed items as a single array/collection in one event.

Collection example

Let's say you have the following SIEM Integration events from Sophos.

collectionPhase:
  paginationType: cursor
  cursorSelector: ".next_cursor"
  initialRequest:
    method: GET
    url: "${inputs.dataRegionURL}/siem/v1/events"
    headers:
      - name: Accept
        value: application/json
      - name: Accept-Encoding
        value: gzip, deflate
      - name: X-Tenant-ID
        value: "${inputs.tenantId}"
    queryParams:
      - name: from_date
        value: "${temporalWindow.from}"
    bodyParams: []
  nextRequest:
    method: GET
    url: "${inputs.dataRegionURL}/siem/v1/events"
    headers:
      - name: Accept
        value: application/json
    queryParams:
      - name: cursor
        value: "${pagination.cursor}"
    bodyParams: []
  output:
    select: ".result"
    filter: "."
    map: "."
    outputMode: element

Pagination type - cursor. With the cursor type, you retrieve the data in chunks (pages) using a cursor token, which points to the position in the dataset where the next page of results should start.
- Cursor selector - Tells the HTTP Puller where to find the cursor value in the API response so it can be saved and used in the next request, e.g. .next_cursor
Initial request - Fetches the first set of results. The response includes the cursor token (e.g. a timestamp or ID).
- method - GET to fetch the results.
- url - The URL is composed of various elements:
  - ${inputs.dataRegionURL} - this variable is replaced with the value you entered in the Parameters section of the HTTP Pull settings.
  - /siem/v1/ - the API base path, which indicates you're calling version 1 of the SIEM API.
  - events - the specific endpoint being accessed, in this case the event-related category of the API.
headers - key-value pairs that provide additional information to the server when making a request.
- name - Accept
- value - application/json tells the server that the client expects the response in JSON format. Accept is a standard HTTP header used for content negotiation.
Next request - Sends the cursor token back to the server using a parameter (e.g. ?cursor=abc123) to get the next page of results. The server returns the next chunk of data and a new cursor.
Repeat until there is no more data or the server returns a has_more: false flag.
Output
- select - .result Selects the part of the response to extract. This is a JSONPath-like expression that tells the puller where to find the list or array of items in the response.
- map - . Maps each selected item as-is, passing each object through without transforming it. If you need to restructure or extract specific fields from each item, replace . with a field mapping (e.g. .id, or { "id": .id, "name": .username }).
- output mode - element Controls the output format. With element, each item from the select result is emitted individually. This is useful for event stream processing, where each object (e.g. an alert or event) is treated as a separate record. The other possible value is collection, which emits all items as a single array.
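
The cursor loop described above can be sketched in a few lines of Python. This is a hedged illustration, not the Listener's code: the function pull_with_cursor, the fake_fetch stand-in for the HTTP call, and the sample pages are all hypothetical, and the field names result, next_cursor and has_more simply follow the example in this section.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def pull_with_cursor(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every item, following the cursor until the server reports no more data."""
    cursor: Optional[str] = None  # the initial request carries no cursor
    while True:
        response = fetch_page(cursor)
        yield from response.get("result", [])   # select: ".result"
        cursor = response.get("next_cursor")    # cursor selector: ".next_cursor"
        if not cursor or not response.get("has_more", True):
            break


# Hypothetical responses keyed by the cursor sent with the request.
PAGES: Dict[Optional[str], Page] = {
    None: {"result": [{"id": 1}, {"id": 2}], "next_cursor": "abc123", "has_more": True},
    "abc123": {"result": [{"id": 3}], "next_cursor": None, "has_more": False},
}


def fake_fetch(cursor: Optional[str]) -> Page:
    """Stand-in for the HTTP request; a real puller would call the API here."""
    return PAGES[cursor]


events = list(pull_with_cursor(fake_fetch))
print(events)
```

Each yielded item corresponds to one event in element output mode.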

Ports

The HTTP Pull Listener has two output ports:

Default port - Events are sent through this port if no error occurs while processing them.
Error port - Events are sent through this port if an error occurs while processing them.

The error message is provided in a free-text format and may change over time. Please consider this if performing any post-processing based on the message content.

Examples

1. Basic GET Puller

Here's a simple example of using the HTTP Puller collector with parameters for a basic GET request. No authentication, no pagination, just pulling JSON data from an API endpoint. Keep Config as YAML, Temporal window, Authentication and Enumeration phase as OFF.

Collection phase
- Pagination type - none Indicates that a single request retrieves all the data at once.
  - Repeat until - Choose No repeat to send the request only once, or No data to repeat the request until no data is returned.
- Request
  - Response type - json Tells the puller to expect a JSON response.
  - Method - GET Performs a basic HTTP GET request.
  - URL - Constructed from parameters.domain and parameters.path: https://${parameters.domain}${parameters.path}
- Headers - Set standard headers and include the API key.
- Output
  - Select - .logs Tells the system where to find the list of log entries in the response.
  - Output mode - element Each object inside .logs is extracted as a separate output element. For example, the following response produces two events:
    { "logs": [ { "timestamp": "2024-12-01T12:00:00Z", "event": "user_login" }, { "timestamp": "2024-12-01T12:05:00Z", "event": "file_upload" } ] }

2. Make an HTTP request using offset and limit pagination

Instead of retrieving all results in a single response, we will use offset/limit pagination to fetch data in pages.

Pagination type - offset/limit We control how many records are returned at a time (limit) and where each request starts (the offset or skip parameter).
Zero Index - false
Limit* - 50
Request - The request to be repeated, with the offset automatically advanced on each iteration.
Response type* - JSON
Method* - GET
URL* - https://example.com/items
Query params - The API supports pagination through query parameters:
- Name - skip
- Value - ${pagination.offset} The number of records to skip before returning results.
- Name - limit
- Value - ${pagination.limit} Uses the limit entered (50) as the maximum number of records to return in one request.

collectionPhase:
  paginationType: "offsetLimit"
  limit: 50
  isZeroIndex: false
  request:
    method: "GET"
    url: "https://example.com/items"
    queryParams:
      - name: skip
        value: "${pagination.offset}"
      - name: limit
        value: "${pagination.limit}"
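
The offset/limit loop above can be illustrated with a short Python sketch. It rests on assumptions rather than on the Listener's internals: the function pull_with_offset and the fake_fetch stand-in are hypothetical, the offset starts at 0 (the Zero Index setting may change the real starting point), and the loop stops when a page comes back shorter than the limit.

```python
from typing import Callable, List, Tuple


def pull_with_offset(fetch: Callable[[int, int], List[int]], limit: int) -> List[int]:
    """Request pages of `limit` records, advancing the offset until a short page arrives."""
    offset = 0  # assumption: the first request skips nothing
    items: List[int] = []
    while True:
        page = fetch(offset, limit)
        items.extend(page)
        if len(page) < limit:   # a short (or empty) page means the data is exhausted
            break
        offset += limit         # next request skips everything fetched so far
    return items


DATASET = list(range(120))              # hypothetical records held by the API
requested: List[Tuple[int, int]] = []   # records the (skip, limit) of each call


def fake_fetch(skip: int, limit: int) -> List[int]:
    """Stand-in for GET https://example.com/items?skip=...&limit=..."""
    requested.append((skip, limit))
    return DATASET[skip:skip + limit]


items = pull_with_offset(fake_fetch, limit=50)
print(requested)
```

For 120 records and a limit of 50, three requests are made, with skip values 0, 50 and 100.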

3. Enumeration + Collection with `responseBodyLink`

This example defines a data extraction workflow that:

Enumerates through a paginated API endpoint using responseBodyLink.
Filters and transforms specific data from the paginated results.
Collects further data based on the enumerated output using individual requests.

It also uses a temporal window to scope or schedule the data extraction process.

# Temporal window (optional)
# Generated variables: ${temporalWindow.from}, ${temporalWindow.to}
temporalWindow:
  duration: 5m
  offset: 10m
  tz: UTC
  format: RFC3339
enumerationPhase:
  paginationType: 
    responseBodyLink:
      nextLinkSelector: ".info.nextPage"
      request:
        method: "GET"
        url: "https://api.cyberintel.dev/iocs"
        headers:
        - name: accept
          value: "application/json"
        bodyExpression:
          expression: "(.data | length) == 50"
  output:
    select: '.data'
    filter: '.threatType == "Ransomware"'
    map: '._id'
    outputMode: "element"
collectionPhase:
  variables:
    - name: id
      source: input
      expression: "."
  paginationType: none
  repeatUntilNoData: true
  request:
    method: "GET"
    url: "https://api.cyberintel.dev/iocs/${id}"
    headers:
      - name: accept
        value: "application/json"
  output:
    select: ".data"
    filter: ""
    map: "{iocName: .name}"
    outputMode: "element"

Enumeration

The enumeration defines how to gather data in a paginated manner from the Cyber Threat Intelligence API using the responseBodyLink pagination strategy.

Pagination Type - Next Link At Response Body (responseBodyLink).
Selector - The next page link is found using the JSON path ".info.nextPage". The response is expected to contain an info.nextPage field with the URL of the next page of results.

For example, the response might look like:

{
  "info": {
    "nextPage": "https://api.cyberintel.dev/iocs?page=2"
  },
  "data": [ ... ]
}

Response type - JSON
Method - GET. The HTTP method is GET to fetch the data.
URL - The initial URL for the request is "https://api.cyberintel.dev/iocs", where the IOCs are listed.
headers - The Accept header specifies that the response should be in JSON format.

Output

Select - The .data array from the response is selected for further processing. This array contains the actual IOC data.
Filter - The filter expression '.threatType == "Ransomware"' selects only those IOCs where the threatType is "Ransomware". This is how we focus on ransomware-related indicators.
Map - The map expression '._id' extracts the ._id field from each IOC that passed the filter. This results in a list of IOC IDs that match the ransomware threat type.
Output Mode - element indicates that each IOC ID (element) is treated as an individual item, rather than as a group or array.

Result: After processing the pages, we will have a list of ransomware IOC IDs.
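
As a rough Python sketch of this enumeration step, the loop below follows info.nextPage links, keeps only ransomware IOCs, and maps each one to its _id. The function enumerate_ids, the fake_fetch stand-in for the HTTP call, and the sample pages (including the IDs and the Phishing threat type) are hypothetical; only the field names and URLs come from the example above.

```python
from typing import Any, Callable, Dict, List, Optional

Page = Dict[str, Any]


def enumerate_ids(first_url: str, fetch: Callable[[str], Page]) -> List[str]:
    """Follow next-page links and collect the IDs of ransomware IOCs."""
    url: Optional[str] = first_url
    ids: List[str] = []
    while url:
        response = fetch(url)
        for ioc in response["data"]:                         # select: '.data'
            if ioc["threatType"] == "Ransomware":            # filter
                ids.append(ioc["_id"])                       # map: '._id'
        url = response.get("info", {}).get("nextPage")       # nextLinkSelector
    return ids


# Hypothetical paginated responses keyed by URL.
PAGES: Dict[str, Page] = {
    "https://api.cyberintel.dev/iocs": {
        "info": {"nextPage": "https://api.cyberintel.dev/iocs?page=2"},
        "data": [
            {"_id": "a1b2", "threatType": "Ransomware"},
            {"_id": "c3d4", "threatType": "Phishing"},
        ],
    },
    "https://api.cyberintel.dev/iocs?page=2": {
        "info": {"nextPage": None},
        "data": [{"_id": "e5f6", "threatType": "Ransomware"}],
    },
}


def fake_fetch(url: str) -> Page:
    """Stand-in for the GET request; a real puller would call the API here."""
    return PAGES[url]


ioc_ids = enumerate_ids("https://api.cyberintel.dev/iocs", fake_fetch)
print(ioc_ids)
```

The loop ends when a response carries no next-page link, leaving the list of IDs that the collection phase consumes one by one.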

Collection

Once the enumeration process gathers a list of IOC IDs related to ransomware, the collection section is responsible for retrieving more detailed information for each of those IOCs.

variables - This section defines variables used in the collection step.

Name - id: The variable id represents each individual IOC ID from the enumeration output.
Source - The source: input means that the IDs come from the output of the previous enumeration step.
Expression - expression: "." simply takes each item from the input (the IOC IDs).

HTTP Request for Detailed IOC Information

Pagination type: The type is "none", indicating that each IOC is retrieved with a single request and no pagination is needed.
Response type - JSON.
Method: The HTTP method is GET, to fetch detailed information about each IOC.
URL: The URL for each IOC is dynamic, with the IOC ID substituted into the URL (${id}). For example, if id = "a1b2", the URL would be https://api.cyberintel.dev/iocs/a1b2.
Headers: The Accept: "application/json" header ensures the response is in JSON format.

Output Selection and Mapping

Select: This selects the .data field from the response, which contains the detailed information for the IOC.
Filter: No additional filtering is applied.
Map: The map expression "{iocName: .name}" creates a new object with the iocName key, mapping it to the .name of the IOC from the response.
Output Mode: outputMode: "element" means each IOC’s name will be treated as an individual output item.

Result: Each IOC name (or other information, if mapped) will be saved to a file.
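
The per-ID collection step can be sketched in Python as follows. This is an assumption-laden illustration: Python's string.Template is used only because it happens to share the ${id} placeholder syntax, and collect_names, fake_fetch and the sample IOC names are hypothetical.

```python
from string import Template
from typing import Any, Callable, Dict, List

# Same placeholder syntax as the YAML url: "https://api.cyberintel.dev/iocs/${id}"
URL_TEMPLATE = Template("https://api.cyberintel.dev/iocs/${id}")


def collect_names(ids: List[str], fetch: Callable[[str], Dict[str, Any]]) -> List[Dict[str, str]]:
    """Request the detail of each IOC and map it to {iocName: .name}."""
    events = []
    for ioc_id in ids:
        url = URL_TEMPLATE.substitute(id=ioc_id)    # e.g. .../iocs/a1b2
        detail = fetch(url)["data"]                 # select: ".data"
        events.append({"iocName": detail["name"]})  # map: "{iocName: .name}"
    return events


# Hypothetical detail responses keyed by URL.
DETAILS = {
    "https://api.cyberintel.dev/iocs/a1b2": {"data": {"name": "example-ioc-1"}},
    "https://api.cyberintel.dev/iocs/e5f6": {"data": {"name": "example-ioc-2"}},
}


def fake_fetch(url: str) -> Dict[str, Any]:
    """Stand-in for the GET request; a real puller would call the API here."""
    return DETAILS[url]


names = collect_names(["a1b2", "e5f6"], fake_fetch)
print(names)
```

Each dictionary in the result corresponds to one output element.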

4. Enumeration (collection output) + Collection (POST with `bodyRaw`)

The temporal window defines a 5-minute slice of time that ends 10 minutes before the current time.

Enumeration step:

Makes a paginated GET to /posts.
Extracts IDs from posts within the time window.
Produces a collection of IDs.

Collection step:

Uses those IDs in a POST request.
Filters, maps, and outputs enriched objects (id, title, status).
Saves results to a file.

# Temporal window (optional)
temporalWindow:
  duration: 5m
  offset: 10m
  tz: UTC
  format: RFC3339
enumerationPhase:
  httpRequest:
    type: "page"
    page:
      pageSize: 50
      request:
        method: "GET"
        url: "https://api.fake-rest.refine.dev/posts"
        headers:
          Accept: "application/json"
        queryParams:
          from: "${temporalWindow.from}"
          to: "${temporalWindow.to}"
          _page: "${pagination.pageNumber}"
          _per_page: "${pagination.pageSize}"
  output:
    select: '.'
    # filter: '.language == 3'
    map: '{id: .id}'
    outputMode: "collection"
collectionPhase:
  variables:
    - name: ids
      source: input
      expression: "."
      format: "json"
  httpRequest:
    type: "none"
    none:
      request:
        method: "POST"
        url: "https://api.fake-rest.refine.dev/posts"
        headers:
          Accept: "application/json"
        bodyType: "raw"
        bodyRaw: |
          {
            "ids": ${inputs.ids}
          }
  output:
    select: "."
    filter: ".id > 10"
    map: "{id: .id, title: .title, status: .status}"
    outputMode: "element"

Duration - 5m The window size is 5 minutes.
Offset - 10m Shifts the end of the window back 10 minutes from "now". So if the current UTC time is 12:00, the range would be 11:45 to 11:50.
Time zone - UTC
Format - RFC3339 The output format for timestamps (e.g. 2025-08-20T12:00:00Z).

The variables ${temporalWindow.from} and ${temporalWindow.to} get auto-populated with these calculated times.
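
The window arithmetic can be checked with a short Python sketch. It assumes, as in the example above, that the window ends at now minus the offset and starts one duration earlier; the function temporal_window is hypothetical and is not part of the product.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def temporal_window(now: datetime, duration: timedelta, offset: timedelta) -> Tuple[str, str]:
    """Return (from, to) as RFC3339 UTC timestamps for the given window settings."""
    end = now - offset        # the window ends `offset` before now
    start = end - duration    # and spans `duration`
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC3339 with a literal Z, valid because `now` is UTC
    return start.strftime(fmt), end.strftime(fmt)


now = datetime(2025, 8, 20, 12, 0, 0, tzinfo=timezone.utc)
window_from, window_to = temporal_window(
    now, duration=timedelta(minutes=5), offset=timedelta(minutes=10)
)
print(window_from, window_to)
```

For a current time of 12:00 UTC this gives 2025-08-20T11:45:00Z and 2025-08-20T11:50:00Z, the range quoted above.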

Enumeration

Pagination type - page number/page size
Page size - 50 Fetches 50 records per request.
Request
- Response type - JSON
- Method - GET
- URL - https://api.fake-rest.refine.dev/posts
- Query Params
  1. from: "${temporalWindow.from}" Inserts the start timestamp of the time window. ${temporalWindow.from} is automatically computed based on your temporalWindow configuration. For example, if now = 12:00 UTC, offset = 10m and duration = 5m, then temporalWindow.from = 11:45 UTC (start). In the request, this becomes something like:
  ?from=2025-08-20T11:45:00Z
  2. to: "${temporalWindow.to}" Inserts the end timestamp of the time window, e.g. temporalWindow.to = 11:50 UTC (end). In the request, this becomes:
  &to=2025-08-20T11:50:00Z
  Together, from and to tell the API:
  "Only give me records between 11:45 and 11:50 UTC."
  3. _page: "${pagination.pageNumber}" This is a built-in pagination variable. ${pagination.pageNumber} auto-increments as the system makes repeated requests to fetch all pages, e.g. first request _page=1, second request _page=2, and so on. This ensures you get all results page by page, not just the first batch.
  4. _per_page: "${pagination.pageSize}" Controls how many records to fetch per page. This pulls from your earlier configuration:
  page: pageSize: 50
  So each request includes:
  &_per_page=50
Select - '.' selects the entire JSON response.
Filter - commented out in this example. If enabled, it would keep only records where .language == 3.
Map - extracts only {id: .id} for each record.
Output Mode - collection outputs an array of items (instead of single elements), for example:

[
  {"id": 1},
  {"id": 2},
  {"id": 3}
]
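
To see how the four query parameters of the enumeration request combine, the Python sketch below builds the URL for the first two pages. The helper page_url is hypothetical and the timestamps are the ones from the worked example; the standard-library urlencode is called with safe=":" so the timestamps stay readable, whereas a real client may percent-encode the colons.

```python
from typing import List
from urllib.parse import urlencode

BASE_URL = "https://api.fake-rest.refine.dev/posts"


def page_url(page_number: int, page_size: int, window_from: str, window_to: str) -> str:
    """Build the request URL for one page of the time-windowed query."""
    params = {
        "from": window_from,      # ${temporalWindow.from}
        "to": window_to,          # ${temporalWindow.to}
        "_page": page_number,     # ${pagination.pageNumber}
        "_per_page": page_size,   # ${pagination.pageSize}
    }
    return BASE_URL + "?" + urlencode(params, safe=":")


urls: List[str] = [
    page_url(n, 50, "2025-08-20T11:45:00Z", "2025-08-20T11:50:00Z") for n in (1, 2)
]
for url in urls:
    print(url)
```

Only the _page value changes between the two URLs; the time window and the page size stay fixed for the whole run.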

Collection (POST with BodyRaw)

Pagination Type - none A single request is sent, with no pagination (type: "none" in the YAML).
Selector - "." Takes the full collection.
Response Type - json Keeps it as JSON (array of IDs).
Method - POST To send data.
URL - https://api.fake-rest.refine.dev/posts
Body Type - raw Freeform JSON payload.
Body Content - Sends the IDs collected in the enumeration: "ids": ${inputs.ids}
Select - "." Takes the full response.
Filter - ".id > 10" Keeps only posts with an ID greater than 10.
Map - Reduces each record to {id, title, status}.
Output Mode - element Outputs individual objects, one at a time.
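
The sketch below shows, in Python, how the raw body could be rendered from the enumerated collection and how the response is then filtered and mapped. It is an illustration under assumptions: render_body, shape_response and the sample posts are hypothetical, and simple string replacement stands in for the Listener's variable substitution.

```python
import json
from typing import Any, Dict, List

# Same shape as the bodyRaw template in the YAML above.
BODY_TEMPLATE = '{\n  "ids": ${inputs.ids}\n}'


def render_body(ids: List[Dict[str, int]]) -> str:
    """Substitute the JSON-encoded collection into the raw body template."""
    return BODY_TEMPLATE.replace("${inputs.ids}", json.dumps(ids))


def shape_response(posts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Apply filter '.id > 10' and map '{id, title, status}' to each post."""
    return [
        {"id": p["id"], "title": p["title"], "status": p["status"]}
        for p in posts
        if p["id"] > 10
    ]


body = render_body([{"id": 1}, {"id": 2}, {"id": 3}])
payload = json.loads(body)  # the rendered body is valid JSON

# Hypothetical response from the POST request.
events = shape_response([
    {"id": 5, "title": "Old post", "status": "draft", "extra": 1},
    {"id": 12, "title": "New post", "status": "published", "extra": 2},
])
print(body)
print(events)
```

Because the collection is inserted as JSON rather than as a quoted string, the rendered body parses as a single object whose ids field is an array.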
