Using HTTP Pull as a collector

HTTP Puller integrations are parametrized and organized by vendor, categorized by product/API.

Each endpoint includes a YAML configuration. You use this configuration in the Onum HTTP Puller action to start feeding that endpoint's data into the platform. Check the articles in this section to learn about the configuration specific to each vendor.

Deconstructing a YAML

Learn how the YAML parameters correspond to the settings in the HTTP Pull Listener.

Temporal window

A temporal window is a defined time range used to filter or limit data retrieval in queries or API requests. It specifies the start and end time for the data you want to collect or analyze. This YAML uses a temporal window of 5 minutes, in RFC3339 format, with an offset of 0, in the UTC timezone.

withTemporalWindow: true
temporalWindow:
  duration: 5m
  offset: 0
  tz: UTC
  format: RFC3339

In Onum, toggle ON the Temporal Window selector and enter the values in the corresponding fields:

  • Duration* - 5m

  • Offset* - 0s

  • TZ* - This is set automatically according to your current timezone.

  • Format* - RFC3339
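As a rough illustration of what these settings produce, the Python sketch below computes a 5-minute RFC3339 window in UTC. How Onum applies the offset is an assumption here (the window is taken to end at the current time minus the offset), and parse_duration and temporal_window are hypothetical helpers that only understand simple h/m/s durations.

```python
import re
from datetime import datetime, timedelta, timezone

UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text: str) -> timedelta:
    """Parse simple durations such as '5m', '0s', '1h30m' or a bare '0'."""
    if text == "0":
        return timedelta(0)
    parts = re.findall(r"(\d+)([hms])", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    total = timedelta(0)
    for number, unit in parts:
        total += timedelta(**{UNITS[unit]: int(number)})
    return total


def temporal_window(duration: str, offset: str, now: datetime) -> tuple[str, str]:
    """Return (from, to) as RFC3339 UTC timestamps.

    Assumption (hypothetical): the window ends at now - offset and starts `duration` earlier.
    """
    end = now.astimezone(timezone.utc) - parse_duration(offset)
    start = end - parse_duration(duration)
    rfc3339 = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(rfc3339), end.strftime(rfc3339)


now = datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
print(temporal_window("5m", "0", now))
# ('2024-05-01T11:55:00Z', '2024-05-01T12:00:00Z')
```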

Authentication phase

Learn about the parameters to enable authentication in your HTTP Puller.

withAuthentication: true
authentication:
  type: token
  token:
    request:
      method: POST
      url: ${parameters.domain}/oauth2/token
      headers:
        - name: Content-Type
          value: application/x-www-form-urlencoded
      bodyType: urlEncoded
      bodyParams:
        - name: grant_type
          value: client_credentials
        - name: client_id
          value: '${secrets.client_id}'
        - name: client_secret
          value: '${secrets.client_secret}'
    tokenPath: ".access_token"
    authInjection:
      in: header
      name: Authorization
      prefix: 'Bearer '
      suffix: ''

Toggle ON the Authentication option.

  • Type - Token. Token authentication is a method of authenticating API requests by using a secure token, usually passed in an HTTP header.

  • Request

    • method - POST. Sends a POST request to obtain an access token.

    • url - ${parameters.domain}/oauth2/token. The OAuth token endpoint. ${parameters.domain} is a placeholder for the value entered in the Parameters section.

    • headers - these headers are key-value pairs that provide additional information to the server when making a request.

      • name - Content-Type

      • value - application/x-www-form-urlencoded. Indicates that the request body is formatted as URL-encoded key-value pairs (standard for OAuth token requests).

    • Body type - urlEncoded. Specifies that the request body is URL-encoded (e.g., key=value&key2=value2).

      • Body params

        • name - grant_type. Required by OAuth 2.0 to specify the type of grant being requested.

        • value - client_credentials. Used for server-to-server authentication without a user.

        • name - client_id

        • value - ${secrets.client_id}. A dynamic variable pulled from the value entered in the Secrets setting.

        • name - client_secret

        • value - ${secrets.client_secret}. A dynamic variable pulled from the value entered in the Secrets setting.

    • Token path - .access_token. Extracts the access token from the JSON response of the authentication request. It's a JSONPath-like expression used to locate the token in the response body.

  • Auth injection - This part defines how and where to inject the authentication token (typically an access token) into the requests after it has been retrieved, for example, from an OAuth token endpoint.

    • in - header. The token is injected into the HTTP headers of the request. This is the most common method for passing authentication tokens.

    • name - Authorization. The name of the header that will contain the token. Most APIs expect this to be Authorization.

    • prefix - 'Bearer ' (note the trailing space). The text added before the token value. Bearer is the standard prefix for OAuth 2.0 tokens.

    • suffix - ''. The text added after the token value. In this case it's empty, so nothing is appended.
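To make the flow concrete, here is a minimal Python sketch of the steps above: resolving the ${...} placeholders, building the URL-encoded client_credentials request, then extracting and injecting the token. It is not Onum's implementation; the helper names (resolve, build_token_request, extract_token, inject), the example domain, and the credentials are hypothetical, the path lookup only handles simple dotted paths such as .access_token (not full JSONPath), and no network call is made.

```python
import json
import re
from urllib.parse import urlencode

# Hypothetical helpers illustrating the token flow; not part of Onum.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve(template: str, context: dict) -> str:
    """Replace ${group.name} placeholders (e.g. ${secrets.client_id}) with values from context."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)][m.group(2)]), template)


def build_token_request(context: dict) -> tuple[str, dict, str]:
    """Return (url, headers, body) for a POST to the OAuth 2.0 token endpoint."""
    url = resolve("${parameters.domain}/oauth2/token", context)
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    # bodyType: urlEncoded -> key=value pairs joined with '&'
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": resolve("${secrets.client_id}", context),
        "client_secret": resolve("${secrets.client_secret}", context),
    })
    return url, headers, body


def extract_token(response_body: str, token_path: str = ".access_token") -> str:
    """Follow a simple dotted path such as .access_token into the JSON response."""
    value = json.loads(response_body)
    for key in filter(None, token_path.split(".")):
        value = value[key]
    return value


def inject(headers: dict, token: str, name: str = "Authorization",
           prefix: str = "Bearer ", suffix: str = "") -> dict:
    """Return a copy of headers with the token added (authInjection with in: header)."""
    return {**headers, name: f"{prefix}{token}{suffix}"}


context = {
    "parameters": {"domain": "https://api.example.com"},
    "secrets": {"client_id": "my-id", "client_secret": "my-secret"},
}
url, headers, body = build_token_request(context)
print(url)   # https://api.example.com/oauth2/token
print(body)  # grant_type=client_credentials&client_id=my-id&client_secret=my-secret
token = extract_token('{"access_token": "abc123", "expires_in": 1799}')
print(inject({"Accept": "application/json"}, token))
# {'Accept': 'application/json', 'Authorization': 'Bearer abc123'}
```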

Enumeration phase

The enumeration phase is an optional step in data collection or API integration workflows, where the system first retrieves a list of available items (IDs, resource names, keys, etc.) before fetching detailed data about each one.
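The two-step pattern can be sketched in Python as follows. The function names and the in-memory ALERTS data are hypothetical stand-ins for a listing endpoint and a detail endpoint; the sketch only shows the control flow and makes no claim about how Onum passes enumeration results to later phases.

```python
from typing import Callable, Iterable, Iterator


def enumerate_then_fetch(
    list_ids: Callable[[], Iterable[str]],
    fetch_details: Callable[[str], dict],
) -> Iterator[dict]:
    """Step 1: enumerate the available item IDs. Step 2: fetch the details of each one."""
    for item_id in list_ids():
        yield fetch_details(item_id)


# Hypothetical in-memory data standing in for two API endpoints.
ALERTS = {
    "a1": {"id": "a1", "severity": "high"},
    "a2": {"id": "a2", "severity": "low"},
}

details = list(enumerate_then_fetch(lambda: ALERTS.keys(), ALERTS.__getitem__))
print([d["severity"] for d in details])  # ['high', 'low']
```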

enumerationPhase:
  paginationType: offsetLimit
  limit: 100
  request:
    responseType: json
    method: GET
    url: ${parameters.domain}/alerts/queries/alerts/v2
    queryParams:
      - name: offset
        value: ${pagination.offset}
      - name: limit
        value: ${pagination.limit}
      - name: filter
        value: created_timestamp:>'${temporalWindow.from}'+created_timestamp:<'${temporalWindow.to}'
  output:
    select: ".resources"
    map: "."
    outputMode: collection

  • Pagination type - offsetLimit. Uses classic offset/limit pagination to fetch results in batches (pages): limit determines the page size and offset determines where each page starts.

  • Limit - Retrieves up to 100 records per request. This value is used in the limit query parameter to control batch size.

  • Request - Describes the API request that will be sent during enumeration.

    • Response type - Specifies the expected response format. Here, the system expects a JSON response.

    • Method - The HTTP method to use for this request. GET is used to retrieve data from the server.

    • URL - ${parameters.domain}/alerts/queries/alerts/v2. ${parameters.domain} is a placeholder variable that will be replaced by the domain value you entered in the Parameters section.

    • Query params - Query string parameters appended to the URL.

      • offset - ${pagination.offset} controls where to start in the dataset. Used for pagination.

      • limit - ${pagination.limit} is replaced with the limit value you entered, that is, the number of records to retrieve per request (100).

      • filter - Returns only alerts created within a specific time window. ${temporalWindow.from} and ${temporalWindow.to} are filled in dynamically with RFC3339 or epoch timestamps, depending on what you have configured.

  • output - Describes how to extract and interpret the results from the JSON response.

    • select - .resources. Looks for a field named resources in the response JSON. This is where the array of items lives.

    • map - . (a single dot). Each item under .resources is returned as-is, with no transformation or remapping.

    • outputMode - collection. The result is treated as a collection (array) of individual items. Use this when you expect multiple items and want to pass them along for further processing.
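The offset/limit loop described above can be sketched in Python. This is an illustration under assumptions, not Onum's implementation: the stop rule (halt when a page returns fewer than limit items) is assumed, the example domain and timestamps are made up, and fake_fetch is a hypothetical in-memory stand-in for the real HTTP call.

```python
from typing import Any, Callable, Iterator
from urllib.parse import parse_qs, urlencode, urlparse


def paginate_offset_limit(
    base_url: str,
    fetch: Callable[[str], dict],
    window_from: str,
    window_to: str,
    limit: int = 100,
) -> Iterator[Any]:
    """Request pages with offset/limit and yield every item found under .resources."""
    offset = 0
    while True:
        query = urlencode({
            "offset": offset,  # ${pagination.offset}
            "limit": limit,    # ${pagination.limit}
            "filter": f"created_timestamp:>'{window_from}'+created_timestamp:<'{window_to}'",
        })  # urlencode percent-encodes the quotes, colons and the literal '+'
        page = fetch(f"{base_url}?{query}").get("resources", [])  # select: ".resources"
        yield from page  # map: "." and outputMode: collection -> items passed on unchanged
        if len(page) < limit:  # assumption: a short or empty page means no more results
            break
        offset += limit


# Hypothetical fake API holding 250 alert IDs, used instead of a real HTTP call.
ALL_IDS = [f"alert-{i}" for i in range(250)]


def fake_fetch(url: str) -> dict:
    params = parse_qs(urlparse(url).query)
    offset, limit = int(params["offset"][0]), int(params["limit"][0])
    return {"resources": ALL_IDS[offset:offset + limit]}


ids = list(paginate_offset_limit(
    "https://api.example.com/alerts/queries/alerts/v2", fake_fetch,
    "2024-05-01T11:55:00Z", "2024-05-01T12:00:00Z",
))
print(len(ids))  # 250, collected over three requests (offsets 0, 100 and 200)
```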

Collection phase

The collection phase in an HTTP Puller is the part of the process where the system actively pulls or retrieves data from an external API using HTTP requests.

Let's say you have the following events API YAML from Netskope:

collectionPhase:
  paginationType: "cursor"
  cursor: ".timestamp_hwm"
  initialRequest:
    method: GET
    url: "https://${parameters.domain}/api/v2/events/dataexport/events/alert?index=${parameters.netskopeIndex}&operation=${temporalWindow.from}"
    headers:
      - name: Accept
        value: application/json
      - name: Netskope-Api-Token
        value: "${secrets.netskopeApiToken}"
  nextRequest:
    method: GET
    url: "https://${parameters.domain}/api/v2/events/dataexport/events/alert?operation=next&index=${parameters.netskopeIndex}"
    headers:
      - name: Accept
        value: application/json
      - name: Netskope-Api-Token
        value: "${secrets.netskopeApiToken}"
  output:
    select: ".result"
    map: "."
    outputMode: element

  • Pagination type - cursor. If you select the cursor type, you retrieve the data in chunks (pages) using a cursor token, which points to the position in the dataset where the next page of results should start.

    • Cursor selector - .timestamp_hwm. The cursor selector tells the HTTP Puller where to find the cursor value in the API response so it can be saved and used in the next request.

  • Initial request - Fetches the first set of results. The response includes the cursor token (e.g., a timestamp or ID).

    • method - GET to fetch the results.

    • url - The URL is composed of various elements:

      • https://${parameters.domain} - The domain is taken from the value you entered in the Parameters section of the HTTP Pull settings.

      • api/v2/ - API base path. Indicates you're calling version 2 of the Netskope API.

      • events/dataexport/events/alert - The specific endpoint being accessed: events is the general (event-related) category of the API, dataexport indicates you're using the Data Export framework for pulling logs, the second events refers to the log type (this part varies depending on the type of logs you're pulling), and alert specifies that these are alert events.

      • ?index= - A query parameter that provides a unique identifier (index) for the current data collection context. Netskope uses this index to track where a particular consumer left off.

      • ${parameters.netskopeIndex} - This dynamic variable is taken from the value you entered in the Parameters section of the HTTP Pull settings.

      • &operation=${temporalWindow.from} - A second query parameter, specifying the operation type. The variable is replaced dynamically with the start timestamp of your time window (e.g., RFC3339). In the initial request, it indicates where data collection should begin.

    • headers - These headers are key-value pairs that provide additional information to the server when making a request.

      • name - Accept

      • value - application/json. Tells the server that the client expects the response in JSON format. This is a standard HTTP header used for content negotiation.

      • name - Netskope-Api-Token

      • value - ${secrets.netskopeApiToken}. The authentication token used to authorize your request, required by the Netskope API. This placeholder variable is filled in with the value entered in the Secrets section.

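  • Next request - Sends the cursor token back to the server using a parameter (e.g., ?cursor=abc123) to get the next page of results. The server returns the next chunk of data and a new cursor. In the Netskope example, the next request instead sets operation=next and reuses the same index, which Netskope uses to track where the consumer left off.

    Repeat until there is no more data or the server returns a has_more: false flag.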

  • Output

    • select - .result. Selects the part of the response to extract. This is a JSONPath-like expression that tells the puller where to find the list or array of items in the response.

    • map - . (a single dot). Maps each selected item as-is, keeping each object unchanged. It passes through each item without transforming it. If you needed to restructure or extract specific fields from each item, you would replace . with a field mapping (e.g., .id, { "id": .id, "name": .username }, etc.).

    • output mode - element. Controls the output format. Each item from the select result is emitted individually. This is useful for event stream processing, where each object (e.g., an alert or event) is treated as a separate record. Other possible values (depending on the platform) might include array (emit as a batch) or raw (emit as-is).
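Putting the collection phase together, the Python sketch below runs the initial request, follows up with next requests, and emits every item under .result individually. It is a sketch under stated assumptions: stopping on an empty page or has_more: false follows the description above, the max_pages cap is an added safeguard, and the tenant URL, index name, and canned PAGES responses are hypothetical. Because the Netskope next request in the YAML does not carry the cursor, build_next_url ignores it here; for an API that expects something like ?cursor=abc123, it would append the value instead.

```python
from typing import Any, Callable, Iterator


def collect_with_cursor(
    fetch: Callable[[str], dict],
    initial_url: str,
    build_next_url: Callable[[Any], str],
    cursor_field: str = "timestamp_hwm",
    max_pages: int = 1000,
) -> Iterator[Any]:
    """Initial request, then next requests, emitting each item of .result on its own."""
    url = initial_url
    for _ in range(max_pages):  # safety cap so a misbehaving API cannot loop forever
        response = fetch(url)
        items = response.get("result", [])  # select: ".result"
        yield from items  # map: "." + outputMode: element -> one record per item
        cursor = response.get(cursor_field)  # cursor: ".timestamp_hwm"
        if not items or cursor is None or response.get("has_more") is False:
            break  # no more data
        url = build_next_url(cursor)  # e.g. ?cursor=... for generic APIs


# Hypothetical responses standing in for three Netskope API calls.
PAGES = [
    {"result": [{"id": 1}, {"id": 2}], "timestamp_hwm": 1714564800},
    {"result": [{"id": 3}], "timestamp_hwm": 1714564860},
    {"result": [], "timestamp_hwm": 1714564860},
]
requested: list[str] = []


def fake_fetch(url: str) -> dict:
    requested.append(url)
    return PAGES[len(requested) - 1]


base = "https://tenant.example.com/api/v2/events/dataexport/events/alert"
events = list(collect_with_cursor(
    fake_fetch,
    f"{base}?index=my-index&operation=2024-05-01T11:55:00Z",
    lambda cursor: f"{base}?operation=next&index=my-index",  # cursor unused, as in the YAML
))
print([e["id"] for e in events])  # [1, 2, 3]
```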
