PCL (Parser Configuration Language)

Introduction

PCL is a language designed to extract data from a line of text by describing its structure. The language aims to use a concise and intuitive syntax to help visualize the structure of a line of text.

PCL expressions are used to configure the Parser Action.

Syntax basics

A valid PCL expression is composed of one or more fields, with a delimiter between every two adjacent fields. That is, it must follow this rule:

delimiter? fixedLength* field(delimiter fixedLength* field)* delimiter?

Here, a delimiter is either a literal or an operator. An operator may optionally be surrounded by literals.

When using groups, the PCL behaviour can change, as groups are a special type of field that can be written next to other fields without a delimiter. Check the Group section below to learn more about this.

Valid example

{myFieldOne:string} {myFieldTwo:int}<while(value=" ")>{myCsv:csv(indices=[0,2],separator=",")}

Invalid example (no delimiters)

{myFieldOne:string}{myFieldTwo:int}

The grammar supports any kind of name that is written with the set of characters A-Z, a-z, 0-9 and the symbol underscore (_). It supports field aliases with any name written with the set of characters A-Z, a-z, 0-9, _, -, # and . (given that the first character is not _).
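The naming rules above can be expressed as two regular expressions. The following Python sketch is only an illustration of those rules, not the parser's own validation; the function names are hypothetical, and it assumes that names and aliases must be non-empty.

```python
import re

# Field names: one or more of A-Z, a-z, 0-9 and underscore.
FIELD_NAME = re.compile(r"[A-Za-z0-9_]+")
# Aliases: additionally allow '-', '#' and '.', but must not start with '_'.
ALIAS = re.compile(r"(?!_)[A-Za-z0-9_.#-]+")


def is_valid_field_name(name: str) -> bool:
    """Hypothetical helper: check a name against the field-name character set."""
    return FIELD_NAME.fullmatch(name) is not None


def is_valid_alias(alias: str) -> bool:
    """Hypothetical helper: check an alias against the alias character set."""
    return ALIAS.fullmatch(alias) is not None


print(is_valid_field_name("myFieldOne"))  # True
print(is_valid_alias("my-new-name"))      # True
print(is_valid_alias("_myNewName"))       # False
print(is_valid_alias("my new name"))      # False
```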

Syntax fields

In PCL, we can write any sequence of fields. The type of fields can be the following:

Learn more about each field option in the Field options section below.

CSV

csv is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| indices (optional) | Select which columns you want to extract from the CSV. |
| separator (optional) | Define the separator of the columns. |
| totalColumns (optional) | Indicate the number of columns of your CSV. |

Examples
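As a rough illustration of what indices, separator and totalColumns describe, the following Python sketch (not PCL, and not the parser's implementation) selects columns from one CSV line. The function name and the behaviour when indices is omitted are assumptions made for this sketch.

```python
import csv


def extract_csv(text, indices=None, separator=",", total_columns=None):
    """Return {column index: value} for the requested columns of one CSV line."""
    columns = next(csv.reader([text], delimiter=separator))
    if total_columns is not None and len(columns) != total_columns:
        raise ValueError(f"expected {total_columns} columns, found {len(columns)}")
    if indices is None:
        # Assumption: with no indices given, every column is kept.
        indices = range(len(columns))
    return {i: columns[i] for i in indices}


print(extract_csv("alice;42;madrid", indices=[0, 2], separator=";"))
# {0: 'alice', 2: 'madrid'}
```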

Float

float is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| decimalSeparator (optional) | Define the decimal separator. |
| thousandSeparator (optional) | Define the thousands separator. |

Example
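To make the effect of decimalSeparator and thousandSeparator concrete, here is a small Python sketch (not PCL) that normalizes a number written with the configured separators. The function name is hypothetical, and rejecting identical separators is an assumption of this sketch rather than documented behaviour.

```python
def parse_float(text, decimal_separator=".", thousand_separator=""):
    """Parse a number that uses the given decimal and thousands separators."""
    if decimal_separator not in (",", "."):
        raise ValueError("decimalSeparator must be ',' or '.'")
    if thousand_separator not in ("", ",", "."):
        raise ValueError("thousandSeparator must be '', ',' or '.'")
    if thousand_separator == decimal_separator:
        # Assumption: identical separators would be ambiguous, so reject them.
        raise ValueError("separators must differ")
    if thousand_separator:
        text = text.replace(thousand_separator, "")
    if decimal_separator != ".":
        text = text.replace(decimal_separator, ".")
    return float(text)


print(parse_float("1.234,56", decimal_separator=",", thousand_separator="."))  # 1234.56
print(parse_float("1,234.56", thousand_separator=","))                         # 1234.56
```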

Group

A group is a special type that might contain two or more of the following simple types:

  • Float

  • Integer

  • Separator

  • String

Groups cannot be used inside other groups. The available parameters are:

| Parameter | Description |
| --- | --- |
| optional (optional) | Everything inside a group marked with this option may or may not be present in the log being parsed. |

Optionally, groups can have their type defined.

Examples

The concatenation of simple types inside a group behaves in the same way as a normal PCL, always taking into account the restrictions of which types can be used.

For a PCL containing a group field to be considered valid, the concatenation of fields/delimiters surrounding the group and the inner content of the group must form a valid PCL. Therefore, when unwrapping the inner PCL and joining it with the outside PCL, it must be valid. This means groups don't need to be separated from other fields by delimiters, as long as the resulting PCL is valid. This is because delimiters can be found at the start or end of a group.

For example, given the following valid PCL:

This is a valid PCL because the concatenation results in the following:

However, the following PCL would be considered invalid:

Because the concatenation results in two adjacent fields:

This also applies when using the optional operator. Every PCL that can be generated, with or without the optional group, must be valid. A valid example would be:

All the possible PCLs have their fields correctly separated by delimiters. On the other hand, if we had a PCL like the following, it would be considered invalid:

As one of the possible PCLs would be:

There are two fields together, which is not considered valid.
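The unwrapping rule can be sketched in Python with a deliberately simplified model. Here "F" stands for any field and "D" for any delimiter, a group is a tuple of (inner tokens, optional flag), and only the "no two adjacent fields" rule is checked; fixed-length strings and every other grammar detail are ignored. All names are hypothetical.

```python
from itertools import product


def expansions(pcl):
    """Yield every flattened token list, keeping or dropping each optional group."""
    choices = []
    for item in pcl:
        if isinstance(item, str):
            choices.append([[item]])
        else:
            inner, optional = item
            choices.append([inner, []] if optional else [inner])
    for combo in product(*choices):
        yield [token for part in combo for token in part]


def is_valid_flat(tokens):
    """A flattened PCL needs at least one field and no two adjacent fields."""
    adjacent = any(a == "F" and b == "F" for a, b in zip(tokens, tokens[1:]))
    return "F" in tokens and not adjacent


def is_valid(pcl):
    """Valid only if every expansion (with and without optional groups) is valid."""
    return all(is_valid_flat(tokens) for tokens in expansions(pcl))


print(is_valid(["F", (["D", "F"], False)]))           # True:  F D F
print(is_valid(["F", (["F", "D"], False)]))           # False: F F D
print(is_valid(["F", "D", (["F", "D"], True), "F"]))  # True:  F D F D F and F D F
print(is_valid(["F", (["D", "F", "D"], True), "F"]))  # False: dropping the group gives F F
```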

Integer

int is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| thousandSeparator (optional) | Define the thousands separator. |

Example

JSON

json is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| fields (optional) | Select which items you want to extract from the JSON. |

Examples

Key-value list

keyValueList is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| kvSeparator (optional) | Define the separator between keys and values. |
| listSeparator (optional) | Define the separator between each key-value item. |
| indices (optional) | Select which columns you want to extract from the list by their position in the list. |
| fields (optional) | Select which items you want to extract from the list by their key names. |

Examples
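As a rough picture of how kvSeparator, listSeparator and fields interact, the following Python sketch (not PCL) splits a key-value list and keeps the requested keys. The function name is hypothetical, and the handling of duplicated keys here (the last value wins) is a simplification of this sketch, not documented behaviour.

```python
def parse_key_value_list(text, kv_separator="=", list_separator=",", fields=None):
    """Split text into key-value pairs and keep only the requested keys."""
    pairs = {}
    for item in text.split(list_separator):
        key, found, value = item.partition(kv_separator)
        if found:  # ignore items that contain no key-value separator
            pairs[key] = value  # simplification: a repeated key overwrites the earlier value
    if fields is None:
        return pairs
    return {name: pairs[name] for name in fields if name in pairs}


print(parse_key_value_list("user:john;role:admin;id:7",
                           kv_separator=":", list_separator=";",
                           fields=["user", "id"]))
# {'user': 'john', 'id': '7'}
```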

If you need to parse a list of key-values in which some keys are duplicated, check the Key-value list with duplicated keys example in the Examples section.

String

string is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| length (optional) | Define the length of the string. |
| escapableChar (optional) | Escape delimiter characters in the string. |

Examples

XML

xml is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| xpaths (optional) | Select which items you want to extract from the XML using a subset of the XPath query language. |

Examples

There are also some subtypes:

Boolean

bool is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| default (optional) | Default value to be used when the element is not found. |

Example

Boolean list

listBool is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| default (optional) | Default value to be used when the element is not found. |

Example

Integer list

listInt is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| default (optional) | Default value to be used when the element is not found. |

Example

Float list

listFloat is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| default (optional) | Default value to be used when the element is not found. |

Example

String list

listString is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| default (optional) | Default value to be used when the element is not found. |

Example

Special case: fixed-length strings

There is a special case with String fields: if the field uses the length option, you may place another field immediately after it with no separator.
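The reason no separator is needed is that the length alone tells the parser where the first field ends. In Python terms (a sketch with a hypothetical function name and a made-up sample line, not PCL), it behaves like slicing:

```python
def split_fixed_then_rest(line, length):
    """Take the first `length` characters as one field and the remainder as the next."""
    if length <= 0:
        raise ValueError("length must be a strictly positive integer")
    return line[:length], line[length:]


code, message = split_fixed_then_rest("ERR42disk full", 5)
print(code)     # ERR42
print(message)  # disk full
```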

Map

The available parameters are:

| Parameter | Description |
| --- | --- |
| alias (optional) | Rename the field name. |
| default (optional) | Default value to be used when the element is not found. |
| key (mandatory) | XPath from where to extract the key for the map. This must be an XPath relative to the parent element selected by the XPath you're configuring. |
| outputFormat (mandatory) | Format of the map to be stored in the event. |
| value (optional) | XPath from where to extract the value for the map. This must be an XPath relative to the parent element selected by the XPath you're configuring. By default, the value is the text inside the element selected in your XPath. |
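The relationship between the configured XPath, key and value can be sketched in Python with the standard ElementTree module. This is an illustration under assumptions, not the parser's implementation: the function names and the sample XML are hypothetical, and the sketch assumes that every element selected by the XPath contributes one entry to the map.

```python
import json
import xml.etree.ElementTree as ET


def relative(node, path):
    """Resolve a relative direct path such as /name or /@id against node."""
    steps = path.strip("/")
    if not steps:
        raise ValueError("empty path")
    for step in steps.split("/"):
        if step.startswith("@"):
            return node.get(step[1:])
        node = node.find(step)
        if node is None:
            return None
    return node.text


def extract_map(xml_text, parent_path, key, value=None):
    """Build a JSON map with one entry per element selected by parent_path."""
    root = ET.fromstring(xml_text)
    steps = parent_path.strip("/").split("/")
    if steps[0] != root.tag:
        parents = []
    elif len(steps) == 1:
        parents = [root]
    else:
        parents = root.findall("/".join(steps[1:]))
    result = {}
    for parent in parents:
        k = relative(parent, key)
        # By default the value is the text of the selected element itself.
        v = parent.text if value is None else relative(parent, value)
        if k is not None:
            result[k] = v
    return json.dumps(result)


xml = '<data><meta name="env">prod</meta><meta name="region">eu</meta></data>'
print(extract_map(xml, "/data/meta", key="/@name"))
# {"env": "prod", "region": "eu"}
```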

Syntax literals

A literal is a special type of element: a string that must exist between two fields. Unlike fields, a literal is just a string of one or more characters. The characters <, >, {, and } are allowed only when escaped with \.

This is an example of a literal (a whitespace):

Syntax operators

There are two types of operators:

Skip

The Skip operator acts like a dynamic separator. It can be used when we want to skip any content until we find a match.

This is a configurable operator that is equivalent to the regular expression (?:from)*(?=to) where from and to are the strings to match.

The available parameters are:

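| Parameter | Description |
| --- | --- |
| from (mandatory) | Define the string to find one or more times. |
| to (mandatory) | Define the string that marks where skipping stops. In the regular expression above it is a lookahead, so it is matched but not consumed. |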
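Taking the regular expression quoted above literally, the operator can be tried out with Python's re module. This is only a sketch: the function name and the sample text are hypothetical, and the real operator may differ in details the regular expression does not capture.

```python
import re


def skip_pattern(from_, to):
    """Build (?:from)*(?=to) with both strings treated as literal text."""
    return re.compile(f"(?:{re.escape(from_)})*(?={re.escape(to)})")


text = '-----{"a": 1}'
match = skip_pattern("-", "{").match(text)
print(match.end())        # 5: the five dashes are skipped
print(text[match.end():])  # {"a": 1}  (the "to" string is not consumed)
```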

Example

A use case could be to skip all characters until a JSON is found. For example, for this log:

We could use this PCL:

While

The While operator acts like a dynamic separator. It is useful if a separator has an unknown number of repetitions on each log.

This is a configurable operator that is equivalent to the regular expression (?:value)* where value is the string to match. However, if the options min and/or max are defined, then the equivalent regular expression is (?:value){min,max}.

The available parameters are:

| Parameter | Description |
| --- | --- |
| value (mandatory) | Define the string to find one or more times. |
| max (optional) | Set the maximum number of repetitions of value (must be greater than 0). |
| min (optional) | Set the minimum number of repetitions of value (must be greater than 0). |

It is not necessary to define both max and min. However, if both are defined, min must be strictly lower than max, that is, min < max.
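The equivalence with regular expressions can be explored in Python. The sketch below assumes that the bounded form is the usual regex quantifier {min,max}, with a missing bound left open; the function name is hypothetical, and the sample lines are the ones used in the example that follows.

```python
import re


def while_pattern(value, min_=None, max_=None):
    """Return a regex fragment that repeats the literal `value`."""
    for bound in (min_, max_):
        if bound is not None and bound <= 0:
            raise ValueError("min and max must be greater than 0")
    if min_ is not None and max_ is not None and not min_ < max_:
        raise ValueError("min must be strictly lower than max")
    unit = f"(?:{re.escape(value)})"
    if min_ is None and max_ is None:
        return unit + "*"
    low = "0" if min_ is None else str(min_)
    high = "" if max_ is None else str(max_)
    return f"{unit}{{{low},{high}}}"


# Two word fields separated by " -" repeated at least 3 times.
line = re.compile(r"(\w+)" + while_pattern(" -", min_=3) + r"(\w+)")
for text in ("hello - - -world", "goodbye - - - - -world", "hello - - - -moon"):
    print(line.fullmatch(text).groups())
# ('hello', 'world')
# ('goodbye', 'world')
# ('hello', 'moon')
```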

As another example, let - be a separator that appears at least 3 times in every log:

  • hello - - -world

  • goodbye - - - - -world

  • hello - - - -moon

Then, the PCL could be:

Field options

alias

The value must follow the naming requirements:

  • The allowed set of characters is: A-Z, a-z, 0-9, ., -, _ or #.

  • An alias cannot start with _.

These are valid examples:

  • alias="myNewName"

  • alias="my-new-name"

These are invalid examples:

  • alias="_myNewName"

  • alias="my new name"

decimalSeparator

The value could be:

  • ,

  • . (default value)

These are valid examples:

  • decimalSeparator=","

  • decimalSeparator="."

These are invalid examples:

  • decimalSeparator=""

  • decimalSeparator="-"

  • decimalSeparator="_"

default

The value must be of the same type as the parent. For example, if it is the default value of an integer, then the default value must be an integer too.

These are valid examples:

  • {myfield:json(fields=["hello":string(default="{}")])}

  • {myfield:csv(indices=[0:int(default=-1)])}

  • {myfield:xml(xpaths=["/data/event":listString(default=["one","two"],alias="myField")])}

  • {myfield:xml(xpaths=["/data/event":listInt(default=[1,2],alias="myField")])}

These are invalid examples:

  • {myfield:json(fields=["hello":string(default=-1)])}

  • {myfield:float(default=1.5)}

escapableChar

The value must be an ASCII char.

These are valid examples:

  • escapableChar="|"

  • escapableChar="\"

  • escapableChar=">"

These are invalid examples:

  • escapableChar=""

  • escapableChar="hello"

  • escapableChar=;

fields

The value must be a list of strings. Note that the list cannot contain other values (e.g. numbers).

Additionally, we may specify the type of each field by writing a colon (:) followed by the type:

  • bool

  • float

  • int

  • string

  • listBool (only for Key-value lists)

  • listFloat (only for Key-value lists)

  • listInt (only for Key-value lists)

  • listString (only for Key-value lists)

For example: fields=["oneField":bool, "middleField", "anotherField":int]. If the type is omitted, it should be assumed that the type is string. In the previous example, it assumes that middleField is a string.

Each sub-type may have these parameters:

  • alias (optional) to rename the field name.

  • default (optional) to set a fixed value if the field does not exist in the log.

These are valid examples:

  • fields=["oneField","anotherField.with.subField"]

  • fields=["oneField":string(alias="anotherName")]

  • fields=[]

These are invalid examples:

  • fields=[oneField,anotherField]

  • fields=["oneField,anotherField"]

  • fields=[0,1]
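To show how alias and default combine with a list of fields, here is a Python sketch for the JSON case. It is not the parser's implementation: the function names and the sample document are hypothetical, it assumes that a dotted name such as anotherField.with.subField addresses a nested object, and it assumes that a missing field without a default is simply omitted.

```python
import json

_MISSING = object()


def lookup(document, dotted_path):
    """Follow a dotted path through nested JSON objects (assumed semantics)."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def extract_json_fields(text, fields):
    """fields holds either a name or a (name, options) pair with alias/default."""
    document = json.loads(text)
    result = {}
    for spec in fields:
        path, options = (spec, {}) if isinstance(spec, str) else spec
        value = lookup(document, path)
        if value is _MISSING:
            if "default" not in options:
                continue  # assumption: missing fields without a default are omitted
            value = options["default"]
        result[options.get("alias", path)] = value
    return result


log = '{"oneField": true, "anotherField": {"with": {"subField": 3}}}'
print(extract_json_fields(log, [
    "oneField",
    ("anotherField.with.subField", {"alias": "sub"}),
    ("missing", {"default": "n/a"}),
]))
# {'oneField': True, 'sub': 3, 'missing': 'n/a'}
```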

indices

The value must be a list of numbers. Note that the list cannot contain any values other than non-negative integers (zero is allowed).

Additionally, we may specify the type of each index by writing a colon (:) followed by the type: bool, float, int or string. For example: indices=[0:bool, 1, 3:int]. If the type is omitted, it assumes that the type is string. In the previous example, it assumes that 1 is a string.

Each sub-type may have these parameters:

  • alias (optional) to rename the field name.

  • default (optional) to set a fixed value if the field does not exist in the log.

These are valid examples:

  • indices=[0,1,3]

  • indices=[1:string(default="not exists")]

  • indices=[]

These are invalid examples:

  • indices=["0","1"]

  • indices=[-3,1]

key

XPath to extract the keys for a map. This XPath is relative to the parent node and supports the same subset of XPath described in the XPaths section. Note that the value extracted for the keys is always a string.

These are valid examples:

  • key="/node"

  • key="/@attribute"

These are invalid examples:

  • key="//"

  • key=""

  • key="/node":float

kvSeparator

The value can be as long as needed; there is no character limit. By default, it is =.

These are valid examples:

  • kvSeparator=":"

  • kvSeparator="\t"

  • kvSeparator="hello"

These are invalid examples:

  • kvSeparator=""

  • kvSeparator=:

Note that " must be escaped. For example: kvSeparator="\""

length

The value must be a strictly positive integer.

These are valid examples:

  • length=1

  • length=25

These are invalid examples:

  • length="1"

  • length=0

  • length=-3

listSeparator

The value must be a non-empty text. By default, it is ,

These are valid examples:

  • listSeparator=";"

  • listSeparator="|"

  • listSeparator="hello"

These are invalid examples:

  • listSeparator=""

  • listSeparator=;

Note that " must be escaped. For example: listSeparator="\"".

optionalField

The value must be a boolean. Therefore, it could be:

  • true

  • false

These are the valid options:

  • optional=true

  • optional=false

These are invalid options:

  • optional=john

  • optional=doe

Currently, this option is only available for group fields.

outputFormat

Sets the output format in which a map is stored in the event. At the moment, the only supported value for this configuration parameter is json.

These are valid examples:

  • outputFormat="json"

These are invalid examples:

  • outputFormat=""

  • outputFormat="xml"

  • outputFormat="something"

separator

The value must be a character from the set: |, ;, ,, \t. By default, it is ,.

These are valid examples:

  • separator=";"

  • separator="\t"

These are invalid examples:

  • separator="-"

  • separator=;

thousandSeparator

The value could be:

  • empty string (default value).

  • ,

  • .

These are valid examples:

  • thousandSeparator=""

  • thousandSeparator="."

These are invalid examples:

  • thousandSeparator="-"

  • thousandSeparator="_"

totalColumns

The value must be a strictly positive integer.

These are valid examples:

  • totalColumns=1

  • totalColumns=5

A valid value for this option must equal the number of columns in the CSV.

These are invalid examples:

  • totalColumns="1"

  • totalColumns=0

  • totalColumns=-3

value

XPath to extract the values for a map. This XPath is relative to the parent node and supports the same subset of XPath described in the XPaths section.

These are valid examples:

  • value="/node"

  • value="/@attribute"

These are invalid examples:

  • value="//"

  • value=""

xpaths

This value must be a list of valid XPath strings with some limitations. PCL supports a subset of XPath expressions: only direct paths are supported. Things like predicates, wildcards and // selectors aren't supported.

Supported ✅

Unsupported ❌

Additionally, we may specify the type of each field by writing a colon (:) followed by a type. Currently, supported types for XPath are:

  • bool

  • float

  • int

  • string

  • listBool

  • listFloat

  • listInt

  • listString

  • map

By default, XPath fields are of type string.
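The "direct paths only" restriction can be illustrated with Python's standard ElementTree module. This sketch is not the parser's implementation: the function name and sample XML are hypothetical, and it supports only element steps plus a final @attribute step, rejecting //, wildcards and predicates.

```python
import xml.etree.ElementTree as ET


def extract_xpath(xml_text, xpath):
    """Return the text or attribute value found at a direct XPath, or None."""
    if not xpath.startswith("/") or "//" in xpath or "*" in xpath or "[" in xpath:
        raise ValueError(f"unsupported XPath: {xpath!r}")
    root = ET.fromstring(xml_text)
    steps = xpath.strip("/").split("/")
    if steps[0] != root.tag:
        return None
    node = root
    for step in steps[1:]:
        if step.startswith("@"):
            return node.get(step[1:])
        node = node.find(step)
        if node is None:
            return None
    return node.text


doc = '<data id="7"><event>login</event></data>'
print(extract_xpath(doc, "/data/event"))    # login
print(extract_xpath(doc, "/data/@id"))      # 7
print(extract_xpath(doc, "/data/missing"))  # None
```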

Examples

CSV with 3 columns

We have the following log:

A valid expression to parse the message is:

or

Key-value list with duplicated keys

We have the following log:

A valid expression to parse the message is:

And it would extract these values:

Unknown repetitions

A valid PCL expression could be:

Here, there are 4 fields (lastName, firstName, age and info) and 3 delimiters (, , : and the operator while).

This PCL expression can be used to extract fields from different lines of text that have the same structure. For example, given the text Doe, John 37: {"country": "UK", "occupation": "father"}, the PCL expression can be used to extract the following fields:

In another example, given the text Smith, Jane 19: {"country": "USA", "occupation": "student"}, the same PCL expression would extract:

String after a CSV

We have the following log:

A valid expression to parse the message is:

or

Syslog message

We have the following log:

A valid PCL expression to parse this log would be:

XML that contains several metadata values

We have the following log:

A valid PCL expression to parse this log would be:

This would extract the following data:

XML that contains a list

We have the following log:

A list can be extracted with the following PCL expression:

And it would extract the following information:

XML that contains a list of objects

We have the following log:

A valid PCL expression to parse this log would be:

This would extract the following data:
