PCL (Parser Configuration Language)

Introduction

PCL is a language designed to extract data from a line of text by describing its structure. The language aims to use a concise and intuitive syntax to help visualize the structure of a line of text.

PCL expressions are used to configure the Parser Action.

Syntax basics

A valid PCL expression consists of one or more fields, with a delimiter between each pair of consecutive fields. That is, it must follow this rule:

delimiter? fixedLength* field(delimiter fixedLength* field)* delimiter?

Here, a delimiter can be a literal or an operator. An operator may optionally be surrounded by literals.

When using groups, the PCL behaviour can change, as groups are a special type of field that can be written next to other fields without a delimiter. Check the Group section below to learn more about this.

At the moment, the only possible fixed-length field is a String that uses the length parameter.

Valid example

{myFieldOne:string} {myFieldTwo:int}<while(value=" ")>{myCsv:csv(indices=[0,2],separator=",")}

Invalid example (no delimiters)

{myFieldOne:string}{myFieldTwo:int}

The grammar supports any kind of name that is written with the set of characters A-Z, a-z, 0-9 and the symbol underscore (_). It supports field aliases with any name written with the set of characters A-Z, a-z, 0-9, _, -, # and . (given that the first character is not _).

Syntax fields

In PCL, we can write any sequence of fields. The available field types are CSV, Float, Group, Integer, JSON, Key-value list, String and XML.

Learn more about each field option in the Field options section below.

CSV

CSV is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias | Rename the field name. |
| indices | Select which columns you want to extract from the CSV. |
| separator | Define the separator of the columns. |
| totalColumns | Indicate the number of columns of your CSV. |

Note that the totalColumns parameter is mandatory when there is a delimiter after the field that is equal to the CSV separator. For example, a CSV with 3 columns and a JSON separated by a comma:

1,2,3,{"hello":"world"}
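The ambiguity can be illustrated outside PCL. The following Python sketch is not how the Parser Action works internally; `split_csv_prefix` is a hypothetical helper, and it ignores quoted CSV values for brevity. It only shows why a known column count tells a parser where the CSV ends when the following delimiter is the same character as the CSV separator.

```python
def split_csv_prefix(line: str, separator: str, total_columns: int):
    """Split off the first `total_columns` columns; return (columns, rest)."""
    # maxsplit=total_columns stops splitting after the last CSV column,
    # so separators inside the trailing field are left untouched.
    parts = line.split(separator, total_columns)
    if len(parts) <= total_columns:
        raise ValueError("not enough columns, or nothing after the CSV")
    return parts[:total_columns], parts[total_columns]


columns, rest = split_csv_prefix('1,2,3,{"hello":"world","n":1}', ",", 3)
print(columns)  # ['1', '2', '3']
print(rest)     # {"hello":"world","n":1}
```

Without the column count, the comma inside the JSON would be indistinguishable from a CSV separator.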

Examples

{myFieldName:csv(indices=[0,1,3],totalColumns=4,separator=",")}
{myFieldName:csv(indices=[0,1,3],totalColumns=4,separator=",", alias="newCsvName")}
{myFieldName:csv(indices=[0:string(alias="csvFieldName1"), 1:string(alias="csvFieldName2")], alias="newCsvNames")}

Float

Float is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias | Rename the field name. |
| decimalSeparator | Define the decimal separator. |
| thousandSeparator | Define the thousands separator. |

Note that the decimalSeparator and thousandSeparator parameters cannot contain the same value.

Example

{myField:float(decimalSeparator=".")}
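As an illustration of how the two separators interact, here is a minimal Python sketch. The function name `parse_float` is hypothetical and the logic is an assumption about reasonable behaviour, not the Parser Action's implementation; the defaults mirror the ones listed in the Field options section.

```python
def parse_float(text: str, decimal_separator: str = ".", thousand_separator: str = "") -> float:
    """Convert a number written with custom separators into a float."""
    # The two separators must differ, otherwise the number is ambiguous.
    if decimal_separator == thousand_separator:
        raise ValueError("decimalSeparator and thousandSeparator must differ")
    if thousand_separator:
        text = text.replace(thousand_separator, "")
    return float(text.replace(decimal_separator, "."))


print(parse_float("1.234,56", decimal_separator=",", thousand_separator="."))  # 1234.56
print(parse_float("3.14"))  # 3.14
```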

Group

A Group is a special type that might contain two or more of the following simple types:

Float
Integer
Separator
String

Groups cannot be used inside other groups. The available parameters are:

| Parameter | Description |
| --- | --- |
| optional | When set, the whole content of the group may or may not be present in the log to parse. |

The group type keyword is optional: a group can be written with or without it.

Examples

{myGroupName:{{myFieldOne:string} {myFieldTwo:int} {myFieldThree:int}}}
{myGroupName:{_{myFieldOne:string} {myFieldTwo:int} {myFieldThree:int}_}}
{myGroupName:group{_{myFieldOne:string} {myFieldTwo:int} {myFieldThree:int}_}}
{myGroupName:group(optional=true){_{myFieldOne:string} {myFieldTwo:int} {myFieldThree:int}_}}

The concatenation of simple types inside a group behaves in the same way as a normal PCL, always taking into account the restrictions of which types can be used.

For a PCL containing a group to be valid, unwrapping the inner content of the group and joining it with the surrounding fields and delimiters must produce a valid PCL. Groups therefore do not need to be separated from other fields by delimiters, because the required delimiters can sit at the start or end of the group's content.

For example, given the following valid PCL:

{stringField:string}{myGroupName:group{_{myFieldOne:string} {myFieldTwo:int} {myFieldThree:int}_}}

It is valid because the concatenation results in the following:

{stringField:string}_{myFieldOne:string} {myFieldTwo:int} {myFieldThree:int}_

However, the following PCL would be considered invalid:

{stringField:string}{myGroupName:group{{myFieldOne:string} {myFieldTwo:int} {myFieldThree:int}_}}

Because the concatenation leaves two fields next to each other:

{stringField:string}{myFieldOne:string} {myFieldTwo:int} {myFieldThree:int}_

This also applies when using the optional parameter: every PCL obtained by including or omitting each optional group must be valid. A valid example would be:

{stringField:string}{myGroupName:group(optional=true){_{myFieldOne:string}}}{myGroupName:group(optional=true){_{myFieldOne:string}}}

All the possible PCLs have their fields correctly separated by delimiters. On the other hand, if we had a PCL like the following, it would be considered invalid:

{stringField:string}{myGroupName:group(optional=true){_{myFieldOne:string}}}{myGroupName:group(optional=true){{myFieldOne:string}}}

As one of the possible PCLs would be:

{stringField:string}_{myFieldOne:string}{myFieldOne:string}

There are two fields together, which is not considered valid.
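The unwrapping rule can be sketched in Python. This is a simplified model under stated assumptions, not the real validator: expressions are already tokenized into tuples (`("field", name)`, `("delim", text)`, `("group", optional, inner)`), fixed-length strings are ignored, and the only check is that no two fields end up adjacent. All names are hypothetical.

```python
from itertools import product


def expansions(elements):
    """Yield every flat element list obtained by unwrapping groups,
    both keeping and dropping each optional group."""
    choices = []
    for el in elements:
        if el[0] == "group":
            _, optional, inner = el
            choices.append([inner, []] if optional else [inner])
        else:
            choices.append([[el]])
    for combo in product(*choices):
        yield [item for part in combo for item in part]


def is_valid(elements):
    """Valid only if no expansion places two fields next to each other."""
    for flat in expansions(elements):
        for left, right in zip(flat, flat[1:]):
            if left[0] == "field" and right[0] == "field":
                return False
    return True


with_delim = ("group", True, [("delim", "_"), ("field", "myFieldOne")])
without_delim = ("group", True, [("field", "myFieldOne")])

print(is_valid([("field", "stringField"), with_delim, with_delim]))     # True
print(is_valid([("field", "stringField"), with_delim, without_delim]))  # False
```

The second call fails because keeping both groups puts `myFieldOne` directly after `myFieldOne` with no delimiter.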

Integer

Integer is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias | Rename the field name. |
| thousandSeparator | Define the thousands separator. |

Example

{myFieldName:int(thousandSeparator=",")}

JSON

JSON is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias | Rename the field name. |
| fields | Select which items you want to extract from the JSON. |

Examples

{myFieldName:json(fields=["itemOne","itemTwo"])}
{myfield:json(fields=["hello ":string(alias="hello_"), "bye ":string(alias="bye_")])}

Key-value list

Key-value list is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias | Rename the field name. |
| kvSeparator | Define the separator between keys and values. |
| listSeparator | Define the separator between each key-value item. |
| indices | Select which columns you want to extract from the list by their position in the list. |
| fields | Select which items you want to extract from the list by their key names. |

The indices and fields parameters cannot be used simultaneously.

Examples

{myFieldName:keyValueList(kvSeparator=":",listSeparator=",")}
{myFieldName:keyValueList(fields=["hello ":string(alias="hello_")])}

String

String is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias | Rename the field name. |
| length | Define the length of the string. |
| escapableChar | Escape delimiter characters in the string. |

Examples

{myField:string(length=2)}
{myField:string(length=2, alias="newFieldName")}

There is a special case with String fields. If the field is using the length parameter, we may add another field next to it without any separator:

{oneField:string(length=2)}{anotherField:string(length=3)}{lastField:string}
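The reason no separator is needed is that a fixed length already tells the parser where each field ends. A minimal Python sketch of that idea follows; `split_fixed` is a hypothetical helper, not part of PCL.

```python
def split_fixed(line: str, lengths: list[int]) -> list[str]:
    """Cut `line` into fixed-length pieces, then one final piece with the remainder."""
    pieces, pos = [], 0
    for n in lengths:
        pieces.append(line[pos:pos + n])
        pos += n
    pieces.append(line[pos:])  # the trailing variable-length string
    return pieces


# Mirrors string(length=2), string(length=3), then an unbounded string.
print(split_fixed("ABCDEfghij", [2, 3]))  # ['AB', 'CDE', 'fghij']
```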

XML

XML is a configurable field. The available parameters are:

| Parameter | Description |
| --- | --- |
| alias | Rename the field name. |
| xpaths | Select which items you want to extract from the XML using a subset of the XPath query language. |

Examples

{myFieldName:xml(xpaths=["/data/event","/data/event@id"])}
{myfield:xml(xpaths=["/data/name":string(alias="name"), "/data/event":listString])}

Syntax literals

A literal is a special type of element: a string that must exist between two fields. Unlike fields, a literal is just a string of one or more characters. It cannot contain <, >, { or } unless they are escaped with \.

This is an example of a literal (a single whitespace between the two fields):

{myFieldOne:string} {myFieldTwo:string}

Syntax operators

There are two types of operators: Skip and While.

Skip

The Skip operator acts like a dynamic separator. It can be used when we want to skip content until a match is found.

This is a configurable operator that is equivalent to the regular expression (?:from)*(?=to) where from and to are the strings to match.

The available parameters are:

| Parameter | Description |
| --- | --- |
| from* | Define the string to find one or more times. |
| to* | Define the string at which skipping stops (it is matched by lookahead and not consumed). |

Example

<skip(from=" ",to="-")>
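The regular-expression equivalence stated above can be tried directly in Python. This sketch only builds the equivalent pattern with the standard `re` module; `skip_pattern` is a hypothetical helper, and whether the Parser Action applies the pattern exactly this way is an assumption.

```python
import re


def skip_pattern(from_: str, to: str) -> re.Pattern:
    """Build (?:from)*(?=to), escaping both strings so they match literally."""
    return re.compile(f"(?:{re.escape(from_)})*(?={re.escape(to)})")


pattern = skip_pattern(" ", "-")
match = pattern.match("   -value")
print(repr(match.group(0)))  # '   '
print(match.end())           # 3, the position of "-", which is not consumed
print(pattern.match("   x")) # None, because "-" never follows the spaces
```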

A use case could be to skip all characters until a JSON is found. For example, for this log:

hello thisisrubbish{"my": "json"}

We could use this PCL:

{f1:string}<skip(from=" ", to="{")> {f2:json}

While

The While operator acts like a dynamic separator. It is useful if a separator has an unknown number of repetitions on each log.

This is a configurable operator that is equivalent to the regular expression (?:value)* where value is the string to match. However, if the options min and/or max are defined, then the equivalent regular expression is (?:value){min,max}.

The available parameters are:

| Parameter | Description |
| --- | --- |
| value* | Define the string to find one or more times. |
| max | Set the maximum number of repetitions of value (must be greater than 0). |
| min | Set the minimum number of repetitions of value (must be greater than 0). |

It is not necessary to define both max and min. However, if both are defined, min must be strictly lower than max, that is, min < max.

<while(value=" ",max=2)>

In another example, suppose " -" is a separator that appears at least 3 times in every log:

hello - - -world
goodbye - - - - -world
hello - - - -moon

Then, the PCL could be {f1:string}<while(value=" -", min=3)>{f2:string}
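To see the equivalence at work on the three logs above, here is a Python sketch. `while_regex` is a hypothetical helper; treating a missing min as 0 and modelling the String fields as lazy and greedy captures are assumptions made for the illustration.

```python
import re


def while_regex(value: str, min_count=None, max_count=None) -> str:
    """Return (?:value)* or (?:value){min,max} with the value escaped."""
    body = f"(?:{re.escape(value)})"
    if min_count is None and max_count is None:
        return body + "*"
    low = 0 if min_count is None else min_count
    high = "" if max_count is None else max_count
    return f"{body}{{{low},{high}}}"


# Rough stand-in for {f1:string}<while(value=" -", min=3)>{f2:string}
line_re = re.compile(r"^(?P<f1>.*?)" + while_regex(" -", min_count=3) + r"(?P<f2>.*)$")

results = []
for log in ["hello - - -world", "goodbye - - - - -world", "hello - - - -moon"]:
    m = line_re.match(log)
    results.append((m["f1"], m["f2"]))
    print(m["f1"], m["f2"])
```

Each log yields the text before and after the run of separators, regardless of how many repetitions it contains.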

Field options

alias

The value must follow the naming requirements:

The allowed set of characters is: A-Z, a-z, 0-9, ., -, _ or #.
An alias cannot start with _.

These are valid examples:

alias="myNewName"
alias="my-new-name"

These are invalid examples:

alias="_myNewName"
alias="my new name"
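The naming requirements can be expressed as a single regular expression. The Python sketch below is an illustration only; `is_valid_alias` is a hypothetical helper, and rejecting the empty string is an assumption.

```python
import re

# First character: any allowed character except "_".
# Remaining characters: A-Z, a-z, 0-9, ".", "-", "_" or "#".
ALIAS_RE = re.compile(r"[A-Za-z0-9.#-][A-Za-z0-9._#-]*")


def is_valid_alias(alias: str) -> bool:
    return ALIAS_RE.fullmatch(alias) is not None


for candidate in ["myNewName", "my-new-name", "_myNewName", "my new name"]:
    print(candidate, is_valid_alias(candidate))
```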

default

The value must be of the same type as the parent. For example, if it is the default value of an integer, then the default value must be an integer too.

These are valid examples:

{myfield:json(fields=["hello":string(default="{}")])}
{myfield:csv(fields=["world":int(default=-1)])}

These are invalid examples:

{myfield:json(fields=["hello":string(default=-1)])}
{myfield:float(default=1.5)}

decimalSeparator

The value could be:

,
. (default value)

These are valid examples:

decimalSeparator=","
decimalSeparator="."

These are invalid examples:

decimalSeparator=""
decimalSeparator="-"
decimalSeparator="_"

fields

The value must be a list of strings. Note that the list cannot contain other values (e.g. numbers).

Additionally, we may specify the type of each field by writing a colon (:) followed by the type: bool, float, int or string. For example: fields=["oneField":bool, "middleField", "anotherField":int]. If the type is omitted, it should be assumed that the type is string. In the previous example, it assumes that middleField is a string.

Each sub-type may have these options:

alias (optional) to rename the field name.
default (optional) to set a fixed value if the field does not exist in the log.

These are valid examples:

fields=["oneField","anotherField.with.subField"]
fields=["oneField":string(alias="anotherName")]
fields=[]

These are invalid examples:

fields=[oneField,anotherField]
fields=["oneField,anotherField"]
fields=[0,1]

indices

The value must be a list of numbers. Note that the list can only contain non-negative integers (zero included).

Additionally, we may specify the type of each index by writing a colon (:) followed by the type: bool, float, int or string. For example: indices=[0:bool, 1, 3:int]. If the type is omitted, it assumes that the type is string. In the previous example, it assumes that 1 is a string.

Each sub-type may have these options:

alias (optional) to rename the field name.
default (optional) to set a fixed value if the field does not exist in the log.

These are valid examples:

indices=[0,1,3]
indices=[1:string(default="not exists")]
indices=[]

These are invalid examples:

indices=["0","1"]
indices=[-3,1]
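The combination of typed indices and default values can be pictured with a short Python sketch. Everything here is hypothetical: the `select` helper, the tuple representation of each index, and the rule used to read a bool are assumptions made for illustration, not documented behaviour.

```python
# Assumed conversions for each sub-type; the bool rule is a guess.
CASTS = {
    "bool": lambda s: s.lower() == "true",
    "float": float,
    "int": int,
    "string": str,
}


def select(columns, indices):
    """indices is a list of (position, type_name, default) tuples."""
    out = {}
    for pos, type_name, default in indices:
        if pos < len(columns):
            out[pos] = CASTS[type_name](columns[pos])
        else:
            out[pos] = default  # column missing from the log
    return out


row = ["true", "foo", "ignored", "42"]
# Similar in spirit to indices=[0:bool, 1, 3:int, 5:string(default="not exists")]
selected = select(row, [(0, "bool", None), (1, "string", None),
                        (3, "int", None), (5, "string", "not exists")])
print(selected)  # {0: True, 1: 'foo', 3: 42, 5: 'not exists'}
```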

kvSeparator

The value must be a non-empty text of any length; there is no character limit. By default, it is =.

These are valid examples:

kvSeparator=":"
kvSeparator="\t"
kvSeparator="hello"

These are invalid examples:

kvSeparator=""
kvSeparator=:

Note that " must be escaped. For example: kvSeparator="\""

length

The value must be a strictly positive integer.

These are valid examples:

length=1
length=25

These are invalid examples:

length="1"
length=0
length=-3

listSeparator

The value must be a non-empty text. By default, it is ,

These are valid examples:

listSeparator=";"
listSeparator="|"
listSeparator="hello"

These are invalid examples:

listSeparator=""
listSeparator=;

Note that " must be escaped. For example: listSeparator="\"".

separator

The value must be a character from the set: |, ;, ,, \t. By default, it is ,.

These are valid examples:

separator=";"
separator="\t"

These are invalid examples:

separator="-"
separator=;

totalColumns

The value must be a strictly positive integer.

These are valid examples:

totalColumns=1
totalColumns=5

A valid value for this option must equal the number of columns in the CSV.

These are invalid examples:

totalColumns="1"
totalColumns=0
totalColumns=-3

thousandSeparator

The value could be:

empty string (default value).
,
.

These are valid examples:

thousandSeparator=""
thousandSeparator="."

These are invalid examples:

thousandSeparator="-"
thousandSeparator="_"

Use case

message: "foo|bar|"foo|bar"|another field after the CSV"

Either of the following expressions parses the message:

{fieldName1:csv(separator="|",totalColumns=3)}|{fieldName2:string}

{csvField:csv(separator="|",indices=[0,1,2],totalColumns=3)}|{stringField:string}

Examples

CSV with 3 columns

We have the following log:

foo|bar|\"foo|bar\"|{"hello": "world"}

Either of the following expressions parses the message:

{fieldName:csv(separator="|",totalColumns=3)}|{fieldName2:json()}

{csvField:csv(separator="|",indices=[0,1,2],totalColumns=3)}|{stringField:string}

Key-value list with duplicated keys

We have the following log:

key1=value1 key2=value2 key3=3 key1=anotherValue1

A valid expression to parse the message is:

{field:keyValueList(kvSeparator="=", listSeparator=" ", fields=["key1":listString(alias="key1AsList"), "key2":string(), "key3":int()])}

And it would extract these values:

field: "key1=value1 key2=value2 key3=3 key1=anotherValue1"
key1AsList:
- value1
- anotherValue1
field.key2: "value2"
field.key3: 3
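The handling of the duplicated key can be sketched in Python. This is a simplified illustration rather than the parser's implementation: `parse_kv_list` is a hypothetical helper that collects every value per key, and the defaults mirror the documented kvSeparator and listSeparator defaults.

```python
from collections import defaultdict


def parse_kv_list(text: str, kv_separator: str = "=", list_separator: str = ","):
    """Return a dict mapping each key to the list of all its values, in order."""
    values = defaultdict(list)
    for item in text.split(list_separator):
        key, _, value = item.partition(kv_separator)
        values[key].append(value)
    return dict(values)


parsed = parse_kv_list("key1=value1 key2=value2 key3=3 key1=anotherValue1", "=", " ")
key1_as_list = parsed["key1"]   # listString keeps every occurrence
key2 = parsed["key2"][0]        # string
key3 = int(parsed["key3"][0])   # int
print(key1_as_list, key2, key3)  # ['value1', 'anotherValue1'] value2 3
```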

Unknown repetitions

A valid PCL expression could be:

{lastName:string}, {firstName:string}<while(value=" ")>{age:int}: {info:json(fields=["country","occupation"])}

Here, there are 4 fields (lastName, firstName, age and info) and 3 delimiters: the literal ", ", the while operator and the literal ": ".

This PCL expression can be used to extract fields from different lines of text that have the same structure. For example, given the text Doe, John 37: {"country": "UK", "occupation": "father"}, the PCL expression can be used to extract the following fields:

lastName: "Doe"
firstName: "John"
age: 37
info: "{\"country\": \"UK\", \"occupation\": \"father\"}"
info.country: "UK"
info.occupation: "father"

In another example, given the text Smith, Jane 19: {"country": "USA", "occupation": "student"}, the same PCL expression would extract:

lastName: "Smith"
firstName: "Jane"
age: 19
info: "{\"country\": \"USA\", \"occupation\": \"student\"}"
info.country: "USA"
info.occupation: "student"
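For readers more familiar with regular expressions, the same structure can be approximated in Python. This is a rough equivalent under assumptions (String fields modelled as lazy captures, the JSON field as the rest of the line), not a description of how PCL is evaluated; `parse_line` is a hypothetical helper.

```python
import json
import re

# ", " and ": " are literals; (?: )* plays the role of <while(value=" ")>.
LINE_RE = re.compile(
    r"^(?P<lastName>.*?), (?P<firstName>.*?)(?: )*(?P<age>\d+): (?P<info>.*)$"
)


def parse_line(line: str):
    m = LINE_RE.match(line)
    if m is None:
        return None
    info = json.loads(m["info"])
    return {
        "lastName": m["lastName"],
        "firstName": m["firstName"],
        "age": int(m["age"]),
        "info": m["info"],
        "info.country": info["country"],
        "info.occupation": info["occupation"],
    }


doe = parse_line('Doe, John 37: {"country": "UK", "occupation": "father"}')
print(doe["lastName"], doe["firstName"], doe["age"], doe["info.country"])
```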

String after a CSV

We have the following log:

foo|bar|"foo|bar"|another field after the CSV

Either of the following expressions parses the message:

{fieldName1:csv(separator="|",totalColumns=3)}|{fieldName2:string}

{csvField:csv(separator="|",indices=[0,1,2],totalColumns=3)}|{stringField:string}

Syslog message

We have the following log:

<165>1 2003-10-11T22:14:15.003Z myhostname myapp 1234 ID47 - An application event log entry...

A valid PCL expression to parse this log would be:

<while(value=" ",max=2)>\<{priority:string}\>{version:int} {eventtimestamp:string} {hostname:string} {appName:string} {procId:string} {msgId:string} - {msg:string}

XML that contains several metadata values

We have the following log:

<event>
  <date timezone="UTC">2025-02-27</date>
  <metadata name="ProcId">1</metadata>
  <metadata name="UserId">123</metadata>
  <log>user logged in</log>
  <log>user updated a registry</log>
  <log>user logged out</log>
</event>

A valid PCL expression to parse this log would be:

{field:xml(xpaths=["/event/date","/event/date@timezone","/event/metadata":map(key="/@name",outputFormat="json"),"/event/log":listString])}

This would extract the following data:

field: "..." # Full XML here
field.event.date: "2025-02-27"
field.event.date#timezone: "UTC"
field.event.metadata: "{\"ProcId\":\"1\",\"UserId\":\"123\"}"
field.event.log:
  - "user logged in"
  - "user updated a registry"
  - "user logged out"

XML that contains a list

We have the following log:

<data>
  <log>user logged in</log>
  <log>user updated a registry</log>
  <log>user logged out</log>
</data>

A list can be extracted with the following PCL expression:

{field:xml(xpaths=["/data/log":listString])}

And it would extract the following information:

field: "..." # Full XML here
field.data.log:
  - "user logged in"
  - "user updated a registry"
  - "user logged out"
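To cross-check the expected list, the same extraction can be reproduced with Python's standard `xml.etree.ElementTree` module. This only mirrors the result of the /data/log path with the listString type; it says nothing about how the Parser Action evaluates XPath.

```python
import xml.etree.ElementTree as ET

XML_LOG = """<data>
  <log>user logged in</log>
  <log>user updated a registry</log>
  <log>user logged out</log>
</data>"""

root = ET.fromstring(XML_LOG)        # root is the <data> element
logs = [el.text for el in root.findall("log")]
print(logs)  # ['user logged in', 'user updated a registry', 'user logged out']
```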

XML that contains a list of objects

We have the following log:

<event>
  <metadata>
    <field>procId</field>
    <value>1234</value>
  </metadata>
  <metadata>
    <field>userId</field>
    <value>4321</value>
  </metadata>
  <metadata>
    <field>message</field>
    <value>hello!</value>
  </metadata>
</event>

A valid PCL expression to parse this log would be:

{field:xml(xpaths=["/event/metadata":map(key="/field",value="/value",outputFormat="json")])}

This would extract the following data:

field: "..." # Full XML here
field.event.metadata: "{\"procId\":\"1234\",\"userId\":\"4321\",\"message\":\"hello!\"}"
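The key/value mapping can be reproduced with Python's standard library as a sanity check. The sketch assumes the map type pairs the text of each /field child with the text of its sibling /value child, which is what the expected output suggests; it is not the parser's implementation.

```python
import json
import xml.etree.ElementTree as ET

XML_LOG = """<event>
  <metadata><field>procId</field><value>1234</value></metadata>
  <metadata><field>userId</field><value>4321</value></metadata>
  <metadata><field>message</field><value>hello!</value></metadata>
</event>"""

root = ET.fromstring(XML_LOG)
# One dictionary entry per <metadata> element: <field> text -> <value> text.
metadata = {m.findtext("field"): m.findtext("value") for m in root.findall("metadata")}
as_json = json.dumps(metadata, separators=(",", ":"))
print(as_json)  # {"procId":"1234","userId":"4321","message":"hello!"}
```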
