Processors

Parsing and normalizing data

Operational Insight makes use of the Logstash framework in its data pipeline for log event processing. Logstash supports a pluggable pipeline architecture: it accepts input from a variety of sources, parses and transforms the data using user-defined rules, and writes the parsed data out to an Elasticsearch cluster. Logstash provides rich capabilities for processing and transforming logs and other forms of data. It supports a large and extensible array of input, filter, and output plugins and codecs, allowing any type of event to be enriched and transformed as part of the ingestion process.

Log Event Processing Pipeline

The event processing pipeline has three stages: inputs → filters → outputs. Inputs generate events, filters modify them, and outputs ship them elsewhere. Inputs and outputs support codecs that can be used to encode or decode the data as it enters or exits the pipeline without having to use a separate filter.
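
For illustration, a pipeline configuration mirrors these three stages directly. The following skeleton is only a minimal sketch; the stdin, mutate, and stdout plugins and the added field are placeholders, not settings shipped with Operational Insight.

    # Minimal pipeline: events flow from inputs through filters to outputs.
    input {
      stdin { }                                          # generate events from standard input
    }
    filter {
      mutate {
        add_field => { "pipeline_stage" => "filtered" }  # example enrichment of each event
      }
    }
    output {
      stdout { codec => rubydebug }                      # print each processed event for inspection
    }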

Inputs

Inputs are used to get data into the pipeline. Some commonly used inputs, illustrated in the sketch after this list, are:

  • file: read from a file on the filesystem, much like the UNIX command “tail -0F”.

  • syslog: listen on the well-known port 514 for syslog messages in the RFC3164 format.

  • beats: process events sent by filerelay and other metric beats.
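
As a sketch, the file and beats inputs above might be configured as follows; the log path and port are placeholder values, not Operational Insight defaults.

    input {
      file {
        path => "/var/log/app/app.log"   # tail this file for new lines, like "tail -0F"
      }
      beats {
        port => 5044                     # listen for events shipped by Beats agents
      }
    }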

Filters

Filters are intermediary processing stages in the pipeline. They can be combined with conditional logic to perform an action on an event if it meets certain criteria. Some examples, combined in the sketch after this list, include:

  • grok: parse and structure arbitrary text. Use it to parse unstructured log data into something structured that can be queried.

  • mutate: perform general transformations on event fields. Rename, remove, replace, and modify fields in the events.

  • drop: drop an event completely.

  • clone: make a copy of an event, possibly adding or removing fields.
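
The sketch below combines grok, mutate, and a conditional drop; the grok pattern, field names, and matched log format are assumptions for illustration, not rules used by Operational Insight.

    filter {
      grok {
        # Parse lines such as "2024-05-01T10:15:30Z ERROR connection refused" into fields.
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:detail}" }
      }
      mutate {
        rename => { "detail" => "short_message" }   # rename a parsed field
      }
      if [level] == "DEBUG" {
        drop { }                                    # discard low-value events entirely
      }
    }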

Outputs

Outputs form the final phase of the pipeline. An event can pass through multiple outputs, but once all output processing is complete, the event has finished its execution. Some common outputs, shown in the sketch after this list, are:

  • elasticsearch: send event data to Elasticsearch.

  • file: write event data to a file on disk.
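
Both outputs might look like the following sketch; the host, index name, and file path are placeholders.

    output {
      elasticsearch {
        hosts => ["https://localhost:9200"]   # Elasticsearch cluster endpoint
        index => "logs-%{+YYYY.MM.dd}"        # write to a daily index based on the event timestamp
      }
      file {
        path => "/tmp/processed-events.log"   # also keep a local copy on disk
      }
    }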

Codecs

Codecs are stream filters that can operate as part of an input or output. Codecs enable easy separation of the transport of messages from the serialization process. Popular codecs include json, multiline, and plain (text); the first two are sketched after the list below.

  • json: encode or decode data in the JSON format.

  • multiline: merge multiple-line text events such as Java exception and stack-trace messages into a single event.
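
As a sketch, the multiline codec can be attached to a file input to fold stack-trace lines into one event, and the json codec can serialize events on the way out; the path and pattern are illustrative assumptions.

    input {
      file {
        path => "/var/log/app/app.log"
        codec => multiline {
          pattern => "^\s"     # lines starting with whitespace (stack-trace frames)...
          what => "previous"   # ...are merged into the preceding event
        }
      }
    }
    output {
      stdout {
        codec => json          # emit each event as a JSON document
      }
    }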

For more details, refer to the Logstash documentation.
