
Modifying data format on a per-destination basis

Michael Donnelly Posts: 6 mod
edited September 2023 in Stream

A customer asked this:

We have multiple syslog senders routing their messages to Cribl. Each of the senders has an event format that's specific to the source, and we are routing those events to multiple destinations that each require their own format, such as CEF for our security tool and JSON for our logging tool.

What's the best way to handle these events, so they're processed and formatted correctly for each destination?

Here is the example scenario:

  • We have devices sending via syslog on port 514, e.g. F5s, Cisco ASAs, Cisco routers, Cisco switches, storage gear, Palo Altos, etc.
  • All the syslog messages would be different based on the vendor. Some (like the PAN) have a single device sending multiple different formats.
  • We're doing filtering, enrichment, and reformatting on the events. We're prioritizing this effort based on volumes.
  • Once processed, events are sent to different destinations. Let's say we have 3: one requires JSON, one CEF, and one requires a single string of K/V pairs.

Best Answer

  • Michael Donnelly Posts: 6 mod
    Answer ✓

    To avoid confusion about "source" in this reply, I'll use "Source" to mean the Cribl Source, "Sender" to refer to the devices sending, and "Dataset" to refer to a specific set of data arriving from a specific Sender.

    As noted in the customer's question, some Senders will deliver multiple Datasets to the same Cribl Source. For example, a single Palo Alto firewall will send PAN Traffic, PAN Threat, and PAN System data - each of which is a Dataset.

    As a platform, Cribl Stream offers multiple methods for handling commingled Senders arriving on the same Source, and for handling commingled Datasets from specific Senders.

    Stream supports pre-processing pipelines attached to Sources, and post-processing pipelines attached to specific destinations. Pre-processing pipelines can identify and tag data on the way in, setting the stage for filtering. Post-processing pipelines let you normalize data for a specific destination just prior to delivery.

    Data In

    In the example here (data arriving via Syslog), you could use this combination approach to solve your problem:

    1. For those Senders that support changing the syslog destination port, consider dedicated Cribl Sources for specific devices, especially those with high volume. PAN firewalls are an excellent example.
    2. For those Senders where a Cribl pack already exists, and where the data is arriving on a dedicated port for that Sender, you can attach the pack directly to the Source as a pre-processing pipeline.
    3. For generic syslog Senders, and for those that can only send to port 514, consider the Cribl Syslog pre-processing pack. This pack lets you set source, sourcetype, or other metadata on the way in, making it easier to filter when it comes to Routes (see the sketch after this list).
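    As a rough sketch of what that tagging might look like (the hostname prefix, field values, and route filter below are placeholders, not the pack's actual output), a pre-processing pipeline could use an Eval function to set a sourcetype, which later Routes can filter on:

        // Eval function in the pre-processing pipeline; 'pan-fw' is a placeholder hostname prefix
        // Add field: sourcetype   Value expression:
        host.startsWith('pan-fw') ? 'pan:firewall' : 'syslog:generic'

        // A Route filter can then select on that tag:
        sourcetype=='pan:firewall'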

    Data Processing

    This handles getting the data in, but what next? You have a few options.

    Option A: Configure a specific route for each combination of Dataset and destination, with different pipelines. (This is what Jon Rust was showing in his reply.)

    • Use this approach when you want to process the events from that Dataset differently, based on the needs of the destination.
    • For example, you might want to send your firewall data to your SIEM with East/West traffic sampled 10:1, and also send that same Dataset to a destination where full fidelity is required.
    • This gives you the most flexibility in how you handle each combination of Dataset and destination, but it does require more routes to configure and maintain.
    • Use multiple routes with the same filter, but different combinations of pipeline and destination. Turn off the "Final" flag on all but the last one (see the sketch after this list). Create a Group on the Routes page to keep all of the related routes in one place.
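    As a sketch (the filter, pipeline, and destination names here are hypothetical), the group of routes might look like this, with only the last one marked Final:

        Route 1:  filter: sourcetype=='pan:firewall'   pipeline: pan_siem_sampled    output: siem_cef       Final: off
        Route 2:  filter: sourcetype=='pan:firewall'   pipeline: pan_full_fidelity   output: logging_json   Final: off
        Route 3:  filter: sourcetype=='pan:firewall'   pipeline: pan_kv              output: analytics_kv   Final: on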

    Option B: Configure a specific route for each Dataset, and use a combination of an Output Router and post-processing pipelines.

    • In the route, specify an Output Router that delivers the events from a single pipeline to multiple destinations (sketched after this list).
    • Use this approach when all data from a given Dataset needs to be processed similarly (other than destination-specific format), to avoid duplicate processing of the event.
    • Combine this with post-processing (see Data Out, below), to convert the format as necessary.
    • Since this approach is about efficiency, consider it for high-volume senders.
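    As a sketch (the destination names are hypothetical, and the rule semantics are approximate), the Output Router's rules are evaluated per event; leaving the Final flag off on a rule lets the same event also match later rules, so copies reach every listed destination:

        Rule 1:  filter: true   output: siem_cef       Final: no
        Rule 2:  filter: true   output: logging_json   Final: no
        Rule 3:  filter: true   output: analytics_kv   Final: yes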

    Data Out

    Use a post-processing pipeline to selectively drop any fields that aren't needed by a given destination, and to convert the data to the format that destination requires.

    Consider the CEF destination mentioned by the customer; use a pipeline tied to that destination. If the data isn't already in CEF format, the pipeline could use functions such as Parser and Serialize to change the data to CEF.
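    A minimal sketch of that idea, assuming the incoming events carry key=value pairs in _raw; the vendor/product strings and field names below are made up, and a real CEF mapping would need the destination's field dictionary:

        // 1. Parser function: Operation mode = Extract, Type = Key=Value Pairs, Source field = _raw
        //    (pulls fields such as act, src, dst into the event)
        // 2. Eval function: rewrite _raw as a CEF line built from those fields
        'CEF:0|ExampleVendor|ExampleProduct|1.0|' + (act || 'unknown') + '|' + (act || 'event') + '|5|'
          + 'src=' + (src || '') + ' dst=' + (dst || '')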

Answers

  • Steve Bridge Posts: 30

    In my opinion, Packs are the best way. I would suggest you download the Pack for Palo Alto and study how it separates messages. Using a Pack is the most elegant way to handle this: you don't end up with so many routes under the Routing tab, because most of them live in the Pack. The same method used in the PA Pack can be used for anything else, even sources where there is no Pack; it's easy to follow the example and make your own.
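    As a rough illustration only (the real Palo Alto Pack's filters are more thorough than this), the separation inside a Pack usually comes down to per-dataset filter expressions on the Pack's own routes, for example:

        _raw.includes(',TRAFFIC,')    // traffic logs -> traffic pipeline
        _raw.includes(',THREAT,')     // threat logs  -> threat pipeline
        _raw.includes(',SYSTEM,')     // system logs  -> system pipeline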

  • brandonf Posts: 2

    Agree with Jon - we take a slightly different approach (more similar to a Splunk input) where we define a list of hosts in a CSV file and use it for a lookup:
    __inputId.startsWith('syslog:in_syslog:') && C.Lookup('syslog-nginx_hosts.csv', 'host').match(host)
    From there you can send it to the pipeline or Pack of your choice. To send duplicate feeds, copy the route and create a new one below it, as Jon said, leaving the Final toggle off on every route except the last one.

  • Jon Rust Posts: 435 mod
    edited July 2023

    Yep, the Routing → Data Routes page is where you want to go. You can set up routes based on any part of the individual event data.

    Sample:
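    An illustrative filter along those lines (the input ID and hostname prefix are placeholders):

        __inputId.startsWith('syslog:in_syslog:') && host.startsWith('asa-')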
