Processing Syslog messages and sending to Splunk

Steve Bridge
edited September 2023 in Stream

Hi all,
I want to do a drop-in replacement for an existing syslog-ng + Splunk HF system. Currently, it receives a variety of logs via syslog, syslog-ng writes them to disk (using different paths per sourcetype), and then the Splunk HF reads them, applies index-time transforms based on sourcetype, and forwards to the indexing tier. I can see a variety of ways to approach this, so I’m looking for a best-practice recommendation. How would you set this up so that some event sources can be processed with Cribl, but others can be left to the Splunk HF/Indexer? In other words, I’d like to make use of things like the PAN pack, but also be able to have it just work the same as before with some outlier sources, which means passing to the HF to process props/transforms on the data.

Answers

  • Jon Rust (mod)

    My take would be to leave syslog-ng in place, listening on a specific port (or ports) for the outlier scenarios. The syslog-ng → HF path would remain for those. Install Stream alongside it to process the other sources. With unique ports for Stream and syslog-ng, there shouldn't be any conflicts. The only question then is scale: how much data are we talking about?

  • Steve Bridge

    Ah, therein lies the rub. Many of these devices send only on 514, and I can't easily change it. So I need some way other than the port to separate the data; otherwise, I'll have to just keep the HF in place. Volume is light at the moment because the firewall logs are not cut over yet, but I anticipate less than 500 MB/day.

  • Jon Rust (mod) · edited July 2023 · Answer ✓

    Put Stream in charge of all syslog data, and use Route filters to send the outlier data to a syslog destination (syslog-ng → HF). You could minimally filter the data, or simply pass it through. The Route filters can react to raw content, host IPs, hostnames, lookup tables, or any combination.
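
    For reference, Route filters in Stream are just JavaScript expressions evaluated per event. A minimal sketch of "outlier" filters follows; the raw-content marker, IP prefix, hostname, and lookup file name are illustrative, not from this thread:

        // Match on raw content, e.g. a vendor-specific marker string:
        _raw.includes('%ASA-')

        // Match on the sender's IP (the Syslog source sets __srcIpPort):
        __srcIpPort.startsWith('10.1.2.')

        // Match on hostname, or against a lookup of known outlier senders:
        host == 'legacy-appliance-01'
        C.Lookup('outlier_hosts.csv', 'host').match(host) != null

    Any event matching such a Route goes to the syslog-ng → HF destination; everything else falls through to your normal Stream processing.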

  • Balazs Scheidler

    syslog-ng can also feed Cribl directly, using either its Elastic or HEC destinations. So if you already know what kind of traffic you want to send towards Cribl, you can do that routing in your syslog tier, using syslog-ng filters and routing logic.

  • Michael Donnelly (mod) · Answer ✓

    Cribl does recommend dedicated ports for those Syslog senders that support it, with one Cribl source listening for each type of Syslog sender. Chances are that your current syslog server was already set up this way, and Cribl can do the same. This allows you to assign meta information directly to each source, including source, sourcetype, and index. It also allows you to attach a pack directly to that source for processing the event. (For example, a dedicated Syslog port for PAN, with the PAN pack and meta information set at the time of ingest.)

    Since many Syslog senders cannot change their target port, you will end up listening on port 514 for data from multiple senders. To support the assignment of source, sourcetype, and index values across these commingled sender types, use the Cribl Syslog Pre-processing Pack. It includes lookups that map sender information to the meta information.
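
    Conceptually, the lookup-driven mapping works like the sketch below. (The file name syslog_senders.csv and its fields are hypothetical, for illustration; they are not the pack's actual names.)

        // In an Eval function, derive meta fields from the sending host.
        // Each line is a field name and its JavaScript value expression:
        index      = C.Lookup('syslog_senders.csv', 'host').match(host, 'index')
        sourcetype = C.Lookup('syslog_senders.csv', 'host').match(host, 'sourcetype')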

    In either of these cases, you avoid the step of writing data to disk using one product, just to pick it back up using another product.

    Now, about those existing props and transforms that you mentioned… In Jon Rust's post of March 27, he shows Cribl sending to a Splunk HF for those extractions. There's a catch: data sent from Cribl to Splunk using the Splunk destination arrives in Splunk already 'cooked', so any ingest-time transforms will be skipped.

    For most people, the simplest approach is to focus on the small number of ingest-time transforms from your Splunk TAs, and ensure that the same results are produced within Cribl Stream before sending that data onwards to Splunk. In many cases, you'll find that Cribl Packs already handle that dataset and do those extractions for you. If the extractions are happening in Cribl, you may as well deliver from Stream to your Splunk indexers directly, bypassing the HF.
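
    For example, if a TA's ingest-time transform rewrites the sourcetype of certain events, the Stream equivalent is a one-line Eval. (A hypothetical sketch; the regex and sourcetype names are illustrative only.)

        // Eval function, value expression for the 'sourcetype' field:
        /Failed password/.test(_raw) ? 'linux:auth:failed' : sourcetype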

    If you really need Splunk to perform ingest-time extractions on events received from Cribl, use the Splunk HEC destination. You'll also need to go to that destination's Advanced Settings and change the "Next Processing Queue" setting (from the default indexQueue to parsingQueue) so the HF will parse those events using your props and transforms.