Question on pipeline timeout during testing vs real time running

b1scu1t · March 2023

I wrote a fairly elaborate pipeline that extracts a bunch of metrics from a hideously ugly event and builds a multi-metric event that is then shipped to a metrics index in Splunk, thanks to @david & @bdalpe.

I noticed today, while making some minor edits to this pipeline, that it would time out with the default 10-second limit, but when I raised the limit to 60 seconds it ran successfully. When I promote this pipeline to production, how can I make sure none of my pipelines time out and drop data? When I query Splunk I do see the metrics, but I just want to make sure I am not losing anything.

Thanks!

Jon Rust · March 2023

How big is the sample file?

b1scu1t · March 2023

The sample file is small, only about 800 KB.

Jon Rust · March 2023

The timeout in the GUI may be due to the size of the file. The backend Node.js engine does the processing, but your browser's JS engine must instantiate the entire file in memory for the preview window to operate. Browsers aren't great at doing this.

To answer your larger question about performance, you can add an Eval function at the top of the pipeline and set __startTime to +Date.now(). Add another at the end of the pipeline, setting elapsed to +Date.now() - __startTime. This will give you the number of milliseconds your pipeline took to run.

You can either add an Aggregation function (in passthrough mode) to get the avg/min/max of elapsed and then drop the field afterward, or send all events through with this new field attached and compute the metrics you want in your log analysis toolset.

1. Eval, start timer
2. Do Work™
3. Eval, stop timer
4. Optional: Aggregations, remove elapsed field

This should give you a pretty good picture of processing time.
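
For anyone who wants to see the timer pattern outside the product, here is a minimal sketch in plain JavaScript. It is a standalone simulation, not the pipeline engine's API: doWork, timeEvent, and summarize are hypothetical names made up for illustration, and the loop inside doWork is only a stand-in for whatever your real functions do. Only the two timer expressions come from the steps above.

```javascript
// Standalone simulation of the start/stop timer pattern.
// All function names here are hypothetical; this is not the product's API.

// Stand-in for the real pipeline functions ("Do Work").
function doWork(event) {
  let sum = 0;
  for (let i = 0; i < 100000; i++) sum += i % 7;
  event.metric_value = sum;
  return event;
}

function timeEvent(event) {
  // Eval, start timer: __startTime = +Date.now()
  event.__startTime = +Date.now();
  doWork(event);
  // Eval, stop timer: elapsed = +Date.now() - __startTime
  event.elapsed = +Date.now() - event.__startTime;
  return event;
}

// Roughly what aggregating avg/min/max of elapsed would report.
function summarize(events) {
  const values = events.map((e) => e.elapsed);
  const total = values.reduce((a, b) => a + b, 0);
  return {
    count: values.length,
    avg: total / values.length,
    min: Math.min(...values),
    max: Math.max(...values),
  };
}

// Run 50 simulated events through the timed "pipeline".
const events = Array.from({ length: 50 }, (_, i) =>
  timeEvent({ _raw: `event ${i}` })
);
const stats = summarize(events);
console.log(stats);
```

Date.now() has millisecond resolution, so very fast events will report an elapsed of 0. That is one reason to look at avg/min/max over many events rather than at any single value.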
