Deploys of new configurations are taking much longer than before

I’ve noticed that lately, deploying new configurations takes longer than it used to.

What can I look at, in the logs or system, to determine why?


The most common root causes for this are:

  • Core dump files in your cribl_home directory. Delete any core.* files that you find in cribl_home.

  • A massive number of staging directories created by your S3 outputs. Navigate to the advanced settings of each S3 output and ensure that staging directory cleanup is enabled.

The reason the above two are the major culprits is that when a worker node receives a new configuration bundle, it performs a backup of all of cribl_home before deploying the new configuration. The new configs will not be applied until this backup completes.
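Since the backup covers all of cribl_home, it can help to see what is actually taking up space there. Below is a minimal sketch, not an official Cribl tool. It assumes cribl_home is exposed as a CRIBL_HOME environment variable, and the /opt/cribl fallback is only a guess for illustration. It lists top-level files named core.* and the largest immediate subdirectories. It only reports and never deletes anything, and a core.* name match is not proof that a file is a core dump, so review the list before removing files.

```python
import fnmatch
import os


def dir_size(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def find_core_files(cribl_home):
    """Return sorted paths of top-level files in cribl_home named core.*"""
    hits = []
    for entry in os.scandir(cribl_home):
        if entry.is_file(follow_symlinks=False) and fnmatch.fnmatch(entry.name, "core.*"):
            hits.append(entry.path)
    return sorted(hits)


def largest_subdirs(cribl_home, top=5):
    """Return (name, bytes) pairs for the largest immediate subdirectories."""
    sizes = []
    for entry in os.scandir(cribl_home):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((entry.name, dir_size(entry.path)))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Assumption: CRIBL_HOME is set; "/opt/cribl" is a hypothetical fallback.
    home = os.environ.get("CRIBL_HOME", "/opt/cribl")
    if not os.path.isdir(home):
        print(f"{home} is not a directory; set CRIBL_HOME first")
    else:
        for path in find_core_files(home):
            print("possible core dump:", path, os.path.getsize(path), "bytes")
        for name, size in largest_subdirs(home):
            print(f"{size:>15,d}  {name}")
```

If one subdirectory dwarfs the others, that is a reasonable place to look for leftover staging directories.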

You can search the logs for channel == CriblWorker and look for configuration-related messages. If the last config-related message you see is “creating conf backup”, then the worker node is most likely still performing a backup. Worker nodes running on slower underlying disk I/O will also take longer to deploy changes.
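If you would rather check a log file offline, the same search can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of Cribl internals: it assumes the worker log is newline-delimited JSON with "channel" and "message" fields, and it treats any message containing "conf" as configuration related. The function names are hypothetical.

```python
import json


def config_messages(lines, channel="CriblWorker", keyword="conf"):
    """Yield parsed records on `channel` whose message mentions `keyword`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line; ignore it
        if not isinstance(record, dict):
            continue
        if record.get("channel") != channel:
            continue
        if keyword in str(record.get("message", "")).lower():
            yield record


def backup_still_running(lines):
    """True if the last config-related message is the backup start message."""
    last = None
    for record in config_messages(lines):
        last = record
    if last is None:
        return False
    return "creating conf backup" in str(last.get("message", "")).lower()
```

To use it, open the worker's log file and pass the file object as `lines`. A True result only means no later config-related message was logged, which matches the "still backing up" case described above.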

P.S. Future versions of Cribl Stream will exclude core files from the backup taken before configuration changes.
