How can I aggregate events across multiple collector tasks from a single collection job into a single event?

GOATS,

I run multiple REST collectors where the APIs I collect from have hard limits on the number of events returned, which requires me to paginate.

Doing this creates multiple collection tasks. Part of my main pipeline runs a custom REST function that creates an object in the API based on a field in the event and returns data to enrich my event. If the value already exists in the API, I get a “this already exists” error. I normally just aggregate this field, make my API call, then join and unroll the events with the enrichments.
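For readers unfamiliar with the pattern, here is a minimal sketch of the aggregate → enrich → unroll flow described above. The API client, field names, and the create-or-reuse behavior are all hypothetical stand-ins, not the poster's actual REST function:

```python
# Hypothetical sketch of the aggregate -> enrich -> unroll pattern.
# `api_state` stands in for the remote API; field names are made up.

def create_or_get(api_state, value):
    """Stand-in for the custom REST function: create the object for `value`,
    tolerating the "this already exists" case, then return enrichment data."""
    if value in api_state:
        # The real API returns an error here; we just reuse the existing object.
        return api_state[value]
    enrichment = {"object_id": f"obj-{value}"}
    api_state[value] = enrichment
    return enrichment

def enrich_events(events, api_state):
    # Aggregate: collect distinct key values so the API is called once per value.
    distinct = {e["key"] for e in events}
    lookup = {v: create_or_get(api_state, v) for v in distinct}
    # Unroll: join each original event back with its enrichment.
    return [{**e, **lookup[e["key"]]} for e in events]
```

The point of aggregating first is that duplicate key values trigger only one API call instead of one per event.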

This breaks when the REST collector creates multiple collection tasks: it seems I cannot aggregate all of my events into one single aggregate event, even when all of them fall within my time window.

How can I aggregate my events from a single job that creates multiple collection tasks into a single event?

Thank you!

2 UpGoats

Cribl is built on a shared-nothing architecture, which works great for spreading load across multiple CPUs. That is exactly what happens with multiple collection tasks: each task runs in its own worker process, so no single process sees all the events. To do cross-worker-process aggregation, you will need Redis. You can install your own instance on another server, or use ElastiCache from AWS or a similar managed service from other cloud providers. A Redis Knowledge Pack is currently in the works. For now, check out the Crowdstrike pack’s Network pipeline; there is a group in there that does aggregations using Redis, which you can use as a template for your own aggregation pipeline. There is a similar aggregation group within the VPC Flow Pack.
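The idea behind the Redis approach can be sketched roughly as follows. This uses an in-memory stand-in for the Redis client so the example is self-contained; in a real deployment each worker process would issue the same `SADD`/`SMEMBERS` commands against one shared Redis instance, which is what makes the aggregation work across processes:

```python
# Rough sketch of cross-worker aggregation through a shared Redis set.
# FakeRedis is a stand-in so the example runs without a server; the real
# Redis SADD/SMEMBERS commands behave the same way for this purpose.

class FakeRedis:
    """Tiny in-memory stand-in exposing just the two commands used below."""
    def __init__(self):
        self._sets = {}

    def sadd(self, key, *values):
        # SADD: add members to a set, ignoring duplicates.
        self._sets.setdefault(key, set()).update(values)

    def smembers(self, key):
        # SMEMBERS: return every member of the set.
        return self._sets.get(key, set())

def worker_collect(redis, window_key, events):
    # Each collection task pushes its key values into the shared set
    # for the current time window.
    redis.sadd(window_key, *(e["key"] for e in events))

def aggregate_window(redis, window_key):
    # Any one worker can then read the full, de-duplicated set for the
    # window, regardless of which task collected which event.
    return sorted(redis.smembers(window_key))
```

Because the set lives in Redis rather than in any one worker's memory, it does not matter that the collection tasks are spread across processes.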

4 UpGoats