Events in flight (and memory queues/buffers)


An interesting topic came up yesterday that I didn't have a good answer to: how many events are in flight in Stream at any point in time?

It was a discussion about the potential for data loss: if something adverse were to happen to the worker, what would the impact be? I imagine there is up to one event being processed per worker thread at any point in time, but is there a buffer of events before the worker thread picks them up? If so, how large is it? And if not, does the thread just block the network socket until it's done processing?

Supplementary to that: what happens when a destination goes down? Persistent queues aside, how much will Cribl buffer before applying backpressure to the pipeline?

There are probably scenarios I haven't considered, too. How much data could theoretically be in flight at any point in time, and thus at risk of loss in a catastrophe?


There can be thousands of events in flight per worker process at any time. There is the socket receive buffer at the OS level, along with in-app buffers. A process doesn't block a socket when it's too busy; it simply doesn't read from the buffer fast enough. If data keeps arriving at the same rate as, or faster than, Stream can process it, the buffer is overrun and the OS drops the excess events.
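To make the overrun behavior concrete, here is a toy Python model of a fixed-size receive buffer fed faster than the application reads from it. The capacity and rates are made-up numbers, not Stream or kernel values. The model matches connectionless inputs such as UDP; with TCP, a full receive buffer instead makes the kernel shrink the advertised window so the sender slows down.

```python
def simulate_receive_buffer(capacity, arrivals_per_tick, reads_per_tick, ticks):
    """Toy model of an OS socket receive buffer (all parameters hypothetical).

    Each tick the sender delivers `arrivals_per_tick` events. Anything that
    does not fit in the remaining buffer space is dropped, as the OS would do
    for UDP datagrams. The application then reads up to `reads_per_tick`.
    Returns (events_read, events_dropped, events_still_buffered).
    """
    buffered = 0
    read = 0
    dropped = 0
    for _ in range(ticks):
        space = capacity - buffered
        accepted = min(arrivals_per_tick, space)
        dropped += arrivals_per_tick - accepted  # overrun: no room left
        buffered += accepted
        consumed = min(reads_per_tick, buffered)  # the app is the bottleneck
        buffered -= consumed
        read += consumed
    return read, dropped, buffered


if __name__ == "__main__":
    # Sender outpaces the reader by 2 events per tick.
    print(simulate_receive_buffer(100, 10, 8, 100))
```

With a capacity of 100 events, 10 arriving and 8 read per tick, the buffer fills after 46 ticks, and from then on 2 events per tick are dropped, so the call returns `(800, 108, 92)`.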

An actual block signal only occurs back to a source if triggered by an output.

There are application buffers that hold data before it is written to a socket to be sent to a destination, and the socket itself has a buffer provided by the OS (send and receive buffers are separate).
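The OS-level send and receive buffers can be inspected per socket. This Python sketch prints the defaults a fresh socket gets on the machine it runs on; these are kernel defaults, not necessarily the sizes Stream itself configures, since an application is free to change them with `setsockopt`.

```python
import socket


def socket_buffer_sizes(kind=socket.SOCK_STREAM):
    """Return the OS default (receive, send) buffer sizes in bytes for a new socket."""
    with socket.socket(socket.AF_INET, kind) as s:
        rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcv, snd


if __name__ == "__main__":
    for name, kind in (("TCP", socket.SOCK_STREAM), ("UDP", socket.SOCK_DGRAM)):
        rcv, snd = socket_buffer_sizes(kind)
        print(f"{name}: receive buffer {rcv} bytes, send buffer {snd} bytes")
```

The values differ by OS and tuning, which is one more reason the in-flight total has no fixed answer.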

The in-app buffers are a couple of MB in size, and each worker process has its own set, one per destination. How many events that translates to depends on the event size. I'm not sure offhand whether the buffer size differs by destination type. Backpressure starts once a given buffer is full.
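A byte-bounded buffer illustrates why the event count depends on event size and where backpressure kicks in. This is a hypothetical Python sketch, not Stream's implementation; `DestinationBuffer`, its methods, and the 2 MiB capacity (standing in for "a couple of MB") are all assumptions for illustration.

```python
from collections import deque


class DestinationBuffer:
    """Hypothetical byte-bounded buffer for one destination in one worker process.

    Events are accepted until their total size would exceed `capacity_bytes`;
    after that, `push` returns False, which is where backpressure would begin.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.events = deque()

    def push(self, event: bytes) -> bool:
        if self.used_bytes + len(event) > self.capacity_bytes:
            return False  # full: the caller should stop reading (backpressure)
        self.events.append(event)
        self.used_bytes += len(event)
        return True

    def drain(self, max_bytes: int) -> int:
        """Simulate the socket writer taking up to `max_bytes` of whole events."""
        sent = 0
        while self.events and sent + len(self.events[0]) <= max_bytes:
            event = self.events.popleft()
            sent += len(event)
            self.used_bytes -= len(event)
        return sent


def events_until_backpressure(capacity_bytes: int, event_size: int) -> int:
    """Count fixed-size events accepted before the buffer refuses more."""
    buf = DestinationBuffer(capacity_bytes)
    count = 0
    while buf.push(b"x" * event_size):
        count += 1
    return count


if __name__ == "__main__":
    two_mib = 2 * 1024 * 1024  # assumed size, per "a couple of MB"
    for size in (500, 2048):
        print(size, "byte events:", events_until_backpressure(two_mib, size))
```

Under these assumptions a 2 MiB buffer holds 4194 events of 500 bytes but only 1024 events of 2048 bytes, and once the destination drains some bytes the buffer accepts new events again.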

The more worker processes there are, the more data can be in flight. The more connections there are, the more data can be in flight, assuming they are active. The ingress rate (bytes/sec) is also a factor, as is what's being done with the data (is it being dropped? cloned? etc.). So there is no specific answer to how much is at risk.
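The factors above can still be combined into a rough worst-case estimate of buffer capacity. The Python sketch below is a back-of-envelope model under stated assumptions: every input (connection count, buffer sizes, destination count, average event size) is hypothetical, and it only counts full buffers. It ignores events mid-pipeline, data that is dropped early, and any extra copies created by cloning beyond the per-destination buffers.

```python
from dataclasses import dataclass


@dataclass
class WorkerProfile:
    """Hypothetical sizing inputs; none of these are Stream defaults."""
    active_connections: int   # inbound sockets with data flowing
    socket_rcvbuf_bytes: int  # OS receive buffer per inbound socket
    destinations: int         # outputs this worker writes to
    app_buffer_bytes: int     # in-app buffer per destination ("a couple of MB")
    socket_sndbuf_bytes: int  # OS send buffer per outbound socket


def max_in_flight_bytes(worker_processes: int, p: WorkerProfile) -> int:
    """Upper bound if every inbound and outbound buffer were full at once."""
    inbound = p.active_connections * p.socket_rcvbuf_bytes
    outbound = p.destinations * (p.app_buffer_bytes + p.socket_sndbuf_bytes)
    return worker_processes * (inbound + outbound)


def max_in_flight_events(total_bytes: int, avg_event_bytes: int) -> int:
    return total_bytes // avg_event_bytes


if __name__ == "__main__":
    profile = WorkerProfile(
        active_connections=50,
        socket_rcvbuf_bytes=256 * 1024,
        destinations=3,
        app_buffer_bytes=2 * 1024 * 1024,
        socket_sndbuf_bytes=64 * 1024,
    )
    total = max_in_flight_bytes(8, profile)
    print(total, "bytes, about", max_in_flight_events(total, 1000), "events")
```

With 8 worker processes and these made-up numbers the bound is 156,762,112 bytes, or 156,762 events of 1,000 bytes. Real occupancy is usually far lower, because buffers are only full when a reader or destination falls behind.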
