I’ve made the switch from a single instance to a distributed setup (1 leader and 2 worker nodes). My old server, which receives logs from a variety of sources, is now acting as a worker. Since all of the sources send logs to that one worker’s IP, what happens if this worker goes down? What should I do to guarantee appropriate high availability?
One of the advantages of switching to a distributed architecture is the ability to run multiple worker instances. In fact, it’s best practice to run at least two, for exactly the scenario you describe. That way you can absorb one failure without interrupting service. Add a second instance, and either configure your sources to load balance the data they send to Stream, or send it through a load balancer.
In the past, with clients, we have used an NGINX instance to load balance across the two worker servers, per port (if applicable). Note that this can be done with any decent load balancer that has a VIP attached. You could also use DNS load balancing, but that would only even out the load; it would not provide high availability.
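As a rough illustration of the NGINX approach, here is a minimal sketch of TCP load balancing across two workers using the NGINX stream module. The worker IPs (10.0.0.11, 10.0.0.12), the port (9514), and the upstream name are placeholders I made up for the example, not values from your environment. It also assumes your NGINX build includes the stream module.

```nginx
# Fragment for the main (top-level) context of nginx.conf, alongside any http {} block.
# Assumption: NGINX was built with, or dynamically loads, the stream module.
stream {
    # Hypothetical upstream group: replace with your two worker IPs and the
    # port your source type listens on.
    upstream stream_workers_9514 {
        # Passive health checks: after 3 failed connection attempts within
        # 30s, the worker is skipped for the next 30s.
        server 10.0.0.11:9514 max_fails=3 fail_timeout=30s;
        server 10.0.0.12:9514 max_fails=3 fail_timeout=30s;
    }

    # One server block per port you need to balance.
    server {
        listen 9514;                     # add "udp" after the port for UDP sources
        proxy_pass stream_workers_9514;
    }
}
```

Sources then point at the NGINX address instead of a worker IP. Keep in mind that a single NGINX host is itself a single point of failure, which is where the VIP mentioned above comes in.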
This also depends on your data sources. For example, Splunk Universal Forwarders (UFs) can automatically load balance across multiple IPs.
Hope that helps!