
How can I prevent my syslog-ng server from pinning to a single Stream worker process?

John Pondrom Posts: 16
edited September 2023 in Stream

When I send syslog over TCP from my syslog-ng server, the data 'pins' to a single Stream worker process. Am I sending too much data for that single worker process to handle?


Best Answers

  • Michael Hocke
    Michael Hocke Posts: 4
    Answer ✓

    Ah, syslog over TCP. This is fun! So, the very first thing I would ask myself in this situation is: "Do I really need syslog over TCP or can I get away with UDP?" If you can, save yourself a lot of headaches and go UDP. If you just pass around logs within a datacenter or any kind of LAN setting, you don't really need TCP. It doesn't gain you any advantages as long as you scale your syslog servers and load-balance correctly. Now, if you really need TCP (say you have a SaaS application that needs to send logs to your on-prem collector, or you have to enable TLS), then you don't have much of a choice.

    There are a couple of things you can do to prevent pinning. First of all, you have to put your syslog servers behind a load balancer. Be it haproxy or a Big-IP, it doesn't really matter. Then you have a couple of options. You can configure your syslog clients to close and reopen the connection to the syslog server after a certain number of events. I am not sure how (or if) this is done with syslog-ng, but rsyslogd allows you to do that easily (see RebindInterval in omfwd: https://www.rsyslog.com/doc/v8-stable/configuration/modules/omfwd.html#rebindinterval). If you do not have access to the client configuration, think about setting up a syslog-aware proxy. It's an extremely lightweight syslog configuration running on a decent box. All it does is receive syslog messages on TCP and forward them to your haproxies. Make sure that the connection is closed and reopened either after a certain time has elapsed or after a certain number of messages has been sent. You have to find the optimal values through some experimentation in your environment.
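To make the "close and reopen after a certain number of events" idea concrete, here is a minimal Python sketch of a sender that rebinds its TCP connection every N messages, so a load balancer gets a chance to pick a different backend. This is only an illustration of the technique, not how rsyslog or syslog-ng implement it. The class name `RebindingSender`, the `rebind_interval` parameter, the injectable `connect` callable, and the newline framing are all assumptions made for the example.

```python
import socket
from typing import Callable, Optional


class RebindingSender:
    """Send newline-framed syslog messages over TCP, reconnecting every N messages.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """

    def __init__(
        self,
        host: str,
        port: int,
        rebind_interval: int = 1000,
        connect: Optional[Callable[[str, int], socket.socket]] = None,
    ) -> None:
        if rebind_interval < 1:
            raise ValueError("rebind_interval must be at least 1")
        self.host = host
        self.port = port
        self.rebind_interval = rebind_interval
        # Allow a custom connect function so the logic can be tested offline.
        self._connect = connect or (lambda h, p: socket.create_connection((h, p)))
        self._sock: Optional[socket.socket] = None
        self._sent_on_conn = 0
        self.connections_opened = 0

    def send(self, message: str) -> None:
        # Open a connection lazily; each new connection is a new chance for
        # the load balancer to choose a different backend.
        if self._sock is None:
            self._sock = self._connect(self.host, self.port)
            self.connections_opened += 1
            self._sent_on_conn = 0
        self._sock.sendall(message.encode("utf-8") + b"\n")
        self._sent_on_conn += 1
        # Once the interval is reached, close so the next send reconnects.
        if self._sent_on_conn >= self.rebind_interval:
            self.close()

    def close(self) -> None:
        if self._sock is not None:
            self._sock.close()
            self._sock = None
```

With a real load balancer, usage would look something like `RebindingSender("lb.example.internal", 514, rebind_interval=5000)` followed by repeated `send(...)` calls, where the host name, the port, and the interval are placeholders. The right interval depends on your environment: too small and you pay for constant reconnects, too large and the stream stays pinned for long stretches.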

    Before you do any of this, of course, make sure that you really have to or want to do any of this. Any of the popular syslog daemons are extremely performant and don't need many resources to perform very well.

  • Michael Donnelly
    Michael Donnelly Posts: 6 mod
    Answer ✓

    The normal recommendation from Cribl would be to phase out your Syslog-NG server, and use Cribl to directly receive Syslog events from all of your senders. (You won't have TCP connection issues where Syslog-NG gets "stuck" on a single worker process, if you're not using Syslog-NG.)

    You may still have situations with high-volume TCP senders, sending over 400GB/day. As noted by Michael Hocke, you might consider switching those specific senders to use UDP rather than TCP. (UDP doesn't have "sessions" like TCP does.) Some senders (including Syslog-NG) support the distribution of the data across a pool of targets, or support multiple connections to the same target.
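As a rough illustration of "multiple connections to the same target" or "a pool of targets", the following Python sketch opens one TCP connection per target and rotates messages across them. It is not how Syslog-NG or any specific sender implements this. `PooledSender`, the injectable `connect` callable, and the newline framing are assumptions for the example.

```python
import socket
from itertools import cycle
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


class PooledSender:
    """Spread newline-framed messages round-robin over several TCP connections.

    Hypothetical helper for illustration only.
    """

    def __init__(
        self,
        targets: Iterable[Address],
        connect: Optional[Callable[[Address], socket.socket]] = None,
    ) -> None:
        targets = list(targets)
        if not targets:
            raise ValueError("at least one target is required")
        connect = connect or socket.create_connection
        # One connection per entry; listing the same address several times
        # gives multiple connections to a single target.
        self._socks = [connect(addr) for addr in targets]
        self._rotation = cycle(self._socks)

    def send(self, message: str) -> None:
        # Each message goes to the next connection in the rotation.
        next(self._rotation).sendall(message.encode("utf-8") + b"\n")

    def close(self) -> None:
        for sock in self._socks:
            sock.close()
```

One trade-off to keep in mind: once messages are spread over several connections, their relative order across connections is no longer guaranteed at the receiving end.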

    For high-volume senders that do not support sending to multiple destinations, the HAProxy approach mentioned by Balazs works. Send from high-volume TCP syslog devices to HAProxy, and HAProxy will distribute the events across multiple Cribl workers.

    Lastly, check out Cribl's Syslog Best Practices page. You might find it useful.

Answers

  • Wayne Gillo
    Wayne Gillo Posts: 4

    You could place an haproxy in front of the syslog-ng server and point all of your log sources to the haproxy instead of directly to syslog-ng.

    I have customers that do this:


  • Balazs Scheidler

    haproxy may not solve your problem if we are talking about a few "chatty" clients that use a single connection to send a lot of messages. I have personally seen devices that emitted 70k msg/sec on just one connection.

    As long as you only have a single source and that source insists on using a single syslog connection, you can't scale this traffic with something like HAProxy (or any other load balancer), as these all assume that you will have a lot of connections, which is not the case here.
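A toy Python model may help show why this is the case. It assumes a load balancer that assigns whole connections to workers in round-robin order, which is a simplification, and the worker names are made up. The 70k msg/sec figure is the one quoted above.

```python
from itertools import cycle
from typing import Dict, List


def balance_connections(conn_rates: List[int], workers: List[str]) -> Dict[str, int]:
    """Assign each connection to a worker round-robin; return msgs/sec per worker.

    A connection-level balancer can only place whole connections, so the
    message rate of a connection always lands on exactly one worker.
    """
    load = {worker: 0 for worker in workers}
    for rate, worker in zip(conn_rates, cycle(workers)):
        load[worker] += rate
    return load


workers = ["wp0", "wp1", "wp2", "wp3"]  # hypothetical worker process names

# One chatty device on a single connection: everything lands on one worker.
print(balance_connections([70_000], workers))

# The same total volume spread over 700 small connections balances evenly.
print(balance_connections([100] * 700, workers))
```

In the first case one worker carries the full 70,000 msg/sec while the other three sit idle; in the second, each worker ends up with a quarter of the traffic. The balancer behaves the same way in both cases, and only the number of connections differs.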

    Using simple TCP based load balancers is not the right solution to load balance syslog.

    As it is usually difficult to change the client, you need to use something that can take on this load and send it on. My understanding is that cribl is limited to around 10k eps per connection, but YMMV.

    syslog-ng can take on a load like this and can even send out these messages over a set of output connections, allowing you to reach the required scale. There have even been improvements for these use cases recently, e.g. https://axoflow.com/scale-syslog-over-udp-with-ebpf/ and the parallelize() keyword in syslog-ng 4.3 (only documented in the release notes: https://axoflow.com/axosyslog-release-4-3/ ). The first only applies to syslog/UDP; the second also applies to syslog/TCP.

    syslog-ng in general is quite performant on a per connection / per core basis, easily processing hundreds of thousands of messages per second, which it can send on to your specific target directly (e.g. splunk HEC or Elastic)
