
Problem deploying config from Leader to Workers

lukasmecir
edited March 2023 in General Discussions

Hi, I have a problem deploying config from the Leader to the Workers. I have a distributed deployment with 2 Workers and 1 Worker Group (testing, free license). I configured a secure connection between the Leader and the Workers according to https://docs.cribl.io/stream/securing-communications. At first glance everything is OK: both Workers are visible as alive under Manage Worker Nodes on the Leader, I can connect to the Workers' GUI (Remote UI Access is on), the config version is OK, etc.
The problem appears when I want to deploy config changes to the Workers. On Manage Worker Nodes I see a yellow exclamation mark and a never-ending spinner in the Config version column, and the config changes are not propagated to the Workers.
In the Leader's log I see this:

{"time":"2022-06-17T13:01:21.350Z","cid":"api","channel":"CriblMaster","level":"info","message":"sending config update request","group":"default","version":"8627dcf","logStreamEnv":"master","worker":"ff5ea4c4-51eb-4aa4-a650-2e6c8251bb21"}
{"time":"2022-06-17T13:01:21.363Z","cid":"api","channel":"CriblMaster","level":"warn","message":"failed config update","group":"default","version":"8627dcf","logStreamEnv":"master","worker":"ff5ea4c4-51eb-4aa4-a650-2e6c8251bb21","elapsed":13,"error":{"__criblEventType":"event","__ctrlFields":[],"__final":false,"__cloneCount":0,"type":"resp","status":500,"message":"error running handler for req=configure","error":{"message":"Received non-OK status code=403","stack":"Error: Received non-OK status code=403\n    at ClientRequest.<anonymous> (/srv/app/int/secmon/cribl/bin/cribl.js:14:11928157)\n    at Object.onceWrapper (events.js:520:26)\n    at ClientRequest.emit (events.js:400:28)\n    at ClientRequest.emit (domain.js:475:12)\n    at HTTPParser.parserOnIncomingClient (_http_client.js:647:27)\n    at HTTPParser.parserOnHeadersComplete (_http_common.js:127:17)\n    at Socket.socketOnData (_http_client.js:515:22)\n    at Socket.emit (events.js:400:28)\n    at Socket.emit (domain.js:475:12)\n    at Socket.Readable.read (internal/streams/readable.js:504:10)"},"req":"configure","reqId":8,"rpc":false,"__raw":"{\"type\":\"resp\",\"status\":500,\"message\":\"error running handler for req=configure\",\"error\":{\"message\":\"Received non-OK status code=403\",\"stack\":\"Error: Received non-OK status code=403\\n    at ClientRequest.<anonymous> (/srv/app/int/secmon/cribl/bin/cribl.js:14:11928157)\\n    at Object.onceWrapper (events.js:520:26)\\n    at ClientRequest.emit (events.js:400:28)\\n    at ClientRequest.emit (domain.js:475:12)\\n    at HTTPParser.parserOnIncomingClient (_http_client.js:647:27)\\n    at HTTPParser.parserOnHeadersComplete (_http_common.js:127:17)\\n    at Socket.socketOnData (_http_client.js:515:22)\\n    at Socket.emit (events.js:400:28)\\n    at Socket.emit (domain.js:475:12)\\n    at Socket.Readable.read (internal/streams/readable.js:504:10)\"},\"req\":\"configure\",\"reqId\":8,\"rpc\":false}","__socketAddr":"10.88.29.42:16337","__srcIpPort":"10.88.29.42:16337"}}
{"time":"2022-06-17T13:01:21.363Z","cid":"api","channel":"CriblMaster","level":"warn","message":"failed to get worker configs up-to-date","group":"default","version":"8627dcf","guid":"ff5ea4c4-51eb-4aa4-a650-2e6c8251bb21","error":{"__criblEventType":"event","__ctrlFields":[],"__final":false,"__cloneCount":0,"type":"resp","status":500,"message":"error running handler for req=configure","error":{"message":"Received non-OK status code=403","stack":"Error: Received non-OK status code=403\n    at ClientRequest.<anonymous> (/srv/app/int/secmon/cribl/bin/cribl.js:14:11928157)\n    at Object.onceWrapper (events.js:520:26)\n    at ClientRequest.emit (events.js:400:28)\n    at ClientRequest.emit (domain.js:475:12)\n    at HTTPParser.parserOnIncomingClient (_http_client.js:647:27)\n    at HTTPParser.parserOnHeadersComplete (_http_common.js:127:17)\n    at Socket.socketOnData (_http_client.js:515:22)\n    at Socket.emit (events.js:400:28)\n    at Socket.emit (domain.js:475:12)\n    at Socket.Readable.read (internal/streams/readable.js:504:10)"},"req":"configure","reqId":8,"rpc":false,"__raw":"{\"type\":\"resp\",\"status\":500,\"message\":\"error running handler for req=configure\",\"error\":{\"message\":\"Received non-OK status code=403\",\"stack\":\"Error: Received non-OK status code=403\\n    at ClientRequest.<anonymous> (/srv/app/int/secmon/cribl/bin/cribl.js:14:11928157)\\n    at Object.onceWrapper (events.js:520:26)\\n    at ClientRequest.emit (events.js:400:28)\\n    at ClientRequest.emit (domain.js:475:12)\\n    at HTTPParser.parserOnIncomingClient (_http_client.js:647:27)\\n    at HTTPParser.parserOnHeadersComplete (_http_common.js:127:17)\\n    at Socket.socketOnData (_http_client.js:515:22)\\n    at Socket.emit (events.js:400:28)\\n    at Socket.emit (domain.js:475:12)\\n    at Socket.Readable.read (internal/streams/readable.js:504:10)\"},\"req\":\"configure\",\"reqId\":8,\"rpc\":false}","__socketAddr":"10.88.29.42:16337","__srcIpPort":"10.88.29.42:16337"}}

and on the Worker:

{
    "time": "2022-06-17T13:01:21.351Z",
    "cid": "api",
    "channel": "CriblWorker",
    "level": "info",
    "message": "leader triggered configure",
    "version": "8627dcf",
    "group": "default",
    "checksum": "sha1:27d31c969b154e7addc0af8afd8133ae4510b6c8",
    "url": "https://10.88.29.12:4200/api/v1/master/bundles/default/8627dcf?guid=ff5ea4c4-51eb-4aa4-a650-2e6c8251bb21",
    "source": "cribl.log"
}

I tried different kinds of certificates (self-signed certificates generated on the Cribl servers, certificates issued by a CA I configured on the Leader node, and certificates from an external third-party CA), always with the same result.
Without TLS everything works well.
Could you please help me figure out what is wrong?
Many thanks in advance for your help.
Best regards
Lukas Mecir


Best Answer

  • Brandon McCombs
    Answer ✓

    It seems like I've seen this before, but I can't quite remember what the solution was. One question I have, though: do your workers have a proxy env variable set for getting to the Internet?

    It seems the notification from the leader to the worker is getting through, since the worker logs an event indicating the leader told it a new config is available, even though the logs on the leader show both 500 and 403 errors. I think when we saw this in the past, it was because the workers were trying to connect to the leader via a proxy and never got through, because the proxy wasn't directing them to the leader but to the Internet.

    The reason the proxy would direct the workers away from the leader is that a separate connection from the workers to the leader occurs when they download their config deployments. It uses the same port (4200/tcp) as regular cluster communications, but because the request is HTTP-based it can be hijacked by OS proxy env variables. Regular cluster comms aren't HTTP-based.

    And maybe the reason it works without TLS but fails with TLS is that you only have a proxy env var set for HTTPS but not for HTTP. Just a guess.

    Hope that helps.
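    To make the HTTP-vs-HTTPS guess concrete, here is a minimal Python sketch of how many env-aware HTTP clients pick a proxy: by URL scheme, using `http_proxy` for `http://` URLs and `https_proxy` for `https://` URLs (upper-case variants are commonly honored too). It is a simplified model, not Cribl's actual HTTP client, and it deliberately ignores `no_proxy`. The proxy address `http://proxy.example:3128` and the function name `proxy_for` are hypothetical; the bundle URL comes from the Worker log above.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def proxy_for(url: str, env: dict[str, str]) -> str | None:
    """Return the proxy a typical env-aware client would use for `url`, or None.

    Simplified model (an assumption, not Cribl's implementation): the variable
    is chosen by URL scheme (http:// -> http_proxy, https:// -> https_proxy),
    lower-case name first, then upper-case. no_proxy is not handled here.
    """
    scheme = urlsplit(url).scheme.lower()
    name = f"{scheme}_proxy"
    return env.get(name) or env.get(name.upper()) or None


# Hypothetical Worker environment: only an HTTPS proxy is configured.
worker_env = {"https_proxy": "http://proxy.example:3128"}

# Bundle URL from the Worker log (TLS enabled) and its plain-HTTP counterpart.
with_tls = "https://10.88.29.12:4200/api/v1/master/bundles/default/8627dcf"
without_tls = "http://10.88.29.12:4200/api/v1/master/bundles/default/8627dcf"

print(proxy_for(with_tls, worker_env))     # http://proxy.example:3128 (request leaves via the proxy)
print(proxy_for(without_tls, worker_env))  # None (direct connection to the Leader)
```

    Under this model, turning TLS on changes only the URL scheme of the bundle download, which is enough to switch it from a direct connection to a proxied one.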

Answers

  • lukasmecir
    lukasmecir Posts: 5

    Hi Brandon, sorry for the late answer, but you were completely right about the proxy. A proxy was directing all traffic to the Internet, so the Worker Nodes (WN) could not reach the Leader Node (LN). I added the line no_proxy=<LN address> to the cribl.service file on both WNs (Cribl is running under systemd), and that solved the problem.
    Thank you very much for your help.
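    The fix works because `no_proxy` tells env-aware clients which hosts must bypass the proxy. Below is a simplified Python sketch of the usual matching rules (comma-separated entries, exact host match, domain-suffix match, `*` for everything). Real clients differ in details such as CIDR ranges and ports, so treat it as an illustration, not a specification. The function name `bypasses_proxy` and the hostname `leader.example.internal` are hypothetical; the Leader address 10.88.29.12 comes from the logs above.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def bypasses_proxy(url: str, no_proxy: str) -> bool:
    """True if the host of `url` matches an entry in a no_proxy-style list.

    Simplified rules (an assumption; clients vary): entries are comma-separated,
    "*" matches every host, and an entry matches the host exactly or as a
    domain suffix (".example.com" and "example.com" both match "a.example.com").
    Ports and CIDR ranges are not handled.
    """
    host = (urlsplit(url).hostname or "").lower()
    for raw in no_proxy.split(","):
        entry = raw.strip().lower().lstrip(".")
        if not entry:
            continue
        if entry == "*" or host == entry or host.endswith("." + entry):
            return True
    return False


leader_url = "https://10.88.29.12:4200/api/v1/master/bundles/default/8627dcf"

print(bypasses_proxy(leader_url, ""))                         # False: request goes to the proxy
print(bypasses_proxy(leader_url, "10.88.29.12"))              # True: Worker connects to the Leader directly
print(bypasses_proxy(leader_url, "leader.example.internal"))  # False: list names a host, URL uses the IP
```

    The last case is the practical gotcha: matching is typically done on the host string in the URL, so `no_proxy` should contain the Leader address exactly as the Workers are configured to reach it. In a systemd unit, such a variable is normally set with an `Environment=` line in the `[Service]` section, followed by `systemctl daemon-reload` and a restart of the service.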