Problem with Worker connecting to Leader node

Hi Guys,

I have set up a test environment on my laptop using VirtualBox, with a CentOS 7 edge leader node and a worker, using port 9000 (UI) and port 4200 for internal comms.

I’ve installed the worker node using curl. There seems to be an issue with the worker node connecting to the leader, as shown below. From the worker node, telnet connections to the leader’s ports 9000 and 4200 succeed fine, but I’m not sure what else to check based on the messages below:
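In case it's useful to anyone, the telnet checks described above can also be scripted. A minimal Python sketch (leader address and ports taken from this setup; this only tests TCP reachability, not Cribl auth):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused, unreachable, and timed-out connections.
        return False

# Leader ports from this environment: 9000 (UI) and 4200 (internal comms).
for port in (9000, 4200):
    print(f"10.0.2.4:{port} reachable: {port_open('10.0.2.4', port)}")
```

Note that a successful TCP connect (like telnet succeeding here) only proves the port is open; as it turned out below, the failure was at the application layer, not the network layer.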

{"time":"2022-05-23T00:50:14.256Z","cid":"api","channel":"output:DistWorker","level":"info","message":"attempting to connect","host":"10.0.2.4","port":4200,"tls":false}
{"time":"2022-05-23T00:50:14.256Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"will retry to connect","nextConnectTime":1653267018346}
{"time":"2022-05-23T00:50:14.256Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"connecting","host":"10.0.2.4","port":4200,"tls":false}
{"time":"2022-05-23T00:50:14.264Z","cid":"api","channel":"input:DistMaster","level":"debug","message":"opened connection","src":"10.0.2.4:4200"}
{"time":"2022-05-23T00:50:14.264Z","cid":"api","channel":"output:DistWorker","level":"info","message":"connected","host":"10.0.2.4","port":4200,"tls":false}
{"time":"2022-05-23T00:50:14.264Z","cid":"api","channel":"output:DistWorker","level":"info","message":"flushing buffer backlog","count":1,"totalSize":302}
{"time":"2022-05-23T00:50:14.267Z","cid":"api","channel":"output:DistWorker","level":"info","message":"sending unblocked","since":1653267014,"endpoint":{"host":"10.0.2.4","port":4200,"tls":false}}
2 UpGoats

Sorry, I pasted the wrong snippet — the messages are actually these:

{"time":"2022-05-23T01:35:09.958Z","cid":"api","channel":"output:DistWorker","level":"info","message":"attempting to connect","host":"10.0.2.4","port":4200,"tls":false}
{"time":"2022-05-23T01:35:09.959Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"will retry to connect","nextConnectTime":1653269713683}
{"time":"2022-05-23T01:35:09.959Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"connecting","host":"10.0.2.4","port":4200,"tls":false}
{"time":"2022-05-23T01:35:09.962Z","cid":"api","channel":"input:DistMaster","level":"debug","message":"opened connection","src":"10.0.2.4:4200"}
{"time":"2022-05-23T01:35:09.962Z","cid":"api","channel":"output:DistWorker","level":"info","message":"connected","host":"10.0.2.4","port":4200,"tls":false}
{"time":"2022-05-23T01:35:09.962Z","cid":"api","channel":"output:DistWorker","level":"info","message":"flushing buffer backlog","count":1,"totalSize":240}
{"time":"2022-05-23T01:35:09.964Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"will retry to connect","nextConnectTime":1653269711826}
{"time":"2022-05-23T01:35:09.965Z","cid":"api","channel":"input:DistMaster","level":"debug","message":"closed connection","src":"10.0.2.4:4200","error":{"message":"write EPIPE","stack":"Error: write EPIPE\n    at afterWriteDispatched (internal/stream_base_commons.js:156:25)\n    at writeGeneric (internal/stream_base_commons.js:147:3)\n    at Socket._writeGeneric (net.js:798:11)\n    at Socket._write (net.js:810:8)\n    at writeOrBuffer (internal/streams/writable.js:358:12)\n    at Socket.Writable.write (internal/streams/writable.js:303:10)\n    at y.writeAndFlush (/opt/cribl/bin/cribl.js:14:12771977)\n    at y.sendNextBuffer (/opt/cribl/bin/cribl.js:14:12772984)\n    at Immediate._onImmediate (/opt/cribl/bin/cribl.js:14:12772665)\n    at processImmediate (internal/timers.js:464:21)"},"r":0,"b":0}
{"time":"2022-05-23T01:35:10.972Z","cid":"api","channel":"output:DistWorker","level":"warn","message":"sending is blocked","since":1653269709,"elapsed":1,"endpoint":{"host":"10.0.2.4","port":4200,"tls":false}}

Another variation is this…

{"time":"2022-05-23T01:35:08.073Z","cid":"api","channel":"output:DistWorker","level":"info","message":"attempting to connect","host":"10.0.2.4","port":4200,"tls":false}
{"time":"2022-05-23T01:35:08.073Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"will retry to connect","nextConnectTime":1653269711797}
{"time":"2022-05-23T01:35:08.073Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"connecting","host":"10.0.2.4","port":4200,"tls":false}
{"time":"2022-05-23T01:35:08.075Z","cid":"api","channel":"input:DistMaster","level":"debug","message":"opened connection","src":"10.0.2.4:4200"}
{"time":"2022-05-23T01:35:08.076Z","cid":"api","channel":"output:DistWorker","level":"info","message":"connected","host":"10.0.2.4","port":4200,"tls":false}
{"time":"2022-05-23T01:35:08.076Z","cid":"api","channel":"output:DistWorker","level":"info","message":"flushing buffer backlog","count":1,"totalSize":240}
{"time":"2022-05-23T01:35:08.095Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"will retry to connect","nextConnectTime":1653269709957}
{"time":"2022-05-23T01:35:08.176Z","cid":"api","channel":"output:DistWorker","level":"error","message":"connection error","error":"This socket has been ended by the other party"}
1 UpGoat

OK, figured it out. The default auth token “criblmaster” needs to be replaced with a proper one. Clicking the Generate button on the Distributed Settings > Leader Settings page, then putting the new token in the /opt/cribl/local/_system/instance.yml file on the worker node, did the trick!
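For anyone else hitting this: the worker-side change goes in `/opt/cribl/local/_system/instance.yml`. A rough sketch of the relevant section (field names per Cribl's distributed-deployment docs; double-check against your version, and the token value is a placeholder for the one generated on the leader):

```yaml
distributed:
  mode: worker
  master:
    host: 10.0.2.4    # leader node from this thread
    port: 4200        # internal comms port
    authToken: REPLACE_WITH_GENERATED_TOKEN
    tls:
      disabled: true  # matches "tls":false in the logs above
```

Restart the worker after editing so it reconnects with the new token.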

2 UpGoats

Glad you were able to resolve this. Of the messages you posted from the worker, the one below is what points to the issue. There are other scenarios that can produce this message as well; validating the ports and connectivity, and reviewing tcpdump output, will help narrow it down.

{"time":"2022-05-23T01:35:08.176Z","cid":"api","channel":"output:DistWorker","level":"error","message":"connection error","error":"This socket has been ended by the other party"}

1 UpGoat