Problem with Worker connecting to Leader node

Hi Guys,

I have set up a test environment on my laptop using VirtualBox, with a CentOS 7 Edge leader node and a worker, using port 9000 (UI) and port 4200 for internal comms.

I’ve installed the worker node using curl. There seems to be an issue with the worker node connecting to the leader, as shown below. From the worker node, telnet to the leader node's ports 9000 and 4200 connects fine, but I'm not sure what else to check based on the messages below:

{"time":"2022-05-23T00:50:14.256Z","cid":"api","channel":"output:DistWorker","level":"info","message":"attempting to connect","host":"10.0.2.4","port":4200,"tls":false}
{"time":"2022-05-23T00:50:14.256Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"will retry to connect","nextConnectTime":1653267018346}
{"time":"2022-05-23T00:50:14.256Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"connecting","host":"10.0.2.4","port":4200,"tls":false}
{"time":"2022-05-23T00:50:14.264Z","cid":"api","channel":"input:DistMaster","level":"debug","message":"opened connection","src":"10.0.2.4:4200"}
{"time":"2022-05-23T00:50:14.264Z","cid":"api","channel":"output:DistWorker","level":"info","message":"connected","host":"10.0.2.4","port":4200,"tls":false}
{"time":"2022-05-23T00:50:14.264Z","cid":"api","channel":"output:DistWorker","level":"info","message":"flushing buffer backlog","count":1,"totalSize":302}
{"time":"2022-05-23T00:50:14.267Z","cid":"api","channel":"output:DistWorker","level":"info","message":"sending unblocked","since":1653267014,"endpoint":{"host":"10.0.2.4","port":4200,"tls":false}}
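The telnet check described in this post can also be scripted. Below is a minimal sketch, assuming Python 3 is available on the worker; the helper name `can_connect` is hypothetical and not part of any Cribl tooling. It only confirms that a TCP connection can be opened, so a passing result says nothing about whether the leader will keep the connection open afterwards.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the setup in this thread, one would call `can_connect("10.0.2.4", 9000)` and `can_connect("10.0.2.4", 4200)` from the worker node.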

Best Answer

  • mikeylee
    mikeylee Posts: 6
    Answer ✓

    OK, figured it out. The default auth token "criblmaster" needs to be replaced with a proper one. Clicking the Generate button on the Distributed Settings > Leader Settings page and using the new token in the /opt/cribl/local/_system/instance.yml file on the worker node did the trick!
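A quick way to confirm whether a worker is still carrying the default token is to scan its config file for the string. This is a minimal sketch, assuming Python 3 on the worker; the function names are hypothetical, and the check is a plain text search rather than a YAML parse, so it makes no assumption about the key the token is stored under.

```python
from pathlib import Path

DEFAULT_TOKEN = "criblmaster"  # default value named in the answer above


def uses_default_token(config_text: str) -> bool:
    """Return True if any non-comment line still contains the default token."""
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore commented-out lines
        if DEFAULT_TOKEN in stripped:
            return True
    return False


def check_file(path: str = "/opt/cribl/local/_system/instance.yml") -> bool:
    """Read the worker config (path taken from the answer above) and scan it."""
    return uses_default_token(Path(path).read_text(encoding="utf-8"))
```

If `check_file()` returns True on the worker, the token generated on the leader has not been applied there yet.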

Answers

  • mikeylee
    mikeylee Posts: 6

    Sorry, the messages are these:

    {"time":"2022-05-23T01:35:09.958Z","cid":"api","channel":"output:DistWorker","level":"info","message":"attempting to connect","host":"10.0.2.4","port":4200,"tls":false}
    {"time":"2022-05-23T01:35:09.959Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"will retry to connect","nextConnectTime":1653269713683}
    {"time":"2022-05-23T01:35:09.959Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"connecting","host":"10.0.2.4","port":4200,"tls":false}
    {"time":"2022-05-23T01:35:09.962Z","cid":"api","channel":"input:DistMaster","level":"debug","message":"opened connection","src":"10.0.2.4:4200"}
    {"time":"2022-05-23T01:35:09.962Z","cid":"api","channel":"output:DistWorker","level":"info","message":"connected","host":"10.0.2.4","port":4200,"tls":false}
    {"time":"2022-05-23T01:35:09.962Z","cid":"api","channel":"output:DistWorker","level":"info","message":"flushing buffer backlog","count":1,"totalSize":240}
    {"time":"2022-05-23T01:35:09.964Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"will retry to connect","nextConnectTime":1653269711826}
    {"time":"2022-05-23T01:35:09.965Z","cid":"api","channel":"input:DistMaster","level":"debug","message":"closed connection","src":"10.0.2.4:4200","error":{"message":"write EPIPE","stack":"Error: write EPIPE\n    at afterWriteDispatched (internal/stream_base_commons.js:156:25)\n    at writeGeneric (internal/stream_base_commons.js:147:3)\n    at Socket._writeGeneric (net.js:798:11)\n    at Socket._write (net.js:810:8)\n    at writeOrBuffer (internal/streams/writable.js:358:12)\n    at Socket.Writable.write (internal/streams/writable.js:303:10)\n    at y.writeAndFlush (/opt/cribl/bin/cribl.js:14:12771977)\n    at y.sendNextBuffer (/opt/cribl/bin/cribl.js:14:12772984)\n    at Immediate._onImmediate (/opt/cribl/bin/cribl.js:14:12772665)\n    at processImmediate (internal/timers.js:464:21)"},"r":0,"b":0}
    {"time":"2022-05-23T01:35:10.972Z","cid":"api","channel":"output:DistWorker","level":"warn","message":"sending is blocked","since":1653269709,"elapsed":1,"endpoint":{"host":"10.0.2.4","port":4200,"tls":false}}
    
  • mikeylee
    mikeylee Posts: 6

    Another variation is this…

    {"time":"2022-05-23T01:35:08.073Z","cid":"api","channel":"output:DistWorker","level":"info","message":"attempting to connect","host":"10.0.2.4","port":4200,"tls":false}
    {"time":"2022-05-23T01:35:08.073Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"will retry to connect","nextConnectTime":1653269711797}
    {"time":"2022-05-23T01:35:08.073Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"connecting","host":"10.0.2.4","port":4200,"tls":false}
    {"time":"2022-05-23T01:35:08.075Z","cid":"api","channel":"input:DistMaster","level":"debug","message":"opened connection","src":"10.0.2.4:4200"}
    {"time":"2022-05-23T01:35:08.076Z","cid":"api","channel":"output:DistWorker","level":"info","message":"connected","host":"10.0.2.4","port":4200,"tls":false}
    {"time":"2022-05-23T01:35:08.076Z","cid":"api","channel":"output:DistWorker","level":"info","message":"flushing buffer backlog","count":1,"totalSize":240}
    {"time":"2022-05-23T01:35:08.095Z","cid":"api","channel":"output:DistWorker","level":"debug","message":"will retry to connect","nextConnectTime":1653269709957}
    {"time":"2022-05-23T01:35:08.176Z","cid":"api","channel":"output:DistWorker","level":"error","message":"connection error","error":"This socket has been ended by the other party"}
    
  • Martin Prado
    Martin Prado Posts: 27 mod

    Glad you were able to resolve this. Of the messages you posted from the Worker, the one below is the one that points to the issue. This message can appear in other scenarios as well; validating the ports and the connection, and reviewing tcpdump output, will help narrow it down.

    {"time":"2022-05-23T01:35:08.176Z","cid":"api","channel":"output:DistWorker","level":"error","message":"connection error","error":"This socket has been ended by the other party"}
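Picking the telling line out of a busy log can be automated. The sketch below assumes Python 3 and log lines in the JSON format shown in this thread; the function name `suspicious_lines` is hypothetical. It keeps entries logged at warn or error level, plus any entry that carries an "error" field (such as the debug-level "closed connection" message with "write EPIPE").

```python
import json


def suspicious_lines(log_text: str):
    """Yield (time, channel, message, error) for entries worth a closer look."""
    for raw in log_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        error = entry.get("error")
        if entry.get("level") in ("warn", "error") or error is not None:
            # The "error" field is sometimes a string, sometimes an object.
            if isinstance(error, dict):
                error = error.get("message")
            yield entry.get("time"), entry.get("channel"), entry.get("message"), error
```

Feeding it the worker's log text and printing each tuple gives a short list to compare against the messages quoted above.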