[HTCondor-users] Unable to correctly create multi-machine pool



Hello,

I am trying to run a Pegasus workflow for an experiment. To run the workflow, I tried to create a multi-machine HTCondor pool using the instructions in the documentation from here. When I run through the commands on that page and get to the point where I run condor_status on the submit node, I get the following error:

Error: communication error
SECMAN:2007:Failed to end classad message.  

I am very new to HTCondor, so any advice to help me get my multi-machine pool running would be greatly appreciated.

I am creating this multi-machine pool on CloudLab. Each node is an m510 machine running Ubuntu 22.04.02 LTS. The machines are all connected to the same network, and each node has a hostname of the form node{num}. I made node0 the central manager, node1 the submit node, and node2/node3 the execute nodes. The commands I ran to create the multi-machine pool were:


$ curl -fsSL https://get.htcondor.org | sudo GET_HTCONDOR_PASSWORD="$htcondor_password" /bin/bash -s -- --no-dry-run --central-manager node0

$ curl -fsSL https://get.htcondor.org | sudo GET_HTCONDOR_PASSWORD="$htcondor_password" /bin/bash -s -- --no-dry-run --submit node0

$ curl -fsSL https://get.htcondor.org | sudo GET_HTCONDOR_PASSWORD="$htcondor_password" /bin/bash -s -- --no-dry-run --execute node0
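
To make the intended layout explicit, here is a minimal sketch of how the roles described above map onto the three commands. It assumes the hostnames node0 through node3 exactly as listed; the helper names role_flag and install_command are hypothetical and are not part of HTCondor. The script only prints the command for a given host and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: map a hostname to the get_htcondor role flag,
# following the layout described above (node0 = central manager,
# node1 = submit, node2/node3 = execute).
role_flag() {
    case "$1" in
        node0) echo "--central-manager" ;;
        node1) echo "--submit" ;;
        node2|node3) echo "--execute" ;;
        *) return 1 ;;
    esac
}

# Print (do not run) the install command for a given host. In the commands
# above, every role is passed the central manager's name (node0).
install_command() {
    flag=$(role_flag "$1") || { echo "unknown host: $1" >&2; return 1; }
    echo "curl -fsSL https://get.htcondor.org | sudo GET_HTCONDOR_PASSWORD=\"\$htcondor_password\" /bin/bash -s -- --no-dry-run $flag node0"
}
```

For example, `install_command node1` prints the --submit variant, which would then be run on node1 itself.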

Thanks,
Vijay