Re: [HTCondor-users] Failed to Connect



On Jul 28, 2017, at 4:27 AM, Justin Fisher <justin0419@xxxxxxxxx> wrote:

I occasionally get this error. 192.168.1.206 is the machine I use to submit the jobs. I think it's some kind of network issue, but I'm not sure. My workaround is to reboot the submit machine, but is there a less drastic method?

I can ping all the other machines on the network and the NFS shares needed for Condor are all there.

ERROR: Failed to connect to local queue manager

This looks like an error message that condor_submit prints.
When this error occurs, does it happen every time, or does condor_submit still work sometimes? Do other commands that talk to the schedd (e.g. condor_q, condor_rm) also fail?

You say you can ping all of the other machines on the network. Can you ping this machine (192.168.1.206) when the errors occur? If the machine is otherwise healthy, you can try restarting just the HTCondor daemons rather than rebooting the whole machine.
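
To make the first question easier to answer, here is a rough sketch of a script you could run on the submit machine the next time the error shows up. It is only an illustration: the helper names (run_check, check_schedd) and the choice of commands are my own assumptions, not anything shipped with HTCondor. It runs condor_q, which has to contact the schedd, and condor_status -schedd, which asks the collector whether the schedd has advertised itself, and reports which of them fail.

```python
import subprocess

# Illustrative choice of commands, not an official checklist.
# condor_q contacts the local schedd directly; condor_status -schedd
# asks the collector which schedds have advertised themselves.
SCHEDD_COMMANDS = [
    ["condor_q"],
    ["condor_status", "-schedd"],
]


def run_check(cmd, timeout=30):
    """Run cmd and return (succeeded, combined stdout/stderr text)."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError:
        return False, f"{cmd[0]}: command not found"
    except subprocess.TimeoutExpired:
        return False, f"{' '.join(cmd)}: timed out after {timeout}s"
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


def check_schedd(commands=SCHEDD_COMMANDS):
    """Run each command and map its command line to (succeeded, output)."""
    return {" ".join(cmd): run_check(cmd) for cmd in commands}


if __name__ == "__main__":
    for name, (ok, output) in check_schedd().items():
        print(f"[{'ok' if ok else 'FAIL'}] {name}")
        if not ok:
            # Show the tool's own error text to help with diagnosis.
            print(output)
```

If condor_q fails in the same way as condor_submit while the machine itself still responds, that points at the schedd rather than the network. In that case, running condor_restart on the submit machine (it needs administrator-level authorization) should restart the local HTCondor daemons without a reboot.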

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project