
[HTCondor-users] RuntimeError: Failed to commmit and disconnect from queue.



Hello,

I use Python with HTCondor, and when I submit the following task, everything works fine:

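import classad
import htcondor

# Assumed setup (not shown in the original message): connect to the local schedd.
schedd = htcondor.Schedd()

sub = htcondor.Submit({
    "executable": "/bin/echo",
    "arguments": "hello_world",
    "universe": "vanilla",
    "should_transfer_files": "NO",
    "transfer_executable": "False",
    "output": "stdout.txt",
    "initialdir": "/tmpdir",
    "run_as_owner": "True",
    "+Owner": classad.quote("user"),
})
# Queue the job inside a schedd transaction.
with schedd.transaction() as schedd_transaction:
  cluster_id = sub.queue(schedd_transaction)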

But then I add another line inside the 'with' statement (and put 'import time' at the beginning of the file). The result is the following:
with schedd.transaction() as schedd_transaction:
  cluster_id = sub.queue(schedd_transaction)
  time.sleep(30)

This last version does not work. The error is:
    with schedd.transaction() as schedd_transaction:
      cluster_id = sub.queue(schedd_transaction)
>     time.sleep(30)
E     RuntimeError: Failed to commmit and disconnect from queue.

The question is: why does this error happen, and how does it relate to the NEGOTIATOR_INTERVAL setting? (A 30-second sleep triggers the error when the setting is 60 (the default), and time.sleep(1) triggers it when NEGOTIATOR_INTERVAL=5.)

Log:
08/14/19 17:38:45 (fd:5) (pid:25536) (D_NETWORK) condor_write(fd=4 schedd at <192.168.128.5:9618>,,size=13,timeout=0,flags=0,non_blocking=0)
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a60 resetting
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a60 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a60 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a60 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4ac0 resetting
08/14/19 17:38:45 (fd:5) (pid:25536) (D_NETWORK) condor_read(fd=4 schedd at <192.168.128.5:9618>,,size=5,timeout=0,flags=0,non_blocking=0)
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4ac0 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_ALWAYS:2) condor_read(): Socket closed when trying to read 5 bytes from schedd at <192.168.128.5:9618>
08/14/19 17:38:45 (fd:5) (pid:25536) (D_ALWAYS:2) IO: EOF reading packet header
08/14/19 17:38:45 (fd:5) (pid:25536) (D_NETWORK) Stream::get(int) failed to read padding
08/14/19 17:38:45 (fd:5) (pid:25536) (D_NETWORK) condor_write(fd=4 schedd at <192.168.128.5:9618>,,size=13,timeout=0,flags=0,non_blocking=0)
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a40 resetting
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a40 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a40 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a40 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_ALWAYS) condor_write() failed: send() 13 bytes to schedd at <192.168.128.5:9618> returned -1, timeout=0, errno=32 Broken pipe.
08/14/19 17:38:45 (fd:5) (pid:25536) (D_ALWAYS) Buf::write(): condor_write() failed
08/14/19 17:38:45 (fd:5) (pid:25536) (D_NETWORK) condor_write(fd=4 schedd at <192.168.128.5:9618>,,size=13,timeout=0,flags=0,non_blocking=0)
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4b10 resetting
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4b10 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4b10 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4b10 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_ALWAYS) condor_write() failed: send() 13 bytes to schedd at <192.168.128.5:9618> returned -1, timeout=0, errno=32 Broken pipe.
08/14/19 17:38:45 (fd:5) (pid:25536) (D_ALWAYS) Buf::write(): condor_write() failed
08/14/19 17:38:45 (fd:5) (pid:25536) (D_NETWORK) CLOSE TCP <192.168.128.2:43967> fd=4
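
The log above shows the socket to the schedd already closed ("Socket closed", "Broken pipe") when the commit is attempted at the end of the 'with' block. One possible workaround, sketched below, is to keep the transaction as short as possible and do any waiting after it has been committed. This assumes the failure comes from holding the transaction open while idle, which the log suggests but does not prove. The helper name submit_then_wait and its parameters are hypothetical; only schedd.transaction() and sub.queue() come from the code in this message.

```python
import time


def submit_then_wait(schedd, sub, delay_seconds):
    """Queue a job in a short transaction, then wait outside of it.

    `schedd` is expected to behave like htcondor.Schedd and `sub` like
    htcondor.Submit, as used earlier in this message (hypothetical helper).
    """
    # Only the queue call happens while the transaction is open.
    with schedd.transaction() as schedd_transaction:
        cluster_id = sub.queue(schedd_transaction)
    # Leaving the 'with' block ends the transaction, so the delay no
    # longer keeps the connection to the schedd open and idle.
    time.sleep(delay_seconds)
    return cluster_id
```

With the objects from the first example, this would be called as cluster_id = submit_then_wait(schedd, sub, 30).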

--
Sincerely yours,
Ivan Ergunov                         mailto:hozblok@xxxxxxxxx