I am trying to use the Condor "Compute On Demand" functionality to run a job, and am having trouble releasing the claim after the job completes. I can request and activate a claim successfully using both the condor_cod tool and the Python bindings Claim class. If I use the Python bindings and still have the Claim object around after the job is finished, I can call claim.release() to successfully release the claim.
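The same-process case that does work can be sketched as a small context manager that requests the claim on entry and always releases it on exit. This is only a sketch under stated assumptions: `cod_claim` is a hypothetical helper name, `claim` is assumed to be an object exposing `requestCOD(constraint, lease_duration)` and `release()` as the bindings' Claim class is documented to do, and the constraint `"true"` and lease value `-1` are assumed defaults.

```python
from contextlib import contextmanager


@contextmanager
def cod_claim(claim, lease_duration=-1):
    """Request a COD claim and release it when the block exits.

    `claim` is assumed to behave like htcondor.Claim, i.e. to provide
    requestCOD(constraint, lease_duration) and release().
    """
    # Assumption: "true" matches any slot; -1 asks for the longest lease
    # the startd allows.
    claim.requestCOD("true", lease_duration)
    try:
        yield claim
    finally:
        # Runs on normal exit and when the body raises, so the claim is
        # released by the same process that requested it.
        claim.release()


# Hypothetical usage against a real pool (not executed here):
#
#   import htcondor
#   startd_ad = htcondor.Collector().query(htcondor.AdTypes.Startd)[0]
#   with cod_claim(htcondor.Claim(startd_ad)) as claim:
#       claim.activate(job_ad)   # job_ad: ClassAd describing the job
#       ...                      # wait for the job to finish
```

This only covers the case where the Claim object is still alive; it does not address releasing a claim from a separate process.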
However, I run into issues trying to release the claim later from a separate process. If I try to release the claim using the condor_cod tool I get the following error:
dev-exechost-01 [no job set:~] 149% condor_cod release -id "<100.110.25.193:9618>#1563927560#415#cd7bebdd59491a01553804b1cda5cb86939bc09f"
Attempt to send CA_RELEASE_CLAIM to startd <100.110.25.193:9618> failed
AUTHENTICATE:1002:Failure performing handshake
And I see the following in the SharedPortLog on that host in response:
10/09/19 14:58:42 DaemonCommandProtocol: Not enough bytes are ready for read.
10/09/19 14:58:42 SharedPortServer: Passing a request from <100.110.25.193:44569> for command 1000 to ID collector.
10/09/19 14:58:42 SharedPortServer: server was busy, failed to connect collector as requested by <100.110.25.193:44569>: primary (fc41ae4b192bf846de08119c9a81c47579587046fc0ee86597e574a317a5e71b/collector): Connection refused (111); alt (/opt/condor/lock/condor/daemon_sock/collector): Connection refused (111)
If I try to use the Python bindings later, I have trouble re-creating the Claim object in a way that allows releasing the COD claim. (This is the only Startd in this dev pool, and I had created several other COD claims on it beforehand that had not been released.)
>>> import htcondor
>>> col = htcondor.Collector()
>>> startds = col.query(htcondor.AdTypes.Startd)
>>> private_startds = col.query(htcondor.AdTypes.StartdPrivate)
>>> claim = htcondor.Claim(private_startds)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Startd failed to release claim.
>>> claim = htcondor.Claim(startds)
Is there a way to initialize a new Claim object for an existing COD claim, so I can release it? Or is there a better way of doing this?
I'd appreciate any feedback.
Collin Mehring | PE-JoSE - Software Engineer