
Re: [HTCondor-users] have people seen scalability issues with condor submission using the python bindings?



I have not tried the Python bindings, or that version of Condor, but I have seen cases in the past where submitting too many jobs at once overwhelmed Condor job submission.

bob

On 8/21/2018 11:38 AM, Jose Caballero wrote:
2018-08-21 11:28 GMT-04:00 Jose Caballero <jcaballero.hep@xxxxxxxxx>:
Hi,

I am observing what I believe are scaling problems when submitting
with the Python bindings.
The HTCondor version is 8.6.12.
My application has multiple threads, and when they all try to submit
at almost the same time, using the same Schedd, around 40% of them
succeed and ~60% fail.
I know I should write the code more carefully, perhaps with some thread
locking or a similar trick.
But, in any case, I am wondering whether other people have observed
similar behavior and, if so, how they fixed it.

Cheers,
Jose
I think I forgot to include the error message :)

     with self.schedd.transaction() as txn:
RuntimeError: Failed to connect to schedd.

where self.schedd is an instance of htcondor.Schedd()
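
For illustration, here is a minimal sketch of the "thread locking" idea mentioned in the message, combined with a simple retry. It is not a confirmed fix for the failure rate reported here. The helper name submit_serialized, its parameters, and the submit_fn callable are all hypothetical; in the application described, submit_fn would presumably wrap the `with self.schedd.transaction() as txn:` block. The only assumption about the bindings is the one visible in the error above, namely that a failed connection surfaces as a RuntimeError.

```python
import threading
import time

# One lock shared by every submitting thread, so that only one thread
# talks to the schedd at any given moment.
_submit_lock = threading.Lock()


def submit_serialized(submit_fn, retries=3, delay=1.0, lock=_submit_lock):
    """Run submit_fn() under a shared lock, retrying on RuntimeError.

    submit_fn is a hypothetical zero-argument callable supplied by the
    caller; it is expected to perform one complete submission.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            with lock:
                return submit_fn()
        except RuntimeError:  # e.g. "Failed to connect to schedd."
            if attempt == retries - 1:
                raise  # out of attempts: let the caller see the error
            # Back off outside the lock so other threads can proceed.
            time.sleep(delay * (2 ** attempt))
```

Each worker thread would then call submit_serialized(some_callable) instead of opening the transaction directly. Serializing submissions trades throughput for fewer simultaneous connections to the Schedd; whether that is the actual bottleneck in this case is an assumption.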