[Condor-users] Problems when submitting very large numbers of jobs to the queue

We're using Condor 6.6.7 on a cluster of ~20 3 GHz Windows XP machines.

One of our users has found that if he submits 500 or more jobs to the queue
at once, the scheduling/matching process appears to fail. Unclaimed machines
enter the 'Matched' state, but the match nearly always times out (according
to the startd logs on the respective machines) before the job can start. The
machine then returns to the 'Unclaimed' state.

If he submits only about 50 jobs at a time, the scheduling/matching process
works without a hitch.
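
As a stopgap, the 50-at-a-time approach could be scripted roughly as below.
This is only a sketch: it assumes one job per submit description file, the
file names are hypothetical, and the batch size and pause length are guesses
rather than tuned values. It does not address whatever causes the matches to
time out.

```python
import subprocess
import time
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def submit_in_batches(submit_files: Sequence[str], size: int = 50,
                      pause_seconds: float = 300.0) -> None:
    """Run condor_submit on each file, pausing between batches.

    Assumption: each file describes a single job, so a batch of 50 files
    queues 50 jobs. The pause is an arbitrary value, not a recommendation.
    """
    groups = list(batches(submit_files, size))
    for index, batch in enumerate(groups):
        for path in batch:
            # check=True raises CalledProcessError if condor_submit fails
            subprocess.run(["condor_submit", path], check=True)
        # No need to wait after the final batch
        if index < len(groups) - 1:
            time.sleep(pause_seconds)
```

For example, submit_in_batches(["job_0.sub", "job_1.sub"]) would submit both
(hypothetical) files in a single batch and return without pausing.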

Is there a limit to the number of jobs that can be reliably queued?

So far I've not been able to gain any insight from the manual. Any
suggestions or hints would be much appreciated.