
[Condor-users] wait times



Our condor cluster has about 720 slots.  Our servers are dedicated to condor.

We have several schedulers running.  When I submit a test job from one submit node my condor process runs almost immediately.

Submitted from another node, the same job takes in excess of 30 minutes to run.

Nothing in the logs shows a problem.  My suspicion is that this is related to the processing node running the job and not the submit node.

From submit node slow I can see the job is sent to processing node 21 where it seems to just sit.

From submit node fast the job goes to processing node 130 where it runs immediately.

Given that each 'node' contains 12 slots (one for each CPU), does condor assign priority in some way?  On processing node 21, would it make slot2 wait for slot1 to complete before running?

Any pointers to debug this would be appreciated!

--Don
Florida State University HPC