
[Condor-users] Baby steps toward Parallel Universe



Hi All,

I'm trying to get parallel universe jobs to run under 7.2. For starters,
I'm just trying to sleep:

Universe = parallel
# only send email if there's an error
Notification = Error
Executable = /bin/sleep
Arguments = 30
machine_count = 4
queue

I've configured a number of execute nodes for the dedicated scheduler:

condor_status -constraint 'DedicatedScheduler == "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxxxx"' -total

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    87     0      31        54       2          0        0

               Total    87     0      31        54       2          0        0

On these systems START is set to TRUE, and all the preempt- and
suspend-related macros have '&& ( MY.NiceUser == True)' at the end, so
only NiceUser jobs get interrupted.
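
A rough sketch of that kind of execute-node configuration, not a copy
of the real config files. The STARTD_ATTRS line, the self-referencing
$(SUSPEND)/$(PREEMPT) form, and the choice of exactly these two macros
are assumptions for illustration:

# Advertise the dedicated scheduler (host name elided as above)
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxxxx"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
# Always willing to start jobs
START = TRUE
# Only NiceUser jobs may be suspended or preempted
SUSPEND = ($(SUSPEND)) && ( MY.NiceUser == True)
PREEMPT = ($(PREEMPT)) && ( MY.NiceUser == True)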

The jobs just sit idle in the queue despite the unclaimed systems
shown above, and condor_q -analyze shows:

-- Submitter: borg-login-1.csail.mit.edu : <128.30.112.26:58458> : borg-login-1.csail.mit.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
---
32234.000:  Run analysis summary.  Of 263 machines,
     26 are rejected by your job's requirements
     17 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
    159 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
     61 are available to run your job


Any clues where I'm going off into the weeds on this one?

Thanks,
-Jon