
[Condor-users] Jobs are remaining idle forever



Jobs are remaining idle for long periods of time. Can anyone point me to the problem?
 
Here is the output from condor_q -better-analyze:
 
-- Submitter: dmvx.vxnet : <192.168.2.101:32843> : dmvx.vxnet
---
422.000:  Run analysis summary.  Of 22 machines,
     11 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
     11 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
        Last successful match: Wed Sep 17 08:54:12 2008
 
The Requirements expression for your job is:
 
( ( target.eclipse_available > 0 ) &&
( target.LoadAvg < 3.000000000000000E-01 ) ) && ( target.Arch == "X86_64" ) &&
( target.OpSys == "LINUX" ) && ( target.Disk >= DiskUsage ) &&
( ( target.Memory * 1024 ) >= ImageSize ) && ( target.HasFileTransfer )
 
    Condition                                   Machines Matched    Suggestion
    ---------                                   ----------------    ----------
1   ( target.eclipse_available > 0 )            16
2   ( target.Arch == "X86_64" )                 16
3   ( target.OpSys == "LINUX" )                 16
4   ( target.LoadAvg < 3.000000000000000E-01 )  17
5   ( target.Disk >= 225000 )                   22
6   ( ( 1024 * target.Memory ) >= 0 )           22
7   ( target.HasFileTransfer )                  22
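The "Machines Matched" column comes from testing each clause of the Requirements expression on its own against every machine ad. A rough Python sketch of that bookkeeping follows. It assumes machine ads are plain dicts, uses a missing attribute as a stand-in for ClassAd UNDEFINED, and the two sample machines are hypothetical, not taken from this pool. It does not implement the full ClassAd language or the negotiator.

```python
from typing import Any, Callable, Dict, List, Optional

MachineAd = Dict[str, Any]


def num(ad: MachineAd, name: str) -> Optional[float]:
    """Numeric attribute, or None when it is missing (standing in for ClassAd UNDEFINED)."""
    value = ad.get(name)
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return value


def str_eq(ad: MachineAd, name: str, expected: str) -> bool:
    """ClassAd '==' compares strings case-insensitively; a missing attribute never matches."""
    value = ad.get(name)
    return isinstance(value, str) and value.lower() == expected.lower()


def holds(value: Optional[float], test: Callable[[float], bool]) -> bool:
    """A comparison against UNDEFINED is never true, so the clause fails."""
    return value is not None and test(value)


def requirement_clauses(disk_usage: int, image_size: int) -> Dict[str, Callable[[MachineAd], bool]]:
    """One predicate per clause of the job's Requirements expression."""
    return {
        "eclipse_available > 0": lambda ad: holds(num(ad, "eclipse_available"), lambda v: v > 0),
        "LoadAvg < 0.3": lambda ad: holds(num(ad, "LoadAvg"), lambda v: v < 0.3),
        'Arch == "X86_64"': lambda ad: str_eq(ad, "Arch", "X86_64"),
        'OpSys == "LINUX"': lambda ad: str_eq(ad, "OpSys", "LINUX"),
        "Disk >= DiskUsage": lambda ad: holds(num(ad, "Disk"), lambda v: v >= disk_usage),
        # Memory is in MB and ImageSize in KB, hence the factor of 1024.
        "Memory * 1024 >= ImageSize": lambda ad: holds(num(ad, "Memory"), lambda v: v * 1024 >= image_size),
        "HasFileTransfer": lambda ad: ad.get("HasFileTransfer") is True,
    }


def analyze(machines: List[MachineAd], disk_usage: int, image_size: int) -> Dict[str, int]:
    """Count machines satisfying each clause, plus those satisfying all of them."""
    preds = requirement_clauses(disk_usage, image_size)
    counts = {text: sum(1 for ad in machines if pred(ad)) for text, pred in preds.items()}
    counts["ALL"] = sum(1 for ad in machines if all(pred(ad) for pred in preds.values()))
    return counts


# Two hypothetical machine ads, for illustration only.
machines = [
    {"eclipse_available": 1, "LoadAvg": 0.05, "Arch": "X86_64", "OpSys": "LINUX",
     "Disk": 500000, "Memory": 2048, "HasFileTransfer": True},
    {"LoadAvg": 0.0, "Arch": "INTEL", "OpSys": "LINUX",
     "Disk": 300000, "Memory": 1024, "HasFileTransfer": True},
]

if __name__ == "__main__":
    # DiskUsage and ImageSize values taken from rows 5 and 6 of the analysis table.
    for clause, count in analyze(machines, disk_usage=225000, image_size=0).items():
        print(f"{clause:30} {count}")
```

This only models the job-side Requirements check. It cannot explain why the 11 machines that do match go on to reject the job, which condor_q reports as "unknown reasons".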
 
This is the log file:
 
000 (422.000.000) 09/16 18:19:43 Job submitted from host: <192.168.2.101:32843>
...
 
 
This is from the SchedLog:
 
9/17 08:49:15 (pid:22237) Checking consistency running and runnable jobs
9/17 08:49:15 (pid:22237) Tables are consistent
9/17 08:49:15 (pid:22237) Rebuilt prioritized runnable job list in 0.000s.
9/17 08:49:15 (pid:22237) Starting add_shadow_birthdate(422.0)
9/17 08:49:15 (pid:22237) Started shadow for job 422.0 on "<192.168.2.104:40736>", (shadow pid = 6252)
9/17 08:49:15 (pid:22237) Shadow pid 6251 for job 424.0 exited with status 1
9/17 08:49:15 (pid:22237) ERROR: Shadow exited with unknown value 1!
9/17 08:49:15 (pid:22237) Match for cluster 424 has had 5 shadow exceptions, relinquishing.
9/17 08:49:15 (pid:22237) Sent RELEASE_CLAIM to startd at <192.168.2.103:40431>
9/17 08:49:15 (pid:22237) Match record (<192.168.2.103:40431>, 424, 0) deleted
9/17 08:49:15 (pid:22237) Got VACATE_SERVICE from <192.168.2.103:47482>
9/17 08:49:15 (pid:22237) Shadow pid 6252 for job 422.0 exited with status 1
9/17 08:49:15 (pid:22237) ERROR: Shadow exited with unknown value 1!
 
Finally, there is nothing in the ShadowLog.
 
 
Thanks
Jeff