
Re: [Condor-users] Job gets held: Reason unspecified



Hi,

This is an older post, but I have this problem again and it is still not solved. All of my jobs that have been started are put on hold with 'Reason unspecified'. I checked the SchedLog for one job, and here is the relevant part:


12/21 11:26:09 Checking file /mnt/scratch4/dietz/Work/I-S4BBH/Fulldata/Run1/logs/inspiral-L1:LSC-DARM_ERR-793781181-793783229-601662-0.out for write permission.
12/21 11:26:09 Checking file /mnt/scratch4/dietz/Work/I-S4BBH/Fulldata/Run1/logs/inspiral-L1:LSC-DARM_ERR-793781181-793783229-601662-0.err for write permission.
12/21 11:26:13 Job 601662.0: is runnable
12/21 11:26:13 Scheduler::start_std - job=601662.0 on <129.89.200.11:51856>
12/21 11:26:13 Queueing job 601662.0 in runnable job queue
12/21 11:26:13 Match (<129.89.200.11:51856>#1135113073#153) - running 601662.0
12/21 11:26:21 Job prep for 601662.0 will not block, calling aboutToSpawnJobHandler() directly
12/21 11:26:21 aboutToSpawnJobHandler() completed for job 601662.0, attempting to spawn job handler
12/21 11:26:21 Starting add_shadow_birthdate(601662.0)
12/21 11:26:21 Added shadow record for PID 3585, job (601662.0)
12/21 11:26:21 .. 3585, 601662.0, F, <129.89.200.11:51856>, cur_hosts=1, status=2
12/21 11:26:21 Started shadow for job 601662.0 on "<129.89.200.11:51856>", (shadow pid = 3585)
12/21 11:26:21 .. 3585, 601662.0, F, <129.89.200.11:51856>, cur_hosts=1, status=2
12/21 11:26:22 .. 3585, 601662.0, F, <129.89.200.11:51856>, cur_hosts=1, status=2
12/21 11:26:23 .. 3585, 601662.0, F, <129.89.200.11:51856>, cur_hosts=1, status=2
12/21 11:26:23 Shadow pid 3585 for job 601662.0 exited with status 112
12/21 11:26:23 Putting job 601662.0 on hold
12/21 11:26:23 Deleting shadow rec for PID 3585, job (601662.0)
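
For anyone who wants to pull such entries out of a large SchedLog by job ID, here is a minimal Python sketch. It is only an illustration: the helper names job_lines and shadow_exit_status are made up for this example, and the regular expression for the "exited with status" line assumes the wording seen in the excerpt above, which may differ between Condor versions.

```python
import re


def job_lines(lines, job_id):
    """Return the log lines that mention job_id, e.g. "601662.0"."""
    # Match the ID only as a whole token, so "601662.0" does not also
    # match a longer ID such as "1601662.0" or "601662.01".
    pattern = re.compile(r"(?<![\d.])" + re.escape(job_id) + r"(?!\d)")
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def shadow_exit_status(lines, job_id):
    """Return the shadow exit status logged for job_id, or None if absent."""
    # Assumes the message format shown in the excerpt above.
    pattern = re.compile(
        r"Shadow pid \d+ for job " + re.escape(job_id)
        + r" exited with status (\d+)"
    )
    for line in lines:
        match = pattern.search(line)
        if match:
            return int(match.group(1))
    return None


# Three lines copied from the excerpt above, used as sample input.
sample = [
    "12/21 11:26:13 Job 601662.0: is runnable",
    "12/21 11:26:23 Shadow pid 3585 for job 601662.0 exited with status 112",
    "12/21 11:26:23 Putting job 601662.0 on hold",
]

for line in job_lines(sample, "601662.0"):
    print(line)
print("shadow exit status:", shadow_exit_status(sample, "601662.0"))
```

To run it against a real log, replace sample with the lines of the file, for example open(path).readlines(). The location of the SchedLog depends on the installation; condor_config_val SCHEDD_LOG should print it.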


There was another mail about this on this list (11/20), but it got no response.
So I would really appreciate any help with this.

Regards
Alexander




Erik Paulson wrote:
On Wed, Nov 02, 2005 at 08:54:55AM -0600, Alexander Dietz wrote:
  
I looked in the log file but it also does not help me in any way:

012 (475473.000.000) 11/01 19:36:21 Job was held.
        Reason unspecified
        Code 0 Subcode 0

    

Hrm. Most places in Condor where the job is held update the user log with
the reason, so you don't usually have to do this, but in this case, can you
look in the SchedLog for entries about 475473.0? The schedd will log when it
puts jobs on hold, even if it doesn't update the job. 

(There are some error cases that I can think of that would cause it to 
not write a reason to the user log, and we'd see those in the sched log
as well)

-Erik
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users