
Re: [Condor-users] Job gets held: Reason unspecified



Hi,

I just found something strange; maybe this is the reason the jobs are being held. The version of Condor in use is 6.7.8:


[dietz@hydra Run1]$ condor_version
$CondorVersion: 6.7.8 Jun  9 2005 $
$CondorPlatform: I386-LINUX_RH9 $

But in the ShadowLog it says something about version 6.7.12:

12/28 12:28:57 (601662.0) (9946):Read: User Job - $CondorPlatform: I386-LINUX_RH9 $
12/28 12:28:57 (601662.0) (9946):Read: User Job - $CondorVersion: 6.7.12 Sep 24 2005 $
12/28 12:28:57 (601662.0) (9946):ERROR: User job is NOT compatible with this shadow version

Maybe a different version is installed on the nodes than on the head node, or something like that?
Paul: Can you check this, please?
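
A minimal sketch of that comparison, assuming the output of condor_version and the relevant ShadowLog lines have been copied into strings. The names `condor_versions` and `versions_differ` are made up for illustration; the sketch only flags that the two version numbers differ and does not reproduce whatever compatibility rule the shadow itself applies.

```python
import re

# Matches the number in strings such as "$CondorVersion: 6.7.8 Jun  9 2005 $".
VERSION_RE = re.compile(r"\$CondorVersion:\s*(\d+(?:\.\d+)*)")


def condor_versions(text):
    """Return every Condor version number found in text, in order of appearance."""
    return VERSION_RE.findall(text)


def versions_differ(tool_output, shadow_log):
    """True if the ShadowLog mentions a version that condor_version did not report."""
    local = set(condor_versions(tool_output))
    seen_in_log = set(condor_versions(shadow_log))
    return bool(seen_in_log - local)


# Sample text taken from the output quoted above.
tool_output = (
    "$CondorVersion: 6.7.8 Jun  9 2005 $\n"
    "$CondorPlatform: I386-LINUX_RH9 $\n"
)
shadow_log = (
    "12/28 12:28:57 (601662.0) (9946):Read: User Job - $CondorPlatform: I386-LINUX_RH9 $\n"
    "12/28 12:28:57 (601662.0) (9946):Read: User Job - $CondorVersion: 6.7.12 Sep 24 2005 $\n"
)

print(condor_versions(tool_output), condor_versions(shadow_log))
print(versions_differ(tool_output, shadow_log))
```

Running the same check against the condor_version output from an execute node would show whether the nodes and the head node really disagree.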


Regards
Alex



dietz@xxxxxxxxxxxx wrote:
Hi,

the issue with the unknown reason for holding Condor jobs is still not resolved. I just checked the executables and they are condor_compiled. I still do not know why they get held.

Regards
Alex


On Wed, 21 Dec 2005 14:46:40 -0600, Alexander Dietz wrote

> Erik Paulson wrote:
>> On Wed, Dec 21, 2005 at 02:31:37PM -0600, Alexander Dietz wrote:
>>> Hi,
>>>
>>> in the ShadowLog it says:
>>>
>>> 12/21 14:18:42 (601662.0) (8918):ERROR: User job is NOT compatible with this shadow version
>>>
>>> What does this mean? I ran very similar jobs on the same cluster some zillion times before, and in most cases it worked out. Any ideas?
>>
>> Are you running a standard universe job?
>
> yes, it's the standard universe
>
>> Did you use Condor 6.7 for the condor_compile step, but submit from a machine running Condor 6.6? Condor can't do that, because the older 6.6 shadow may not know how to handle the system calls a 6.7 job would make.
>
> According to 'condor_version' it's version 6.7.8
>
>> -Erik
>
> Alex







Matt Hope wrote:

>>> you look in the SchedLog for entries about 475473.0? The schedd will log when it puts jobs on hold, even if it doesn't update the job.
>>
>> where are the SchedLogs?
>
> On your submit machine run
>
> condor_config_val LOG
>
> This will output the path to the daemon logs. Look in SchedLog and go with what Erik said by looking for any mention relating to those jobs. You may also wish to look in the ShadowLog just in case.
>
> An exit status of 112 from the shadow indicates that the schedd should put the job on hold (which it is doing) so there might be something in there.
>
> Also supply the submit script text just in case there are any periodic expressions that might indicate it
>
> Matt
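
The log search described above can be sketched roughly as follows. This is only an illustration: `daemon_log_dir` and `lines_for_job` are hypothetical helper names, the only Condor command used is the `condor_config_val LOG` call mentioned above, and the second sample line is invented to show that a different job ID is not matched.

```python
import re
import subprocess


def daemon_log_dir():
    """Ask Condor for its log directory (needs condor_config_val on PATH)."""
    result = subprocess.run(
        ["condor_config_val", "LOG"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def lines_for_job(log_text, job_id):
    """Return the log lines that mention job_id, e.g. "601662.0"."""
    # Reject matches embedded in a longer number such as "1601662.0".
    pattern = re.compile(r"(?<![\d.])" + re.escape(job_id) + r"(?!\d)")
    return [line for line in log_text.splitlines() if pattern.search(line)]


# First line is from the ShadowLog quoted in this thread; the second is a
# made-up line for an unrelated job.
sample = (
    "12/21 14:18:42 (601662.0) (8918):ERROR: User job is NOT compatible with this shadow version\n"
    "12/21 14:18:43 (1601662.0) (8919):hypothetical line for another job\n"
)

for line in lines_for_job(sample, "601662.0"):
    print(line)

# On a submit machine one would read the real file instead, for example:
#   with open(os.path.join(daemon_log_dir(), "SchedLog")) as f:
#       hits = lines_for_job(f.read(), "475473.0")
```

The same function works on the ShadowLog text, since it only looks for the job ID and makes no assumption about the rest of the line format.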



_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

      
        
  

    
  
    



--
Open WebMail Project (http://openwebmail.org)

