
[Condor-users] Abnormal Termination received Signal 11

Hi,

I have two jobs. One of them runs fine when submitted to Condor; the
other receives signal 11. I ran the failing job manually on the same
worker node where Condor had scheduled it, and it runs fine with the
same input. It also runs fine with the same input on the local machine
I use to submit the jobs. Moreover, if I submit the job through Globus
(universe = globus), it runs fine. However, I have other issues with
the globus universe and therefore need to use the vanilla universe.

I have noticed that the job sometimes won't start at all. Other times
it runs for a bit, receives signal 11, and exits.

According to an earlier post on this list, mismatched library versions
can be a cause. However, I manually verified the library versions on
the worker node and on my local machine, and they are identical.
LD_LIBRARY_PATH is not set on either machine, so it makes no
difference.
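
Comparing version numbers by hand can miss builds that share a version
string but differ in content. A stricter check, sketched here in Python
and assuming Python 3 is available on both machines, is to hash the
library files themselves and compare the digests. The script name
`fingerprint_libs.py` and the library paths in the usage comment are
hypothetical; substitute the libraries your job actually links against
(for example, as listed by `ldd` on the executable).

```python
import hashlib
import sys
from pathlib import Path


def fingerprint(paths):
    """Return {path: sha256 hex digest} for each path.

    Missing files map to None so they still show up in a comparison.
    """
    result = {}
    for p in paths:
        path = Path(p)
        if path.is_file():
            # read_bytes() follows symlinks, so e.g. libfoo.so.6 ->
            # libfoo.so.6.0.8 hashes the real file behind the link.
            result[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
        else:
            result[str(path)] = None
    return result


if __name__ == "__main__":
    # Hypothetical usage:
    #   python fingerprint_libs.py /lib/libc.so.6 /usr/lib/libstdc++.so.6
    for name, digest in sorted(fingerprint(sys.argv[1:]).items()):
        print(f"{digest or 'MISSING'}  {name}")
```

Running it with the same arguments on the worker node and on the submit
machine and diffing the two outputs would flag any library that is not
byte-for-byte identical.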
Since the code runs fine with the same input, I believe the signal 11
is caused by the environment the job runs in. Other than the library
versions, I don't know what to check. Are there other environment
variables or Condor-specific settings that need to be set in the
submission file?
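
Signal 11 is SIGSEGV (a segmentation fault), and one environment
difference that is easy to overlook is resource limits such as the
stack size, which a batch job may inherit differently from an
interactive shell. As a hedged diagnostic sketch (the script, its name
and its output format are illustrative, not a Condor feature), the
Python below records environment variables plus a few soft/hard
resource limits. Running it once as the vanilla-universe executable and
once by hand on the same worker node, then comparing the two snapshots
with `diff()`, would show what differs between the two contexts.

```python
import json
import os
import resource
import sys

# Limits that may differ between an interactive shell and a batch job.
LIMITS = {
    "stack": resource.RLIMIT_STACK,
    "data": resource.RLIMIT_DATA,
    "address_space": resource.RLIMIT_AS,
    "core": resource.RLIMIT_CORE,
}


def snapshot():
    """Capture environment variables and soft/hard resource limits."""
    snap = {f"env:{k}": v for k, v in os.environ.items()}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        snap[f"rlimit:{name}"] = f"{soft}/{hard}"
    return snap


def diff(a, b):
    """Return sorted (key, value_in_a, value_in_b) for keys that differ."""
    keys = set(a) | set(b)
    return sorted((k, a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k))


if __name__ == "__main__":
    # Hypothetical usage: python env_snapshot.py > snapshot.json
    json.dump(snapshot(), sys.stdout, indent=2, sort_keys=True)
```

Both JSON files can then be loaded with `json.load` and passed to
`diff()`; a much smaller `rlimit:stack` value under Condor, for
instance, would be worth investigating first.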
Do let me know if the problem is not clear or if you need more details.
I would appreciate any tips or pointers.