
[Condor-users] Abnormal Termination received Signal 11



Hi,

I have two jobs. One of them runs fine when submitted to Condor, but the other receives a signal 11. When I run the failing job manually on the same machine where Condor scheduled it (the worker node), it runs fine for the same input. It also runs fine for the same input on the local machine I use to submit the jobs. Moreover, if I submit the job through Globus (universe = globus), it runs fine. However, I have other issues with the globus universe and hence need to use the vanilla universe.

I notice that the job sometimes won't start at all. Other times it runs for a bit, then gets a signal 11 and exits.

According to an earlier post on this list, differing library versions are a potential cause. However, I manually verified the versions of the libraries on the worker node and on my local machine, and they are identical. LD_LIBRARY_PATH is not set on either machine, so it makes no difference. Since the code runs fine for the same input outside Condor, I believe the signal 11 is due to the environment the job runs in. Other than the library versions, I don't know what to check. Are there any environment variables or Condor-specific settings that need to be set in the submission file? Do let me know if the problem is not clear or if you need more details. I would appreciate any tips or pointers.
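To narrow down what differs between the two environments, something like the sketch below could be run once by hand on the worker node and once as the Condor job itself, and the two outputs compared. It is only an illustration, not something I have run under Condor: the names `LIMITS`, `fmt`, and `snapshot` are made up for this example, and the particular resource limits listed are my guess at what might matter (a smaller stack limit, for instance, can produce a signal 11). It assumes Python is available on the worker node.

```python
import os
import resource
import sys

# Hypothetical selection of limits that may differ between an interactive
# shell and a batch job; adjust to taste.
LIMITS = {
    "stack": resource.RLIMIT_STACK,
    "core": resource.RLIMIT_CORE,
    "data": resource.RLIMIT_DATA,
    "address_space": resource.RLIMIT_AS,
    "open_files": resource.RLIMIT_NOFILE,
}


def fmt(value):
    """Render a resource limit, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def snapshot():
    """Return sorted 'key=value' lines describing the runtime environment."""
    lines = ["cwd=" + os.getcwd(), "uid=%d" % os.getuid()]
    for name, limit in LIMITS.items():
        soft, hard = resource.getrlimit(limit)
        lines.append("rlimit.%s=%s/%s" % (name, fmt(soft), fmt(hard)))
    for key, val in os.environ.items():
        lines.append("env.%s=%s" % (key, val))
    return sorted(lines)


if __name__ == "__main__":
    # Sorted output makes the two runs easy to compare with diff.
    sys.stdout.write("\n".join(snapshot()) + "\n")
```

Saving the output of each run to a file and running diff on the two files should show any environment variable, working directory, user ID, or resource limit that differs between the manual run and the Condor run.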

Thanks,
Vinai