
Re: [Condor-users] Standard universe - Job submission... Problem



Hello,

On Mon, Mar 05, 2007 at 09:43:45AM -0600, Sridhar Karra wrote:
> I am having issues submitting jobs (Standard Universe) on condor pool.
> Vanilla jobs runs fine with no problem. Any standard universe job just
> sits in the queue and jump between Idle and Running stages.
> 
> Any simple job even the "printf("hello world");" submitted via standard
> universe fails to execute. I did compile the job using condor_compile
> before i submit it to the pool.
> 
> My question is:
> 1. Do I have to change any configurations (global or local)?
> 2. Am I doing something wrong? or Am I missing something here?

There are a few things you can look at. For example, did you misspell:
universe = standard
in your submit description file?

If not, the best way to debug it is thus:

1. In the local config file for the submit machine add D_SYSCALLS and
D_FULLDEBUG to the SHADOW_DEBUG macro.

2. For your execute machines, add D_FULLDEBUG to STARTER_DEBUG.
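
As a rough sketch, the two settings above might look like this in the
local config files. Note that written this way they replace whatever
value the macros already had, and that running condor_reconfig on each
machine afterwards is my assumption about how you would get the daemons
to pick up the change:

    # submit machine, local config file
    SHADOW_DEBUG = D_SYSCALLS D_FULLDEBUG

    # execute machine(s), local config file
    STARTER_DEBUG = D_FULLDEBUG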

Then resubmit the job. (Or you can just turn on the debugging on one execute
machine and force the job to run there.)
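
One way to pin the job to that machine is a requirements expression in
the submit description file. This is only a sketch: the hostname and the
file names below are made up, so substitute your own.

    # hello must already be relinked with condor_compile
    universe     = standard
    executable   = hello
    # hypothetical hostname of the execute machine with debugging on
    requirements = (Machine == "exec1.example.com")
    output       = hello.out
    error        = hello.err
    log          = hello.log
    queue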

When it fails, check the starter log on the execute node where the
job ended up running. You'll probably see the cause of the error right
away, unless the log file just ends abruptly, in which case the starter
segfaulted and we definitely need to check that out.

The shadow log will record how far along it got while talking to the starter.
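
If you are not sure where those logs live, condor_config_val can tell
you. A small sketch, assuming a default setup (the actual starter log
file may carry a per-slot suffix):

    # on the submit machine
    condor_config_val SHADOW_LOG

    # on the execute machine
    condor_config_val STARTER_LOG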

Thank you.

-pete