
Re: [condor-users] Shadow exiting with status 100



On Thu, Jan 08, 2004 at 06:26:04PM +0000, Alexander Klyubin wrote:
> Hello!
> 
> I'm experiencing strange behavior of my Java jobs running on Condor. The 
> jobs run in a different pool via flocking. When a job completes within 
> several minutes everything is fine. When the job runs longer it 
> completes, but no files get transferred back, despite the fact that 
> Condor thinks everything went fine.

okay.

> The submit machine runs Condor 6.5.5, whereas the execute machine and 
> its central manager run Condor 6.6.0. All machines run Linux.
> 
> Can this strange behavior be caused by the fact that execute machine's 
> local time is one hour ahead of the submit machine one?

most likely you've hit the nail right on the head.  after the job finishes,
the execute side connects back to the submit side to transfer files.  it
tries to reuse the same session it had when the job was spawned.  in 6.5.5
the default session duration was 1 hour.  in 6.6.0 the default was raised to
100 days.  due to your clock skew, the session probably expired on one side
or the other before the job finished.

to work around, you can add this to your condor_config:

  SEC_DEFAULT_SESSION_DURATION = 8640000

this change should actually be made for all users of 6.5.X, especially if
you have long-running jobs.  it is not needed in your 6.6.X config files
but it will cause no harm either.
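
for anyone wondering where 8640000 comes from: the setting is in seconds, and
the value is just 100 days written out.  a quick python sketch of the
arithmetic (the names days_to_seconds, old_default and new_default are made
up for illustration, they are not condor settings):

```python
# the session duration is expressed in seconds
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_seconds(days):
    """Convert a whole number of days to seconds."""
    return days * SECONDS_PER_DAY

old_default = 60 * 60                # 1 hour, the 6.5.5 default mentioned above
new_default = days_to_seconds(100)   # 100 days, the 6.6.0 default

print(old_default)   # 3600
print(new_default)   # 8640000
```

if you want a different lifetime, the same conversion gives you the number to
put in condor_config.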


> 1/7 11:24:46 (2765.0) (5383): DC_AUTHENTICATE: attempt to open invalid 
> session klyubin:5383:1073470798:0, failing.
> 1/7 11:24:46 (2765.0) (5383): **** condor_shadow (condor_SHADOW) EXITING 
> WITH STATUS 100

if you are curious, this is the line that clued me in.
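
if you want to check your own shadow logs for the same symptom, something
like the following python sketch would do.  it is only an illustration: the
pattern is built from the single log line quoted above, so the exact wording
in other condor versions is an assumption, and find_invalid_sessions is a
made-up helper name, not part of condor.

```python
import re

# pattern taken from the quoted log line; \s+ tolerates the line being
# wrapped by a mail client.  wording in other versions is an assumption.
INVALID_SESSION = re.compile(
    r"DC_AUTHENTICATE: attempt to open invalid\s+session\s+(\S+),"
)

def find_invalid_sessions(log_text):
    """Return the session ids named in 'invalid session' log messages."""
    return INVALID_SESSION.findall(log_text)

sample = (
    "1/7 11:24:46 (2765.0) (5383): DC_AUTHENTICATE: attempt to open invalid "
    "session klyubin:5383:1073470798:0, failing.\n"
    "1/7 11:24:46 (2765.0) (5383): **** condor_shadow (condor_SHADOW) EXITING "
    "WITH STATUS 100\n"
)

for session_id in find_invalid_sessions(sample):
    print(session_id)
```

to use it on a real log, read the file contents into a string and pass that
in instead of the sample text.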


cheers,
-zach
