
Re: [condor-users] Why was job evicted?



On Wed, Sep 17, 2003 at 10:44:40AM -0500, Tom G. Smith (Smitty) wrote:
> 1.	Why did Joe Blow's job get evicted?  I expected that
> 	my condor_config file on runhost would prevent any job from
> 	being evicted, once it got initiated.

it doesn't look like a config problem.  from the logs you sent, i
think these lines in the StartLog shed some light on the problem:

> 	9/16 22:25:50 vm1: State change: claim timed out (condor_schedd gone?)
> 	9/16 22:25:50 vm1: Changing state and activity: Claimed/Busy -> Preempting/Killing

it seems that the condor_startd (the daemon that runs your jobs) lost contact
with the condor_schedd (the daemon that manages the job queue) and killed the
job.
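to find every occurrence of this pattern in a longer StartLog, a small script can help. the following is only a rough sketch: the sample lines are copied from the excerpt above, while the regular expression and the function name find_claim_timeouts are assumptions made for illustration, not anything condor ships.

```python
import re

# sample lines copied from the StartLog excerpt quoted above
SAMPLE_LOG = """\
9/16 22:25:50 vm1: State change: claim timed out (condor_schedd gone?)
9/16 22:25:50 vm1: Changing state and activity: Claimed/Busy -> Preempting/Killing
"""

# assumed line layout: "<month>/<day> <hh:mm:ss> <vm>: <message>"
LINE_RE = re.compile(r"^(\d+/\d+ \d+:\d+:\d+) (\w+): (.*)$")


def find_claim_timeouts(log_text):
    """Return (timestamp, vm, message) for lines that look eviction-related."""
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not fit the assumed layout
        stamp, vm, message = m.groups()
        if "claim timed out" in message or "Preempting" in message:
            hits.append((stamp, vm, message))
    return hits


if __name__ == "__main__":
    for stamp, vm, message in find_claim_timeouts(SAMPLE_LOG):
        print(stamp, vm, message)
```

to use it on a real log, read the file's contents into a string and pass that to find_claim_timeouts in place of SAMPLE_LOG.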

which of course raises the question, why did they lose contact?

this appears to be related to another of your questions:

> 4.	What causes the messages like "DC_AUTHENTICATE: attempt to open
> 	invalid session runhost:13840:1063767050:316, failing" that I
> 	see scattered throughout my log?

it seems that the schedd is attempting to communicate with the startd but
is being ignored because it is trying to restart an invalid session.  this
problem should be handled automatically by condor, but there was a bug in
the 6.4 series that prevented this from happening after a session timed
out, which is probably what you are seeing.  this bug was fixed in 6.5.0.
if you _are_ using the 6.5 series, then it's something new.  either way,
there are a few things you can do:

1)
condor_reconfig -schedd

this will cause the schedd to throw away its session information and
stop using the invalid session.  not a great solution, because the new
session it creates will also expire eventually.


2)
increase the session length in your condor_config (it's in seconds, and
the default is one hour):
  # 864000 seconds = 10 days
  DEFAULT_SESSION_DURATION = 864000

also not a great solution, because it still expires _eventually_.


3)
if you aren't using kerberos or gsi authentication, encryption, or md5
integrity checks, you can disable the use of sessions altogether with
this line in your condor_config:
  SEC_DEFAULT_NEGOTIATION = NEVER
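
since the session length in option 2 is given in seconds, it is easy to slip by a factor of ten when picking a value. the sketch below just does the arithmetic. the helper names are hypothetical, and the parameter name is the one quoted in option 2, so check it against the manual for your version before relying on it.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def session_duration_seconds(days):
    """Convert a session lifetime in whole days to seconds."""
    return int(days * SECONDS_PER_DAY)


def config_line(days, name="DEFAULT_SESSION_DURATION"):
    # name is taken from option 2 above; it is an assumption that your
    # condor version uses exactly this spelling
    return f"{name} = {session_duration_seconds(days)}"


if __name__ == "__main__":
    print(config_line(10))  # prints "DEFAULT_SESSION_DURATION = 864000"
```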


also, just so i have better information, could you please tell me what version
of condor you were using, and if you had changed any of the condor_config
parameters starting with SEC_?
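
one quick way to collect the SEC_ settings is to scan the config file for them. this is a minimal sketch that assumes simple "NAME = value" lines. it ignores line continuations and any local config files your main condor_config pulls in, and sec_settings is a made-up helper name.

```python
def sec_settings(config_text):
    """Return {name: value} for settings whose name starts with SEC_."""
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, sep, value = line.partition("=")
        if sep and name.strip().upper().startswith("SEC_"):
            found[name.strip()] = value.strip()
    return found


if __name__ == "__main__":
    # hypothetical config fragment, for illustration only
    example = """\
# security settings
SEC_DEFAULT_NEGOTIATION = NEVER
DEFAULT_SESSION_DURATION = 864000
"""
    for name, value in sec_settings(example).items():
        print(name, "=", value)
```

if a name is defined more than once, the sketch keeps the last definition it sees.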


cheers,
-zach
