
Re: [condor-users] invalid sessions causing job preemption



At 04:13 PM 2/9/2004, Scott Koranda wrote:
> At 03:42 PM 2/9/2004, Scott Koranda wrote:
> >Hi,
> >
> >On a Condor 6.6.0 pool running on top of RedHat 9 we are seeing
> >the following in some of the StartLogs:
>
>
> For starters:
> Are you positive that the schedd that had the startd claimed was also
> v6.6.0?  Or is it possible that the schedd was an older version of Condor?

Yes, it does look like the schedd was from a 6.5.x release. So I assume that

http://www.cs.wisc.edu/~lists/archive/condor-users/msg00534.html

applies.

Sure. Or, getting even closer to the root of the trouble, I'd say that
http://www.cs.wisc.edu/~lists/archive/condor-users/msg00553.html
applies. If your installation does not require Condor to establish secure communication channels (i.e., IP-based authorization is good enough for your setup), then I'd recommend setting
SEC_DEFAULT_NEGOTIATION = NEVER
in all of your condor_config files, thereby bypassing any and all issues with communication session keys.
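If you want to double-check that each condor_config file in the pool really ends up with negotiation disabled, a small script can scan them. The following is a minimal sketch, not a Condor tool: the function names are hypothetical, and it only understands plain NAME = value lines (no include files, $(...) macro expansion, or continuation lines). It assumes the last assignment wins and treats names case-insensitively.

```python
import re
from typing import Optional

# Matches simple "NAME = value" lines only (a simplifying assumption of
# this sketch; real condor_config syntax supports more than this).
_ASSIGN = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=\s*(.*?)\s*$")


def effective_setting(config_text: str, name: str) -> Optional[str]:
    """Return the last value assigned to `name`, or None if never set.

    Blank lines and '#' comment lines are skipped. Names are compared
    case-insensitively. Includes, $(...) expansion and line continuations
    are NOT handled here.
    """
    value = None
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = _ASSIGN.match(line)
        if match and match.group(1).upper() == name.upper():
            value = match.group(2)  # later assignments override earlier ones
    return value


def negotiation_disabled(config_text: str) -> bool:
    """True if SEC_DEFAULT_NEGOTIATION is effectively set to NEVER."""
    value = effective_setting(config_text, "SEC_DEFAULT_NEGOTIATION")
    return value is not None and value.upper() == "NEVER"


if __name__ == "__main__":
    sample = """
# local settings
SEC_DEFAULT_NEGOTIATION = OPTIONAL
SEC_DEFAULT_NEGOTIATION = NEVER
"""
    print(negotiation_disabled(sample))  # True
```

Running it over the text of each host's config file (e.g. read with open(path).read()) gives a quick yes/no per host; for the authoritative answer on a live machine, the daemons' own view of the configuration is what counts.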



Sorry for the trouble...

No trouble at all.


best regards Scott,
Todd


Scott

>
> Thanks,
> Todd
>
>
> >2/9 10:33:37 DC_AUTHENTICATE: attempt to open invalid session node34:6631:1076347765:1271, failing.
> >2/9 10:38:38 DC_AUTHENTICATE: attempt to open invalid session node34:6631:1076347765:1271, failing.
> >2/9 10:43:37 State change: claim timed out (condor_schedd gone?)
> >2/9 10:43:37 Changing state and activity: Claimed/Busy -> Preempting/Killing
> >2/9 10:43:38 Got ALIVE while in Preempting state, ignoring.
> >2/9 10:43:49 DC_AUTHENTICATE: attempt to open invalid session node34:6631:1076347835:1272, failing.
> >2/9 10:43:49 Starter pid 32656 exited with status 0
> >2/9 10:43:49 State change: starter exited
> >2/9 10:43:49 State change: No preempting claim, returning to owner
> >2/9 10:43:49 Changing state and activity: Preempting/Killing -> Owner/Idle
> >2/9 10:43:49 State change: IS_OWNER is false
> >2/9 10:43:49 Changing state: Owner -> Unclaimed
> >
> >So it looks like a problem with invalid sessions is causing jobs
> >to be preempted.
> >
> >Can you tell us what can cause these invalid sessions?
> >
> >Thanks,
> >
> >Scott
> >
> >Condor Support Information:
> >http://www.cs.wisc.edu/condor/condor-support/
> >To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
> >unsubscribe condor-users <your_email_address>
>
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Todd Tannenbaum                       University of Wisconsin-Madison
> Condor Project Research               Department of Computer Sciences
> tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #3357
> http://www.cs.wisc.edu/~tannenba      Madison, WI 53706-1685
> Phone: (608) 263-7132  FAX: (240) 359-5654
>

