
[condor-users] DC_AUTHENTICATE error



I have some Vanilla universe jobs that run into problems when
flocking to the Condor cluster at HEP.  The following is an example
from the ShadowLog:

2/3 10:57:59 ******************************************************
2/3 10:57:59 ** condor_shadow (CONDOR_SHADOW) STARTING UP
2/3 10:57:59 ** $CondorVersion: 6.4.7 Jan 26 2003 $
2/3 10:57:59 ** $CondorPlatform: INTEL-LINUX-GLIBC22 $
2/3 10:57:59 ** PID = 27608
2/3 10:57:59 ******************************************************
2/3 10:57:59 DaemonCore: Command Socket at <144.92.101.149:57227>
2/3 10:58:00 Initializing a VANILLA shadow
2/3 10:58:00 (1506.440) (27608): Request to run on <128.104.28.10:32769> was ACCEPTED
2/3 12:18:10 (1506.440) (27608): DC_AUTHENTICATE: attempt to open invalid session condor:27608:1075827490:1, failing.
2/3 12:37:05 (1506.440) (27608): DC_AUTHENTICATE: attempt to open invalid session condor:27608:1075827480:0, failing.
2/3 12:37:06 (1506.440) (27608): ERROR "Can no longer communicate with condor_starter on execute machine" at line 138 in file NTreceivers.C

Apparently, HEP enforces some kind of timeout about one hour and twenty
minutes into the run (the request was accepted at 10:58:00 and the first
failure is logged at 12:18:10).  What parameter controls this?
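
For reference, the interval can be checked directly from the log
timestamps.  This is only a small Python sketch; the timestamps are
copied from the ShadowLog excerpt above, and the variable names are
just illustrative.  The log lines carry no year, so the parser's
default year is used, which does not affect the differences.

```python
from datetime import datetime

# Timestamp layout used by the ShadowLog lines ("month/day hh:mm:ss").
FMT = "%m/%d %H:%M:%S"

# Values copied from the log excerpt above.
accepted = datetime.strptime("2/3 10:58:00", FMT)       # request ACCEPTED
first_failure = datetime.strptime("2/3 12:18:10", FMT)  # first DC_AUTHENTICATE failure
shadow_error = datetime.strptime("2/3 12:37:06", FMT)   # "Can no longer communicate" ERROR

# Elapsed time from acceptance to each event.
to_first_failure = first_failure - accepted
to_shadow_error = shadow_error - accepted

print("accepted -> first DC_AUTHENTICATE failure:", to_first_failure)
print("accepted -> shadow ERROR:", to_shadow_error)
```

The first interval is what I am calling "one hour and twenty minutes";
whether it corresponds to a configured timeout is my guess, not
something the log states.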

Is there some parameter I can set using condor_qedit so that
these jobs don't try flocking to HEP, or do I need to disable flocking
to HEP globally in the config file and do a condor_reconfig?

-- 
Daniel K. Forrest	Laboratory for Molecular and
forrest@xxxxxxxxxxxxx	Computational Genomics