[Condor-users] Job executes, but status of job cannot be changed




I am trying to understand why our jobs do not leave the queue after completing successfully. It seems to be related to the ad owner and the queue submit owner being different. Can anyone shed some light on why this is occurring? Both accounts have write permission. If I run the job with the account igskbacb-condoradmin, the job exits. igskbacb-condoradmin is the service account that the daemons run under. Our pool is Windows XP only and we are using Condor 7.4.2.

Thanks,
Mike


SchedLog:
05/12 07:51:02 (pid:2348) ad owner: odonnellm, queue submit owner: igskbacb-condoradmin
05/12 07:51:02 (pid:2348) OwnerCheck(igskbacb-condoradmin) failed in SetAttribute for job 291.0
05/12 07:51:02 (pid:2348) condor_write(fd=1232 <IP:1901>,,size=37,timeout=20,flags=0)
05/12 07:51:02 (pid:2348) condor_read(fd=1232 <IP:1901>,,size=21,timeout=20,flags=0)
05/12 07:51:02 (pid:2348) condor_read(): fd=1232
05/12 07:51:02 (pid:2348) condor_read(): select returned 1
05/12 07:51:02 (pid:2348) condor_read(fd=1232 <IP:1901>,,size=71,timeout=20,flags=0)
05/12 07:51:02 (pid:2348) condor_read(): fd=1232
05/12 07:51:02 (pid:2348) condor_read(): select returned 1
05/12 07:51:02 (pid:2348) PERMISSION GRANTED to igskbacb-condoradmin@gs from host 159.189.162.39 for queue management, access level WRITE: reason: cached result for WRITE; see first case for the full reason
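
In case it helps anyone look for the same symptom in their own logs, here is a minimal Python sketch that scans log lines for the "ad owner ... queue submit owner" message shown above and reports the pairs that differ. The regular expression is based only on the line format in this excerpt, and the function name find_owner_mismatches is made up for illustration; it is not part of Condor.

```python
import re

# Pattern assumed from the single "ad owner" line in the excerpt above;
# other Condor versions may word this message differently.
OWNER_RE = re.compile(
    r"ad owner: (?P<ad>[^,\s]+), queue submit owner: (?P<submit>\S+)"
)

def find_owner_mismatches(lines):
    """Return (ad_owner, submit_owner) pairs where the two names differ."""
    mismatches = []
    for line in lines:
        m = OWNER_RE.search(line)
        if m and m.group("ad") != m.group("submit"):
            mismatches.append((m.group("ad"), m.group("submit")))
    return mismatches

# Two lines copied from the log excerpt above.
sample = [
    "05/12 07:51:02 (pid:2348) ad owner: odonnellm, queue submit owner: igskbacb-condoradmin",
    "05/12 07:51:02 (pid:2348) OwnerCheck(igskbacb-condoradmin) failed in SetAttribute for job 291.0",
]

print(find_owner_mismatches(sample))
```

The comparison is an exact string match, so names that differ only in case or domain suffix would also be reported as mismatches.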


ShadowLog (submit machine):
05/10 15:35:35 (248.0) (2088): condor_write(fd=1672 schedd at <IP:4905>,,size=595,timeout=300,flags=0)
05/10 15:35:35 (248.0) (2088): SECMAN: resume, other side is $CondorVersion: 7.4.0 Oct 31 2009 BuildID: 193173 $, NOT reauthenticating.
05/10 15:35:35 (248.0) (2088): SECMAN: about to enable message authenticator.
05/10 15:35:35 (248.0) (2088): SECMAN: successfully enabled message authenticator!
05/10 15:35:35 (248.0) (2088): SECMAN: about to enable encryption.
05/10 15:35:35 (248.0) (2088): SECMAN: successfully enabled encryption!
05/10 15:35:35 (248.0) (2088): SECMAN: startCommand succeeded.
05/10 15:35:35 (248.0) (2088): Authorizing server '*/IP'.
05/10 15:35:35 (248.0) (2088): condor_write(fd=1672 schedd at <IP:4905>,,size=76,timeout=300,flags=0)
05/10 15:35:35 (248.0) (2088): condor_read(fd=1672 schedd at <IP:4905>,,size=21,timeout=300,flags=0)
05/10 15:35:35 (248.0) (2088): condor_read(): fd=1672
05/10 15:35:35 (248.0) (2088): condor_read(): select returned 1
05/10 15:35:35 (248.0) (2088): condor_read(fd=1672 schedd at <IP:4905>,,size=16,timeout=300,flags=0)
05/10 15:35:35 (248.0) (2088): condor_read(): fd=1672
05/10 15:35:35 (248.0) (2088): condor_read(): select returned 1
05/10 15:35:35 (248.0) (2088): updateExprTree: Failed SetAttribute(NumJobStarts, 1)
05/10 15:35:35 (248.0) (2088): condor_write(fd=1672 schedd at <IP:4905>,,size=92,timeout=300,flags=0)
... Removed 5 additional tries
05/10 15:35:35 (248.0) (2088): Failed to perform final update to job queue!
05/10 15:35:35 (248.0) (2088): Maximum number of job cleanup retry attempts (SHADOW_MAX_JOB_CLEANUP_RETRIES=5) reached; Forcing job requeue!
05/10 15:35:35 (248.0) (2088): KEYCACHEENTRY: deleted: 00D18648
05/10 15:35:35 (248.0) (2088): KEYCACHEENTRY: deleted: 00D33CD0
05/10 15:35:35 (248.0) (2088): KEYCACHEENTRY: deleted: 00D217A8
05/10 15:35:35 (248.0) (2088): KEYCACHE: deleted: 00B7B6F0
05/10 15:35:35 (248.0) (2088): CLOSE <IP:2880> fd=1716
05/10 15:35:35 (248.0) (2088): CLOSE <127.0.0.1:2881> fd=1248
05/10 15:35:35 (248.0) (2088): CLOSE <127.0.0.1:2882> fd=1732
05/10 15:35:35 (248.0) (2088): **** condor_shadow (condor_SHADOW) pid 2088 EXITING WITH STATUS 107