
[Condor-users] condor_preen deleting all waiting jobs

I'm facing a strange problem with condor_preen (7.6.6). My users are
submitting jobs using the "condor_submit -remote" command to send their
jobs to a Linux cluster (RHEL 5). I noticed that all files in the spool
directories of the jobs get deleted after a while by condor_preen. This
happens to the jobs of all users EXCEPT for my own jobs. When I execute
condor_preen manually at the highest debug level, I can see that the spool
directories which contain the waiting jobs of my users (except my own)
are marked as "BAD". Here is an excerpt from the condor_preen output:

--snip-- One of my own waiting jobs
/condor.local/spool/385/0/cluster385.proc0.subproc0.tmp - OK
/condor.local/spool/385/0/cluster385.proc0.subproc0 - OK
/condor.local/spool/385/0 - OK
/condor.local/spool/385 - OK

--snip-- One of my users' waiting jobs
/condor.local/spool/411/0 - OK
/condor.local/spool/411 - OK
/condor.local/spool/411/0/cluster411.proc0.subproc0.tmp - BAD
/condor.local/spool/411/0/cluster411.proc0.subproc0 - BAD

I looked into the source of condor_preen and it seems that Condor checks
whether the cluster ids/job ids belong to some job in the queue. The
queue on the machine contains the jobs in state "I" as far as I can see
with condor_q, and I can see no difference compared to my own jobs
(except for the "Owner" property, of course). As it might be somehow
related to my access rights to Condor (I suppose), here are my security
settings (I have admin and owner access while the other users don't; I
replaced the hostnames/IPs with "+++").  Any ideas?

02/10/12 15:49:10 (fd:2) (pid:21169) Initialized the following authorization table:
02/10/12 15:49:10 (fd:2) (pid:21169) Authorizations yet to be resolved:
02/10/12 15:49:10 (fd:2) (pid:21169) allow READ:  */*.cst.de */ */GPUwork01.dmz1.cst.de
02/10/12 15:49:10 (fd:2) (pid:21169) allow WRITE:  */*.cst.de */ */GPUwork01.dmz1.cst.de
02/10/12 15:49:10 (fd:2) (pid:21169) allow NEGOTIATOR:  */*.cst.de
02/10/12 15:49:10 (fd:2) (pid:21169) allow ADMINISTRATOR:  felix/*.cst.de root/+++ root/+++
02/10/12 15:49:10 (fd:2) (pid:21169) allow OWNER:  felix/*.cst.de root/ root/+++
02/10/12 15:49:10 (fd:2) (pid:21169) allow CONFIG:  felix/*.cst.de root/ root/+++
02/10/12 15:49:10 (fd:2) (pid:21169) allow DAEMON:  */*.cst.de
02/10/12 15:49:10 (fd:2) (pid:21169) allow ADVERTISE_STARTD:  */*.cst.de
02/10/12 15:49:10 (fd:2) (pid:21169) allow ADVERTISE_SCHEDD:  */*.cst.de
02/10/12 15:49:10 (fd:2) (pid:21169) allow ADVERTISE_MASTER:  */*.cst.de
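
To make clear what I mean by the check in condor_preen, here is a rough Python sketch of the logic as I understand it from reading the source. It is NOT Condor's actual code: the regex, the function name classify() and the queued-jobs set are my own, made up for illustration, and it only handles the file names from the excerpt above, not the numbered directories.

```python
import re

# Hypothetical pattern for the spool file names seen in the preen output
# above, e.g. "cluster411.proc0.subproc0" with an optional ".tmp" suffix.
SPOOL_NAME = re.compile(r"^cluster(\d+)\.proc(\d+)\.subproc(\d+)(\.tmp)?$")


def classify(entry_name, queued_jobs):
    """Return "OK" if entry_name names a job in queued_jobs, else "BAD".

    queued_jobs is a set of (cluster, proc) tuples standing in for
    whatever job list preen obtains from the queue (an assumption).
    """
    match = SPOOL_NAME.match(entry_name)
    if match is None:
        # Names that do not look like job spool entries are not recognized.
        return "BAD"
    cluster, proc = int(match.group(1)), int(match.group(2))
    return "OK" if (cluster, proc) in queued_jobs else "BAD"


# Only my own job (385.0) is in the set that the check gets to see.
visible_jobs = {(385, 0)}
for name in ("cluster385.proc0.subproc0", "cluster411.proc0.subproc0.tmp"):
    print(name, "-", classify(name, visible_jobs))
```

If the check really works like this, job 411.0 should come out as OK whenever it is part of the job list that preen sees, which is why the BAD marking puzzles me given that condor_q shows the job.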