
Re: [Condor-users] Condor 7.6 - Windows Parallel Universe problems



I recently investigated a problem that I reported on this list a while
ago but got no reply to (so I was probably the only one experiencing it ;-).
However, I want to report the solution here in case someone else
stumbles across it.

The problematic setup:
I use a pool of Windows machines dedicated to Condor. The pool runs a
Central Manager, a schedd, and a credd, and all users submit from
external Windows client machines to the pool's schedd using the
"-remote" option of condor_submit. I've enabled PASSWORD authentication
on the pool, which might be part of the problem.
As long as the "vanilla" universe is used, everything works nicely. But
if a job is submitted to the "parallel" universe, it starts, but
after it finishes the shadow logs this error:

01/20/12 16:39:36 (80.0) (4436): SetEffectiveOwner(FelixWolfheimer) failed with errno=13: Permission denied.
01/20/12 16:39:36 (80.0) (4436): Failed to perform final update to job queue!
  
The job is then rescheduled, runs into the same problem at the end, is rescheduled again, and so on.
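
For anyone who wants to reproduce this, a submit description along the following lines should be enough. This is only a sketch: the file names and the executable are made up for illustration and are not the ones from my actual job.

    # parallel.sub (hypothetical file name)
    universe                = parallel
    # hello.bat is a placeholder for any small test program
    executable              = hello.bat
    machine_count           = 2
    output                  = out.$(NODE)
    error                   = err.$(NODE)
    log                     = parallel.log
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    queue

It would be submitted from the client machine with something like "condor_submit -remote <schedd name> parallel.sub", where <schedd name> stands for the name of the pool's schedd.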

Solution: I found out that I had to add the dummy(?) account "condor_pool" to the QUEUE_SUPER_USERS setting in the Condor config file on the machine running the pool's schedd.
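
As a rough sketch, the changed entry could look like the lines below. The names "root" and "condor" are only an assumption about what the list already contains; keep whatever entries your own config file has and just append "condor_pool".

    # Users with super-user access to the schedd's job queue.
    # "root, condor" is assumed here; only "condor_pool" is the actual change.
    QUEUE_SUPER_USERS = root, condor, condor_pool

The schedd has to pick up the new configuration before the change takes effect.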

This does not seem very obvious to me, and I wonder whether it is the intended behaviour?!

Anyway, now the parallel jobs run fine and just do what they are supposed to do. :-)