Re: [Condor-users] condor_preen killing negotiator and running jobs



Is the negotiator producing a core file?

On 6/14/2012 10:38 AM, Felix Wolfheimer wrote:
I'm facing a strange problem with condor_preen on Windows (Condor
7.6.7): condor_preen seems to crash my negotiator and kill at least
some of the running jobs. Below is part of the negotiator log written
after condor_preen has cleaned up; these are the last lines the
negotiator writes before it crashes:

06/14/12 11:33:11 (pid:2236) PERMISSION GRANTED to SYSTEM@nt authority from host 10.2.10.1 for command 1112 (QMGMT_WRITE_CMD), access level WRITE: reason: WRITE authorization policy allows IP address 10.2.10.1; identifiers used for this remote host: 10.2.10.1,XXX.cst.de
06/14/12 11:33:11 (pid:2236) Received TCP command 1112 (QMGMT_WRITE_CMD) from SYSTEM@nt authority <10.2.10.1:1349>, access level WRITE
06/14/12 11:33:11 (pid:2236) Calling HandleReq <handle_q> (0)
06/14/12 11:33:11 (pid:2236) PERMISSION GRANTED to SYSTEM@nt authority from host 10.2.10.1 for queue management, access level WRITE: reason: cached result for WRITE; see first case for the full reason
06/14/12 11:33:11 (pid:2236) OwnerCheck(SYSTEM) failed in SetAttribute for job 158.0
06/14/12 11:33:11 (pid:2236) ERROR "Failed to store MATCH_OpSys into job ad 158.0" at line 3550 in file c:\condor\execute\dir_5416\userdir\src\condor_schedd.v6\qmgmt.cpp
06/14/12 11:33:11 (pid:2236) Cron: Killing all jobs
06/14/12 11:33:11 (pid:2236) CronJobList: Deleting all jobs
06/14/12 11:33:11 (pid:2236) Cron: Killing all jobs
06/14/12 11:33:11 (pid:2236) CronJobList: Deleting all jobs

My setup is as follows: all users submit remotely to a central schedd
in the pool (the schedd, negotiator, and collector all run on the same
machine). The whole pool runs on Windows Server 2003 R2.
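For reference, a central manager that also hosts the single submit
point, as described above, would normally list these daemons in
DAEMON_LIST. This is a minimal sketch that assumes no other daemons
(for example a startd) run on that machine; the actual value used in
this pool is not shown in the post.

```
##  Assumed daemon set for the combined central manager / submit host.
##  Adjust if this machine also runs other daemons, e.g. STARTD.
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
```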

The preen settings in condor_config on this machine are, I believe,
the defaults:


##--------------------------------------------------------------------
##  Cleanup settings
##--------------------------------------------------------------------
##
##  Where should the master find the condor_preen binary? If you don't
##  want preen to run at all, set it to nothing.
##
##  Note: preen finds files and folders which are leftovers from crashed
##  jobs and removes them.
##
PREEN = $(SBIN)/condor_preen.exe

##  Who should condor_preen send email to?
PREEN_ADMIN = $(CONDOR_ADMIN)

##  How do you want preen to behave?  The "-m" means you want email
##  about files preen finds that it thinks it should remove.  The "-r"
##  means you want preen to actually remove these files.  If you don't
##  want either of those things to happen, just remove the appropriate
##  one from this setting.
PREEN_ARGS = -m -r

##
##  How often should the master start up condor_preen (in seconds)?
##
PREEN_INTERVAL = 86400

##  What files should condor_preen leave in the spool directory?
VALID_SPOOL_FILES = job_queue.log, job_queue.log.tmp, history, \
                    Accountant.log, Accountantnew.log, \
                    local_univ_execute, .quillwritepassword, \
                    .pgpass, \
                    .schedd_address, .schedd_classad


For now I've switched off condor_preen on this machine, but an actual
solution to the problem would be nicer. ;-)
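Switching preen off can be done the way the PREEN comment in the
configuration excerpt describes: set PREEN to nothing. The sketch below
shows that, plus a report-only alternative that follows the PREEN_ARGS
comment by dropping "-r" so preen only emails what it would remove.
Whether the report-only variant avoids the crash is not known; it is
included only as an option suggested by the file's own comments.

```
##  Option 1: do not run condor_preen at all ("set it to nothing").
PREEN =

##  Option 2 (untested against this crash): keep PREEN as before, but
##  only send the email report and never remove files.
#PREEN_ARGS = -m
```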