
[Condor-users] killed jobs hang around in idle state



Hi

I'm having trouble getting jobs killed at a certain time of day
with Condor 6.6.5 on Win2K. When a job is killed, it
stays in the idle state indefinitely:

C:\Condor\ics>condor_q -analyze
-- Submitter: 102153-71130c.liv.ac.uk : <138.253.102.153:1042> : 102153-71130c.liv.ac.uk
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
---
187.000: Run analysis summary. Of 2 machines,
1 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match, but are serving users with a better priority in the pool
1 match, but prefer another specific job despite its worse user-priority
0 match, but will not currently preempt their existing job
0 are available to run your job
Last successful match: Tue Jun 22 13:05:31 2004


1 jobs; 1 idle, 0 running, 0 held

The config file looks like:

WANT_SUSPEND = FALSE
WANT_VACATE = TRUE
START = TRUE
SUSPEND = ClockMin > 660
CONTINUE = FALSE
PREEMPT = TRUE
KILL = TRUE
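To see when the SUSPEND expression above becomes true, the Python sketch below mirrors its arithmetic. It assumes that ClockMin is the number of minutes elapsed since midnight in the execute machine's local time. Under that assumption, `ClockMin > 660` is false at exactly 11:00 and true from 11:01 onward. This only checks the time arithmetic; it does not model how the startd combines SUSPEND with WANT_SUSPEND, PREEMPT, or KILL. The helper names are hypothetical.

```python
from datetime import datetime, time

# Threshold taken from the posted config: SUSPEND = ClockMin > 660
SUSPEND_THRESHOLD_MIN = 660


def clock_min(t: time) -> int:
    """Minutes elapsed since midnight (assumed meaning of ClockMin)."""
    return t.hour * 60 + t.minute


def suspend_expr_true(t: time) -> bool:
    """Mirror of the ClassAd expression ClockMin > 660 for a given wall-clock time."""
    return clock_min(t) > SUSPEND_THRESHOLD_MIN


if __name__ == "__main__":
    # 13:05 is the time of the VACATE_SERVICE entry in the SchedLog excerpt that follows.
    for hhmm in ("10:59", "11:00", "11:01", "13:05"):
        t = datetime.strptime(hhmm, "%H:%M").time()
        print(hhmm, clock_min(t), suspend_expr_true(t))
```

At 13:05, ClockMin works out to 785, so the expression is already true by the time the vacate shows up in the log.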

Judging by the SchedLog, something seems to be wrong:

6/22 13:05:57 DaemonCore: Command received via TCP from host <138.253.102.153:1365>
6/22 13:05:57 DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)
6/22 13:05:57 Got VACATE_SERVICE from <138.253.102.153:1365>
6/22 13:05:57 Sent RELEASE_CLAIM to startd on <138.253.102.153:1041>
6/22 13:05:57 Match record (<138.253.102.153:1041>, 187, 0) deleted
6/22 13:05:57 DaemonCore: Command received via UDP from host <138.253.102.153:1367>
6/22 13:05:57 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
6/22 13:05:57 Scheduler::Relinquish - mrec is NULL, can't relinquish
6/22 13:05:57 Null parameter --- match not deleted
6/22 13:06:04 DaemonCore: Command received via UDP from host <138.253.102.153:1371>
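If the problem recurs, it may help to pull the relevant entries out of a long SchedLog automatically. The Python sketch below is not a Condor tool. It assumes the "M/D HH:MM:SS message" line format seen in the excerpt above and flags the two messages that look suspicious there. `flag_lines` and the `SUSPICIOUS` list are hypothetical names, and the list can be extended as needed.

```python
import re
from typing import Iterable, List, Tuple

# Matches SchedLog lines of the form "6/22 13:05:57 <message>" as in the excerpt.
LOG_LINE = re.compile(r"^(\d{1,2}/\d{1,2}) (\d{2}:\d{2}:\d{2}) (.*)$")

# Message fragments copied verbatim from the excerpt; extend as needed.
SUSPICIOUS = ("mrec is NULL", "match not deleted")


def flag_lines(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (date, time, message) for every log line containing a suspicious fragment."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and any(s in m.group(3) for s in SUSPICIOUS):
            hits.append((m.group(1), m.group(2), m.group(3)))
    return hits


if __name__ == "__main__":
    sample = [
        "6/22 13:05:57 Match record (<138.253.102.153:1041>, 187, 0) deleted",
        "6/22 13:05:57 Scheduler::Relinquish - mrec is NULL, can't relinquish",
        "6/22 13:05:57 Null parameter --- match not deleted",
    ]
    for hit in flag_lines(sample):
        print(hit)
```

To scan a real log, pass the open SchedLog file object to `flag_lines`, using whatever path your installation writes it to.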


Any ideas?

Thanks in advance,

-ian.