Re: [Condor-users] 6.8.6 -> 7.0.5 on Windows taking a long time to vacate jobs



Have you tried setting USE_PROCD to False to see if that improves the situation?

Greg

Ian Chesal wrote:
Bump. Has anyone else seen the slowdown in job kill time when
condor_procd is used in 7.0.5?

-----Original Message-----
From: Ian Chesal
Sent: Friday, November 21, 2008 4:58 PM
To: Ian Chesal; 'Condor-Users Mail List'
Subject: RE: 6.8.6 -> 7.0.5 on Windows taking a long time to vacate jobs

I'm in the process of moving Windows machines from 6.8.6 to 7.0.5,
and I've been noticing thrashing and problems with my startd RANK
policy on machines running 7.0.5. It appears that 7.0.5 takes a very
long time to preempt running jobs when a job with a higher startd
RANK comes along. If I swap 6.8.6 back in for 7.0.5, the same jobs
take only a minute to preempt, but under 7.0.5 they take 10 minutes.

In the time it takes to preempt jobs on 7.0.5-based machines, the
waiting jobs give up their claims.

I tried increasing REQUEST_CLAIM_TIMEOUT from 900 to 1200 seconds,
but it didn't make a difference. Pushing that number much higher
isn't desirable for my preemption policy.

Has something changed from 6.8.6 to 7.0.5 in the way Condor kills
jobs when they're preempted? I'm wondering why this operation takes
so much longer in 7.0.5 than it did in 6.8.6. These are plain
vanilla universe jobs, so there's no checkpointing.

Actually, if I change REQUEST_CLAIM_TIMEOUT and run
'condor_reconfig -full -all', does the new value apply to newly
spawned shadows, or do I have to restart Condor completely on my
schedulers for it to take effect?

Digging around a bit, it could be related to:

http://www.cs.wisc.edu/condor/manual/v7.0/3_6Security.html#sec:RunAsNobody

I have:

SLOT1_USER=ALTERA\cndrusr1
SLOT2_USER=ALTERA\cndrusr2
SLOT3_USER=ALTERA\cndrusr3
SLOT4_USER=ALTERA\cndrusr4
SLOT5_USER=ALTERA\cndrusr5
SLOT6_USER=ALTERA\cndrusr6
SLOT7_USER=ALTERA\cndrusr7
SLOT8_USER=ALTERA\cndrusr8

But I set:

DEDICATED_EXECUTE_ACCOUNT_REGEXP = cndrusr[0-9]+

Should I have included the domain in that regexp?

DEDICATED_EXECUTE_ACCOUNT_REGEXP = ALTERA\\cndrusr[0-9]+
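One way to reason about the regexp question is a small sketch that tests both candidate patterns against the account names from the SLOTn_USER settings above. It rests on two assumptions not confirmed here: whether Condor matches the pattern against the whole account name or only searches for a substring, and whether the name it compares carries the ALTERA\ domain prefix. It also assumes the config parser passes the double backslash through unchanged, so the regexp engine sees \\ and matches one literal backslash. The helper names anchored_match and substring_match are hypothetical.

```python
import re


# Hypothetical helpers. Whether Condor anchors the match to the whole
# account name or searches for a substring is an ASSUMPTION, so both
# behaviours are shown.
def anchored_match(pattern: str, account: str) -> bool:
    """True if the pattern matches the entire account name."""
    return re.fullmatch(pattern, account) is not None


def substring_match(pattern: str, account: str) -> bool:
    """True if the pattern matches anywhere inside the account name."""
    return re.search(pattern, account) is not None


# The two candidate values of DEDICATED_EXECUTE_ACCOUNT_REGEXP.
# In the raw string, "\\" reaches the regex engine as two characters,
# which the engine reads as one literal backslash.
UNQUALIFIED = r"cndrusr[0-9]+"
QUALIFIED = r"ALTERA\\cndrusr[0-9]+"

# Account names with and without the domain prefix (which form Condor
# actually compares is also an ASSUMPTION).
with_domain = [rf"ALTERA\cndrusr{n}" for n in range(1, 9)]
without_domain = [f"cndrusr{n}" for n in range(1, 9)]

for pattern in (UNQUALIFIED, QUALIFIED):
    for label, names in (("with domain", with_domain),
                         ("without domain", without_domain)):
        full = all(anchored_match(pattern, a) for a in names)
        part = all(substring_match(pattern, a) for a in names)
        print(f"{pattern:24} {label:15} anchored={full} substring={part}")
```

Under these assumptions, the unqualified pattern fails only when the match is anchored and the compared name includes the domain, while the qualified pattern fails whenever the compared name is the bare account. The manual section linked above is the place to confirm which form the startd actually uses.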

My machines report USE_PROCD as undefined (and it is indeed not set
in my configs), yet condor_procd is starting up. Does that mean
condor_startd is using condor_procd to track and kill processes on
my Windows machines? Could that be my problem, i.e. that procd is
doing this work?

- Ian



_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at: https://lists.cs.wisc.edu/archive/condor-users/