
Re: [Condor-users] 6.8.6 -> 7.0.5 on Windows taking a long time to vacate jobs



Bump. Has anyone else seen a slowdown in job kill time when condor_procd
is used in 7.0.5?

> -----Original Message-----
> From: Ian Chesal
> Sent: Friday, November 21, 2008 4:58 PM
> To: Ian Chesal; 'Condor-Users Mail List'
> Subject: RE: 6.8.6 -> 7.0.5 on Windows taking a long time to
> vacate jobs
>
> > I'm in the process of moving Windows machines from 6.8.6 to 7.0.5
> > and I noticed issues with thrashing and my startd RANK policy on
> > machines running 7.0.5. It appears that 7.0.5 takes a very long
> > time to preempt running jobs when a higher startd RANK job comes
> > along. I can switch in 6.8.6 for 7.0.5 and the same jobs take only
> > a minute to preempt, but when I move to 7.0.5 the jobs take more
> > than 10 minutes to preempt.
> >
> > In the time it takes to preempt jobs on 7.0.5-based machines the
> > waiting jobs give up their claim.
> >
> > I tried increasing REQUEST_CLAIM_TIMEOUT from 900 to 1200 seconds
> > but it didn't make a difference. It's not desirable for my
> > preemption policy to push that number much higher.
> >
> > Has something changed from 6.8.6 to 7.0.5 in the way Condor kills
> > jobs when they're preempted? I'm wondering why this operation takes
> > so much longer in 7.0.5 than it did in 6.8.6. These are plain
> > vanilla universe jobs, so there is no checkpointing.
> >
> > Actually, if I change REQUEST_CLAIM_TIMEOUT and do a
> > 'condor_reconfig -full -all', does it apply to newly spawned
> > shadows or do I have to restart Condor completely on my schedulers
> > for this to take effect?
>
> Digging around a bit it could be related to:
>
> http://www.cs.wisc.edu/condor/manual/v7.0/3_6Security.html#sec:RunAsNobody
>
> I have:
>
> SLOT1_USER=ALTERA\cndrusr1
> SLOT2_USER=ALTERA\cndrusr2
> SLOT3_USER=ALTERA\cndrusr3
> SLOT4_USER=ALTERA\cndrusr4
> SLOT5_USER=ALTERA\cndrusr5
> SLOT6_USER=ALTERA\cndrusr6
> SLOT7_USER=ALTERA\cndrusr7
> SLOT8_USER=ALTERA\cndrusr8
>
> But I set:
>
> DEDICATED_EXECUTE_ACCOUNT_REGEXP = cndrusr[0-9]+
>
> Should I have included the domain in that regexp?
>
> DEDICATED_EXECUTE_ACCOUNT_REGEXP = ALTERA\\cndrusr[0-9]+
>
> My machines report that USE_PROCD is undefined (and it is undefined
> in my configs), but condor_procd is starting up. Does that mean
> condor_startd is using condor_procd to track and kill processes on
> my Windows machines? Could procd doing this work be my problem?
>
> - Ian
>
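
The question about including the domain in the regexp can be checked
outside Condor. Here is a minimal Python sketch of how the two candidate
patterns behave against the slot account names quoted above. It assumes
nothing about how the startd actually applies
DEDICATED_EXECUTE_ACCOUNT_REGEXP: whether Condor compares the pattern
against the domain-qualified name or the bare name, and whether it requires
a whole-string match, is not established in this thread, so both cases are
tried. The helper name matches() and the "bare" account list are
illustrative only, and Python's regex dialect may differ from the one Condor
uses.

```python
import re

# Account names as they appear in the SLOTn_USER settings quoted above.
qualified = ["ALTERA\\cndrusr%d" % n for n in range(1, 9)]
# The same accounts with the domain prefix stripped (hypothetical form;
# whether Condor ever sees the name this way is an assumption).
bare = [name.split("\\", 1)[1] for name in qualified]

# The two candidate values for DEDICATED_EXECUTE_ACCOUNT_REGEXP.
PATTERNS = {
    "without_domain": r"cndrusr[0-9]+",
    "with_domain": r"ALTERA\\cndrusr[0-9]+",
}

def matches(pattern, names, whole_string):
    """Return True if every name matches, anywhere or as a whole string."""
    test = re.fullmatch if whole_string else re.search
    return all(test(pattern, name) is not None for name in names)

if __name__ == "__main__":
    for label, pattern in PATTERNS.items():
        for kind, names in (("qualified", qualified), ("bare", bare)):
            for whole in (False, True):
                mode = "whole-string" if whole else "substring"
                print(label, kind, mode, matches(pattern, names, whole))
```

If the comparison is a substring search, the pattern without the domain
matches either form of the name. If it is a whole-string match against the
domain-qualified name, only the pattern with the domain matches. Which case
applies on a 7.0.5 Windows startd would still need to be confirmed against
the manual or the daemon logs.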
