
Re: [Condor-users] Flocking drawback



We had a similar problem trying to flock to the UW CS Condor pool.  After a few test runs of a program that generates thousands of jobs per run, we had completely hosed our priority, and it was taking days for anything to start running.  Zachary Miller, one of the Condor team members, was able to get us up and running again.  I’m not sure what he did, but I know it’s possible to fix this issue, or at least work around it.

 

Michael.

 

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Thomas Materna
Sent: Monday, September 19, 2005 10:26 PM
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] Flocking drawback

 

Hi,

I have major problems with flocking. Pool A has 3 computers sharing a filesystem. Pool B has 20 computers that do not share a filesystem with pool A. A flocks to B. I have a bunch of jobs submitted from A in the standard universe, but my priority is very bad because I've been doing that a lot lately. Another user also has a whole bunch of jobs submitted to A, but his are in the vanilla universe, and in his submit file he added a requirement of the form

((Machine==A1) || (Machine==A2)...) where A1, A2, ... are the machines in pool A.

 

Well, he will never run on pool B, but he prevents me from running on it! What happens is that at every cycle, because he has a better priority, he claims all the machines in pool B, so my jobs cannot claim them. Only then do his jobs reject the machines for not meeting the requirement. I have 20 machines doing nothing!
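
To make the scenario concrete, here is a toy Python model of one cycle as described above. It is only a sketch of the behavior reported in this message, not Condor's actual negotiation algorithm. The function name run_cycle, the user names, and the machine names A3 and B1..B20 are hypothetical, chosen for illustration.

```python
# Toy model of the starvation described above; NOT Condor's real negotiator.
# Assumption (taken from the message): in each cycle the better-priority user
# claims every machine first, and the requirement is only checked afterwards.

def run_cycle(machines, users):
    """Simulate one cycle and return {machine: user_name or None}.

    users is a list of (name, accepts) pairs sorted best priority first,
    where accepts(machine) stands in for the job's requirement expression.
    """
    running = {}
    for machine in machines:
        # The best-priority user claims the machine regardless of requirements.
        claimant_name, accepts = users[0]
        # Only after the claim does the job check the machine; a rejected
        # machine stays idle for the rest of this cycle.
        running[machine] = claimant_name if accepts(machine) else None
    return running


pool_a = ["A1", "A2", "A3"]                # hypothetical names for pool A
pool_b = [f"B{i}" for i in range(1, 21)]   # hypothetical names for pool B

users = [
    ("vanilla_user", lambda m: m in pool_a),  # better priority, pool A only
    ("standard_user", lambda m: True),        # worse priority, runs anywhere
]

result = run_cycle(pool_b, users)
idle = [m for m, user in result.items() if user is None]
print(len(idle), "of", len(pool_b), "machines in pool B idle")
```

Under these assumptions every machine in pool B ends the cycle idle, even though the lower-priority user's jobs would have accepted all of them.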

 

How can I get around that? Is there a way to keep jobs from claiming machines they won't accept to run on anyway? If not, I consider it a major flaw: one user like that could stop everything. And since he's not running, his priority won't go up. I'll have to wait 5 days for my priority to get back to a competitive level. And even then, some machines won't be used.

 

Thanks for any help you guys could provide,

 

Thomas