Partial Solutions to date:
1) I initially took a hint from the Bologna Batch System short/long
running jobs config (thanks to whoever documented that) and implemented
an "ExclusiveVM vs Non-Exclusive VM" system. Each node had an extra VM
configured which was an Exclusive VM (I also lied about the amount of
memory and number of CPUS in the config file so as to get appropriate
final numbers). Via the START expression, when a job was marked
"exclusive" it could only run on the Exclusive VM, otherwise it could
only run on one of the non-exclusive VMs. If an exclusive job was
running on a node, no non-exclusive jobs could start there either, and
vice-versa, if any non-exclusive jobs were running, no exclusive jobs
could start. Blast jobs were submitted as exclusive jobs, and ran with
"-a X", thus using all available CPU. Condor only thought one VM was in
use, but all CPU was being efficiently used.
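For reference, the core of that setup looked roughly like this (simplified
from memory; IsExclusiveJob is an attribute name of my own invention, set
via "+IsExclusiveJob = True" in the exclusive jobs' submit files, and the
cross-VM lockout checks are omitted here):

```
# Hypothetical sketch for a dual-CPU node: lie and advertise three VMs,
# with VM3 acting as the Exclusive VM.  Exclusive jobs match only VM3;
# everything else matches VM1/VM2.
NUM_CPUS = 3
START = $(START) && \
  ( (TARGET.IsExclusiveJob =?= True && VirtualMachineID == 3) || \
    (TARGET.IsExclusiveJob =!= True && VirtualMachineID != 3) )
```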
But this breaks down in the presence of more than a few non-exclusive
jobs. Unless all non-exclusive VMs on a node are vacated at the same
time, an exclusive job never gets a chance to run - in practice, with
lots of jobs in a queue, non-exclusive jobs on the same node almost
never finish at the same time, so the exclusive jobs are locked out, no
matter what user priority there might be (the START expression simply
never matches).
So, I pondered allowing an exclusive job to start up if there was at
least one free non-exclusive VM, and never allowing a non-exclusive job
to start while an exclusive job was running. This would oversubscribe the CPU
for a while until any non-exclusive jobs finished, and when the
exclusive job finishes, non-exclusive could still potentially be allowed
back on if Rank/priority allowed. But this seems inefficient to me
(increases wall-clock time at the least). Am I worried over nothing?
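Roughly, the relaxed rule I have in mind would look something like this on
the Exclusive VM (reusing the hypothetical IsExclusiveJob attribute from
above, and assuming the other VMs' State attributes are visible across
VMs, which I would want to verify first):

```
# Sketch only: let an exclusive job start on VM3 as long as at least
# one non-exclusive VM is idle (Unclaimed), accepting temporary CPU
# oversubscription until the remaining non-exclusive jobs drain.
START = $(START) && \
  ( VirtualMachineID == 3 && TARGET.IsExclusiveJob =?= True && \
    (vm1_State =?= "Unclaimed" || vm2_State =?= "Unclaimed") )
```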
2) Second attempt was to try to implement my option b) above. I got rid
of the whole exclusive/non-exclusive vm idea. Each blast job runs with
-a 1, and advertises an extra attribute:
BlastDatabaseUsed
which is the name of the blast database in use by that job. Then on the
compute node I added:
STARTD_JOB_EXPRS = BlastDatabaseUsed
STARTD_VM_EXPRS = BlastDatabaseUsed
to the local config file. STARTD_JOB_EXPRS pushed BlastDatabaseUsed
for the current job into the ClassAd of the VM it was running on, and
STARTD_VM_EXPRS made it available to all the other VMs. With the
Start expression:
START = $(START) && \
  (TARGET.BlastDatabaseUsed =?= UNDEFINED || \
   ((vm1_BlastDatabaseUsed =?= UNDEFINED || \
     vm1_BlastDatabaseUsed =?= TARGET.BlastDatabaseUsed) && \
    (vm2_BlastDatabaseUsed =?= UNDEFINED || \
     vm2_BlastDatabaseUsed =?= TARGET.BlastDatabaseUsed)))
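For completeness, the submit side of this scheme is just a plain blastall
job that advertises the database (the "nt" database and the file names
here are only examples):

```
# Relevant fragment of the blast submit file: run single-threaded
# ("-a 1") and advertise which database this job will use.  The "+"
# prefix puts BlastDatabaseUsed into the job ClassAd, where the START
# expression above sees it as TARGET.BlastDatabaseUsed.
executable = blastall
arguments  = -p blastn -d nt -a 1 -i query.fa -o query.out
+BlastDatabaseUsed = "nt"
queue
```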
I would have expected it to work. But, there seems to be some sort of
delay between STARTD_JOB_EXPRS pushing the attribute into a VM's
ClassAd and STARTD_VM_EXPRS propagating it into the other VMs' ClassAds,
resulting in the start expression matching for a new database while
there was still a job running for the old one. Nodes flip-flopped and
had two jobs running at the same time for different databases. It
varied with each run, as would be expected from some kind of race
condition.