
Re: [Condor-users] Anti Affinity



I think you've hit a bug. The slot name should be transformed into a valid attribute name; IIRC, a "." cannot appear in an attribute name.

Best,


matt

Sateesh Potturu wrote:
Hi Matt,

I was able to achieve anti-affinity with the approach I mentioned
earlier, but not in combination with partitionable slots.

When I use STARTD_SLOT_ATTRS along with partitionable slots, I see
classads like the following:

slot1.3_Cmd = "/home/sateesh/tmp/sh_loop2"
slot1.2_Cmd = "/home/sateesh/tmp/sh_loop2"
slot1.1_Cmd = "/home/sateesh/tmp/sh_loop2"

Should it not be slot1_3_Cmd, slot1_2_Cmd, slot1_1_Cmd? ( _ instead of . )

I have STARTD_JOB_EXPRS include Cmd, and STARTD_SLOT_ATTRS also
include Cmd. My job requirements contain (TARGET.slot1_Cmd =!=
"/home/sateesh/tmp/sh_loop2"), repeated for each slot. With this, I
was able to achieve the anti-affinity I was asking about. The mailing
list archives had configurations for uniform distribution based on
RANK, but I wanted Condor to never start an executable more than once
on any execute node, so I used Cmd.
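
Spelled out, the configuration I'm describing is roughly the
following (a sketch only; two-slot machines assumed, so repeat the
=!= clause once per slot, and substitute your own executable path):

    # condor_config on the execute nodes: publish each job's Cmd,
    # and cross-publish it into every slot's ad
    STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS) Cmd
    STARTD_SLOT_ATTRS = $(STARTD_SLOT_ATTRS) Cmd

    # submit file: refuse any machine where a slot already runs this Cmd
    requirements = (TARGET.slot1_Cmd =!= "/home/sateesh/tmp/sh_loop2") && (TARGET.slot2_Cmd =!= "/home/sateesh/tmp/sh_loop2")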

I tested this anti-affinity without partitionable slots and it works
well, as I expected. With partitionable slots, I suspect the check
against TARGET.slot1.1_Cmd fails because "." is a separator.

I tested my observation with condor_status -constraint and it matches.
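
In case anyone wants to reproduce the check, a query along these
lines (sketched from memory; adjust the path to your own executable)
shows which machine ads advertise the command for slot1:

    condor_status -constraint 'slot1_Cmd =?= "/home/sateesh/tmp/sh_loop2"'

The =?= meta-equals operator keeps the constraint from evaluating to
UNDEFINED on slots that don't advertise slot1_Cmd at all.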

I think anti-affinity requires both Requirements and Rank --
Requirements to prevent two instances from starting on the same
physical machine, and Rank to get breadth-first job distribution.
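
For the Rank half, a sketch of the common breadth-first idiom from
the archives (preferring low-numbered slots, which spreads jobs
across machines before stacking them; I haven't tuned or verified
this against 7.2):

    # submit file: prefer slot1 on an idle machine over slot2 on a busy one
    rank = 0 - SlotID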

Thanks,
Sateesh

On Sat, Jan 24, 2009 at 2:50 AM, Matthew Farrellee <matt@xxxxxxxxxx> wrote:
Sateesh Potturu wrote:
Hello,

How can I get anti affinity behavior for jobs?

If I have two jobs (A and B) and two machines with two CPUs each, how
can I control the jobs such that job A and job B don't run on the
same execute machine?

Can I control this using STARTD_JOB_EXPRS? I tried adding Cmd to this
config variable without success; the startd reports

"Job wants DaemonCore starter, skipping
/opt/condor-7.2.0/sbin/condor_starter.std"
"slot1.1: Job Requirements check failed!"
You might check the archives for discussions about uniformly
distributing jobs and/or tightly packing them.

There were some good example configurations.

Best,


matt
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/