
Re: [Condor-users] vm1 info in vm2 ClassAd?



Ian, out of interest, how does this handle the case where the machine is entirely empty and multiple jobs are able to start?
Do they all start, and then are the ones that aren't blessed for the special mode kicked off shortly after?

The fact that negotiation lacks awareness of changes brought about by previous assignments in the same cycle for SMP machines is really quite annoying.

A simple switch that allowed only one assignment or alteration of a slot on an SMP machine per negotiation cycle (and required a refresh of the condor_collector state of all slots on that machine before it could be assigned to again) would be nice: you could then write idealized configuration and look to have it made more effective and efficient later.

Obviously anything depending on something like Hawkeye would always require this, but simple expressions that just reference the state of other slots, in terms of what jobs are assigned to them, would work fine in a 'self-aware' negotiation model.

Matt

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: 31 August 2009 20:22
To: Condor-Users Mail List
Subject: Re: [Condor-users] vm1 info in vm2 ClassAd?


> I am working on a cluster made up of many SMP machines.
> For a given node which contains multiple virtual machines,
> is it possible to include in the status ("claimed" or "unclaimed")
> of  vm1 in the ClassAd of vm2, and vice versa?
>
> Basically I want to be able to decide if my job gets submitted to
> vmX based on whether or not vmY is claimed or unclaimed.
>
>        # condor_version
>       $CondorVersion: 6.8.4 Feb  1 2007 $
>       $CondorPlatform: X86_64-LINUX_RHEL3 $

Yes. See:
http://www.cs.wisc.edu/condor/manual/v7.2/3_3Configuration.html#17619

And for an example of how to use this:

# Advertise slot 1 as an "isolated" slot, meaning that when a
# "sensitive" job runs on it, the other slots of the machine will
# be emptied to provide an isolated execution environment for
# the timing-sensitive job.
vm1_AlteraIsIsolatedJobSlot = TRUE
STARTD_EXPRS = $(STARTD_EXPRS), AlteraIsIsolatedJobSlot

# If slot 1 is running a sensitive job, it should advertise a
# special classad. This classad will be referred to in the config below.
STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS), AlteraJobAttributeIsSensitive
STARTD_VM_EXPRS = $(STARTD_VM_EXPRS), AlteraJobAttributeIsSensitive

# To reduce inefficiency due to preemption, we always prefer sensitive
# jobs so that they don't get preempted after killing all non-sensitive
# jobs running on this machine.
# Note that sensitive jobs still compete for resources using the normal
# group/priority/fifo scheme among themselves.
ALTERA_GODLIKE_RANK = 1000
RANK = ( \
          (((TARGET.AlteraJobAttributeIsSensitive =?= TRUE) || ("godlike" =?= TARGET.AlteraGroup)) * $(ALTERA_GODLIKE_RANK)) + \
          ((("$(AlteraPreferredGroup1)" != "") && ("$(AlteraPreferredGroup1)" =?= TARGET.AlteraGroup)) * $(AlteraPreferredGroupRank1)) + \
          ((("$(AlteraPreferredGroup2)" != "") && ("$(AlteraPreferredGroup2)" =?= TARGET.AlteraGroup)) * $(AlteraPreferredGroupRank2)) + \
          ((("$(AlteraPreferredGroup3)" != "") && ("$(AlteraPreferredGroup3)" =?= TARGET.AlteraGroup)) * $(AlteraPreferredGroupRank3)) + \
          ((("$(AlteraPreferredGroup4)" != "") && ("$(AlteraPreferredGroup4)" =?= TARGET.AlteraGroup)) * $(AlteraPreferredGroupRank4)) + \
          ((("$(AlteraPreferredGroup5)" != "") && ("$(AlteraPreferredGroup5)" =?= TARGET.AlteraGroup)) * $(AlteraPreferredGroupRank5)) + \
          ((("$(AlteraPreferredGroup6)" != "") && ("$(AlteraPreferredGroup6)" =?= TARGET.AlteraGroup)) * $(AlteraPreferredGroupRank6)) \
       )

# Make sure that slot 1 can execute any jobs, and other slots can only execute
# a job if slot 1 is unclaimed or is not running a sensitive job
START = \
   ( \
      ($(START)) && \
      ( \
         (VirtualMachineID == 1) || \
         ( \
            VirtualMachineID != 1 && \
            ( \
               vm1_State =?= "Unclaimed" || \
               vm1_AlteraJobAttributeIsSensitive =?= UNDEFINED || \
               vm1_AlteraJobAttributeIsSensitive =?= FALSE \
            ) \
         ) \
      ) \
   )

# If a sensitive job is running in slot 1, jobs in other slots are preempted
# regardless of their retirement time
ALTERA_NON_SENSITIVE_SLOT_PREEMPTED_BY_SENSITIVE_JOB = \
   ( \
      VirtualMachineID != 1 && \
      vm1_State =!= "Unclaimed" && \
      vm1_AlteraJobAttributeIsSensitive =!= UNDEFINED && \
      vm1_AlteraJobAttributeIsSensitive =?= TRUE \
   )
PREEMPT = (($(PREEMPT)) || ($(ALTERA_NON_SENSITIVE_SLOT_PREEMPTED_BY_SENSITIVE_JOB)))
MaxJobRetirementTime = (($(MaxJobRetirementTime)) * ($(ALTERA_NON_SENSITIVE_SLOT_PREEMPTED_BY_SENSITIVE_JOB) =?= FALSE))


This config sets slot 1 up as the "sensitive" slot: any job that wants
to take over an entire machine targets slot 1 and preempts any other
jobs in the other slots when it starts to run. It also ensures that no
other jobs run in the other slots as long as there's a "sensitive" job
running in slot 1.
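
On the submit side, a job opts into this scheme by injecting the custom
attribute into its ClassAd with the "+Attribute = value" submit-file syntax.
A rough sketch (the executable name is made up; the attribute names match
the config above):

```
# Hypothetical submit file for a timing-sensitive job.
universe     = vanilla
executable   = timing_run.sh

# Only land on the isolated slot advertised by the config above.
requirements = (AlteraIsIsolatedJobSlot =?= TRUE)

# Mark the job as sensitive; STARTD_JOB_EXPRS publishes this back
# into the slot's ad, which the START/PREEMPT expressions reference.
+AlteraJobAttributeIsSensitive = True

queue
```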

Presumably the need for crazy stuff like this will drop to zero as
dynamic partitioning of SMP machines ramps up. It's in 7.2.x, but I
haven't had a chance to check it out yet.
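
For reference, the advertised mechanism looks roughly like the sketch
below: one partitionable slot owns the whole machine, and dynamic slots
are carved out of it per job request. This is untested on my end; check
the 7.2 manual for the exact knob names before relying on it.

```
# Untested sketch of a partitionable-slot config (7.2.x).
# One slot advertises all of the machine's resources...
NUM_SLOTS                 = 1
NUM_SLOTS_TYPE_1          = 1
SLOT_TYPE_1               = cpus=100%,memory=100%,disk=100%
# ...and is marked partitionable, so the startd splits off
# right-sized dynamic slots as jobs claim it.
SLOT_TYPE_1_PARTITIONABLE = TRUE
```

Jobs then size their own slice from the submit file with the
request_cpus / request_memory / request_disk commands instead of
matching against fixed slots.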

- Ian


