
Re: [Condor-users] job submitting to powerful machines.



> I tried to apply your suggestions and I come up with the requirement
> expression (REQUIREMENTS = (OpSys == "WINNT52" && SlotID == 3 || SlotID
> == 4)) and it will run jobs on all windows 2003 machines on processors
> 3 and 4, if I use slotId 1 and 2 it will run it on xp and 2003 boxes on
> slotid1 and slotid2. How can I direct my jobs for example to machine6 to
> slots 1 and 3, Only. Or to all CPU's on machine6.

So rather than give you the requirements expressions why don't I show
you how I'm getting the information to match against? There's nothing
magic about what I'm doing -- you just have to know where the
information is stored in Condor.

If you run:

   condor_status -long <hostname>

You'll get the full ClassAd for every slot on the machine, with all the
attributes for each slot. You can use any of these attributes when
you're writing the requirements expression for your cluster.

For example, here's condor_status -long for a Windows XP SP3 machine in
my pool:

D:\computefarm\main\abc\src\bin>condor_status -long ttc-ichesal
MyType = "Machine"
TargetType = "Job"
Name = "TTC-ICHESAL.altera.com"
Machine = "TTC-ICHESAL.altera.com"
Rank = ((("godlike" =?= TARGET.AlteraGroup) * 100000000) + ((("software"
!= "") && ("software" =?= TARGET.AlteraGroup)) * 60) + ((("" != "") &&
("" =?= TARGET.AlteraGroup)) * 50) + ((("" != "") && ("" =?=
TARGET.AlteraGroup)) * 40) + ((("" != "") && ("" =?=
TARGET.AlteraGroup)) * 30) + ((("" != "") && ("" =?=
TARGET.AlteraGroup)) * 20) + ((("" != "") && ("" =?=
TARGET.AlteraGroup)) * 10))
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
COLLECTOR_HOST_STRING = "ttc-negotiator.altera.com"
STARTER_JOB_ENVIRONMENT = "  LM_LICENSE_FILE=1800@ttc-licsrv;1700@goto PERL5LIB=d:\abc\perl_site_lib\site\lib;d:\abc\perl\lib;d:\abc\perl\site\lib;. PATH=d:\abc\perl\bin;d:\abc\condor\bin;C:\WINDOWS;C:\WINDOWS\System32;C:\WINDOWS\System32\Wbem;c:\bin;c:\mks\mksnt; "
AlteraMachineClass = 2200
AlteraIsDesktop = TRUE
AlteraExecuteEnabled = FALSE
AlteraBuildReuseEnabled = FALSE
AlteraPreferredGroup1 = software
AlteraQuotedPreferredGroup1 = "software"
AlteraQuotedPreferredGroup2 = ""
AlteraQuotedPreferredGroup3 = ""
AlteraQuotedPreferredGroup4 = ""
AlteraQuotedPreferredGroup5 = ""
AlteraQuotedPreferredGroup6 = ""
AlteraPreferredGroupRank1 = 60
AlteraPreferredGroupRank2 = 50
AlteraPreferredGroupRank3 = 40
AlteraPreferredGroupRank4 = 30
AlteraPreferredGroupRank5 = 20
AlteraPreferredGroupRank6 = 10
AlteraOperatingSystem = "winxp"
AlteraArchitecture = "intel32"
ACEGroup1 = ""
ACEGroup2 = ""
ACEGroup3 = ""
ACEGroup4 = ""
ACEGroup5 = ""
condor = TRUE
perl = TRUE
perl_site_lib = TRUE
acewin = TRUE
abctools = TRUE
AlteraBatchClientVersion = 1.000000
CondorVersion = "$CondorVersion: 6.8.6 Sep 13 2007 $"
CondorPlatform = "$CondorPlatform: INTEL-WINNT50 $"
VirtualMachineID = 1
VirtualMemory = 3364684
Disk = 39241304
CondorLoadAvg = 0.000000
LoadAvg = 0.860000
KeyboardIdle = 0
ConsoleIdle = 0
Memory = 2037
Cpus = 1
StartdIpAddr = "<137.57.182.107:1083>"
Arch = "INTEL"
OpSys = "WINNT51"
UidDomain = "altera.com"
FileSystemDomain = "altera.com"
Subnet = "137.57.182"
HasIOProxy = TRUE
CheckpointPlatform = "WINNT51 INTEL Unknown normal"
TotalVirtualMemory = 3364684
TotalDisk = 39241304
TotalCpus = 1
TotalMemory = 2037
KFlops = 1202568
Mips = 4037
LastBenchmark = 1232389717
TotalLoadAvg = 0.860000
TotalCondorLoadAvg = 0.000000
ClockMin = 1065
ClockDay = 3
TotalVirtualMachines = 1
HasFileTransfer = TRUE
HasPerFileEncryption = TRUE
HasReconnect = TRUE
HasMPI = TRUE
HasTDP = TRUE
HasJobDeferral = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
HasWindowsRunAsOwner = TRUE
StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasWindowsRunAsOwner"
CpuBusyTime = 664
CpuIsBusy = TRUE
TimeToLive = 2147483647
State = "Owner"
EnteredCurrentState = 1232389709
Activity = "Idle"
EnteredCurrentActivity = 1232389709
Start = ((FALSE =?= TRUE) && (((TRUE =?= TRUE) && ((((FALSE =!= TRUE) ||
(((TRUE =?= TRUE) && ((((((ClockDay > 0 && ClockDay < 6) && ((474 > 1212
&& (ClockMin >= 1212 && ClockMin <= 474)) || (474 < 1212 && (ClockMin >=
1212 || ClockMin <= 474)))) || ((ClockDay == 0 || ClockDay == 6) &&
((1439 > 0 && (ClockMin >= 0 && ClockMin <= 1439)) || ((1439 < 0 &&
(ClockMin >= 0 || ClockMin <= 1439)))))) && (((LoadAvg - CondorLoadAvg)
<= 0.300000) || (State != "Unclaimed" && State != "Owner")) &&
(KeyboardIdle>= 180)) || ((VirtualMachineID <= 0))) =!= TRUE)) || ((TRUE
=!= TRUE) && FALSE))) =?= FALSE) &&
(MaxJobRetirementTime =?= UNDEFINED || MaxJobRetirementTime
=?= 0))) ||((TRUE =!= TRUE) && ((((FALSE =!= TRUE) || (((TRUE =?= TRUE)
&& ((((((ClockDay > 0 && ClockDay < 6) && ((474 > 1212 && (ClockMin >=
1212 && ClockMin <= 474)) || (474 <1212 && (ClockMin >= 1212 || ClockMin
<= 474)))) || ((ClockDay == 0 || ClockDay == 6) && ((1439 > 0 &&
(ClockMin >= 0 && ClockMin <= 1439)) || ((1439 < 0 && (ClockMin >= 0 ||
ClockMin <= 1439)))))) && (((LoadAvg - CondorLoadAvg) <= 0.300000) ||
(State!= "Unclaimed" && State != "Owner")) && (KeyboardIdle >= 180)) ||
((VirtualMachineID <= 0))) =!= TRUE)) || ((TRUE =!= TRUE) && FALSE)))
=?= FALSE) && (MaxJobRetirementTime =?= UNDEFINED ||
MaxJobRetirementTime =?= 0)))))
Requirements = (START) && (IsValidCheckpointPlatform)
IsValidCheckpointPlatform = (((TARGET.JobUniverse == 1) == FALSE) ||
((MY.CheckpointPlatform =!= UNDEFINED) &&
((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
(TARGET.NumCkpts == 0))))
MaxJobRetirementTime = ((TRUE =?= TRUE) * (2147483640 *
((((VirtualMachineID <= 0) == FALSE) && ((FALSE && (((LoadAvg -
CondorLoadAvg) >= 0.500000) || (KeyboardIdle < 60))) || ((((ClockDay > 0
&& ClockDay < 6) && ((474 > 1212 && (ClockMin >= 1212 && ClockMin <=
474)) || (474 < 1212 && (ClockMin >= 1212 || ClockMin <= 474)))) ||
((ClockDay == 0 || ClockDay == 6) && ((1439 > 0 && (ClockMin >= 0 &&
ClockMin <= 1439)) || ((1439 < 0 && (ClockMin >= 0 || ClockMin <=
1439)))))) == FALSE))) == FALSE) * ((Activity != "Idle") &&
(((CurrentTime - JobStart) > 300) || (MY.AlteraJobAttributeIsInteractive
=?= TRUE))))) + ((TRUE =!= TRUE) * (2147483640 * ((Activity != "Idle")
&& (((CurrentTime - JobStart) > 300) ||
(MY.AlteraJobAttributeIsInteractive =?= TRUE)))))
CurrentRank = 0.000000
AvailTime = 0.000000
LastAvailInterval = 0
MonitorSelfTime = 1232577882
MonitorSelfCPUUsage = 0.052055
MonitorSelfImageSize = 57560.000000
MonitorSelfResidentSetSize = 12200
MonitorSelfAge = 188179
MonitorSelfRegisteredSocketCount = 2
vm1_State = "Owner"
vm1_Activity = "Idle"
vm1_EnteredCurrentActivity = 1232389709
vm1_Memory = 2037
DaemonStartTime = 1232389709
UpdateSequenceNumber = 3135
MyAddress = "<137.57.182.107:1083>"
LastHeardFrom = 1232577946
UpdatesTotal = 3136
UpdatesSequenced = 3135
UpdatesLost = 12
UpdatesHistory = "0x00000000000000000000000000000000"

As you can see, it's a one-slot machine running Windows XP, and there
are *lots* of attributes you can match against. Look at the machines you
want to match, work out which attributes best distinguish them from the
rest of the pool, and write the Boolean expression that selects them.
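
One way to find the attributes that distinguish two slots is to compare
their ads side by side. The Python below is only a rough sketch of that
idea, not anything that ships with Condor: parse_long_output,
differing_attributes and fetch_ads are hypothetical helper names, the
sample ads are made up, and it assumes each attribute appears on its own
line as "Name = value" with a blank line between ads. Values are kept
as raw text; no ClassAd evaluation is attempted.

```python
import subprocess


def parse_long_output(text):
    """Split `condor_status -long` style text into one dict per ad.

    Assumes ads are separated by blank lines and each attribute sits on
    a single `Attr = value` line. Values stay as raw strings.
    """
    ads, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                ads.append(current)
                current = {}
            continue
        # Split on the first " = " only, so operators such as =?= that
        # appear later in an expression are left untouched.
        name, sep, value = line.partition(" = ")
        if sep:
            current[name.strip()] = value.strip()
    if current:
        ads.append(current)
    return ads


def differing_attributes(ad_a, ad_b):
    """Return {attr: (value_a, value_b)} for attributes that differ."""
    keys = sorted(set(ad_a) | set(ad_b))
    return {k: (ad_a.get(k), ad_b.get(k))
            for k in keys if ad_a.get(k) != ad_b.get(k)}


def fetch_ads(hostname):
    """Run condor_status -long for one host (needs Condor on the PATH)."""
    result = subprocess.run(["condor_status", "-long", hostname],
                            capture_output=True, text=True, check=True)
    return parse_long_output(result.stdout)


# Made-up sample ads, for illustration only.
sample = '''Name = "slot1@machine6"
OpSys = "WINNT52"
SlotID = 1

Name = "slot3@machine6"
OpSys = "WINNT52"
SlotID = 3
'''

ads = parse_long_output(sample)
for attr, (a, b) in differing_attributes(ads[0], ads[1]).items():
    print(attr, a, b)
```

Attributes that differ between the slots you want and the slots you
don't are the candidates for your expression.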

I'll let you mull that over. Reply to this message tomorrow with what
you think the requirements expression should be and I'll let you know if
you're hot or cold. Conveniently, you can always try it out on your
farm. You can use the -constraint option on condor_status to test an
expression. For example:

condor_status -const 'OpSys == "WINNT52" && (SlotID == 3 || SlotID == 4)'

Would show you all slots numbered 3 or 4 on machines whose OpSys is
"WINNT52", i.e. Windows Server 2003 (XP reports "WINNT51", as in the
output above).
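
Note the parentheses around the SlotID test. In ClassAd expressions, as
in C, && binds more tightly than ||, so leaving them out changes which
slots match. The short Python sketch below illustrates the difference;
Python's "and"/"or" have the same relative precedence, so they stand in
for && and || here. The slot list and function names are made up for the
example.

```python
def unparenthesized(opsys, slot_id):
    # Mirrors: OpSys == "WINNT52" && SlotID == 3 || SlotID == 4
    # which groups as (OpSys == "WINNT52" && SlotID == 3) || SlotID == 4
    return opsys == "WINNT52" and slot_id == 3 or slot_id == 4


def parenthesized(opsys, slot_id):
    # Mirrors: OpSys == "WINNT52" && (SlotID == 3 || SlotID == 4)
    return opsys == "WINNT52" and (slot_id == 3 or slot_id == 4)


# Hypothetical (OpSys, SlotID) pairs for illustration.
slots = [("WINNT51", 4), ("WINNT52", 3), ("WINNT52", 4), ("WINNT52", 1)]

for opsys, slot_id in slots:
    print(opsys, slot_id,
          unparenthesized(opsys, slot_id),
          parenthesized(opsys, slot_id))
```

The first pair (an XP machine, slot 4) matches the unparenthesized form
but not the parenthesized one.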

And once your jobs are queued you can use:

condor_q -better-analyze <cluster>.<proc>

And it'll show you which parts of your job's requirements expressions
are preventing you from matching machines.

Hope that helps take you to the next level with Condor!

- Ian
