
Re: [Condor-users] FW: job submitting to powerful machines.



Ian,

I guess my question for both cases is: what difference do the parentheses make?

Thanks for your help in advance,

Alex

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alas, Alex [FEDI]
Sent: Monday, February 02, 2009 12:31 PM
To: Condor-Users Mail List
Subject: [Condor-users] FW: job submitting to powerful machines.

 

Ian,

Thank you very much for your help. I think I found the correct expression using the condor_status -long command, as you suggested. I am going to ask for your help one more time.

Sample of directing jobs to specific systems.

###   DESCRIPTION FILE - Fri Jan 30 16:33:32 EST 2009   ###

UNIVERSE = VANILLA

REQUIREMENTS = OpSys == "WINNT52" && machine == "machine1.domain.com" || machine == "machine2.domain.com"

NOTIFY_USER = user1@xxxxxxxxxx

GETENV = TRUE

INITIALDIR = \\sharename\sharesubdir1\sharesubdir2\sharesubdir3\sharesubdir4

SHOULD_TRANSFER_FILES = YES

WHEN_TO_TRANSFER_OUTPUT = ON_EXIT

TRANSFER_INPUT_FILES = filename1.exe, filename2

RUN_AS_OWNER = TRUE

 

But I still have 2 specific questions:

1) I have two different scenarios or cases:

Case A:

When I submit jobs with the requirements inside parentheses, my central manager more or less balances the jobs across all the machines, placing one job on only one CPU per machine.

###   CONDOR DESCRIPTION FILE - Fri Jan 30 16:33:32 EST 2009   ###

UNIVERSE = VANILLA

REQUIREMENTS = (Arch == "INTEL" && OpSys == "WINNT51") || (Arch == "INTEL" && OpSys == "WINNT52")

NOTIFY_USER = user1@xxxxxxxxxx

GETENV = TRUE

INITIALDIR = \\sharename\sharesubdir1\sharesubdir2\sharesubdir3\sharesubdir4

SHOULD_TRANSFER_FILES = YES

WHEN_TO_TRANSFER_OUTPUT = ON_EXIT

TRANSFER_INPUT_FILES = LasDHZg.exe, LASDHZSeed

RUN_AS_OWNER = TRUE

Case B:

If I remove the parentheses, the jobs are placed on every available CPU that the central manager finds as a possible match. Am I right or wrong?

###   CONDOR DESCRIPTION FILE - Fri Jan 30 16:33:32 EST 2009   ###

UNIVERSE = VANILLA

REQUIREMENTS = Arch == "INTEL" && OpSys == "WINNT51" || Arch == "INTEL" && OpSys == "WINNT52"

NOTIFY_USER = user1@xxxxxxxxxx

GETENV = TRUE

INITIALDIR = \\sharename\sharesubdir1\sharesubdir2\sharesubdir3\sharesubdir4

SHOULD_TRANSFER_FILES = YES

WHEN_TO_TRANSFER_OUTPUT = ON_EXIT

TRANSFER_INPUT_FILES = LasDHZg.exe, LASDHZSeed

RUN_AS_OWNER = TRUE

 

2) What is the difference between user priority and job priority? Which one overrides the other?

 

Thank you very much for your help,

Alex

 

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: Wednesday, January 21, 2009 5:54 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] job submitting to powerful machines.

 

> I tried to apply your suggestions and I come up with the requirement

> expression (REQUIREMENTS = (OpSys == "WINNT52" && SlotID == 3 || SlotID

> == 4))  and it will run jobs on all windows 2003 machines on processors

> 3 and 4, if I use slotId 1 and 2 it will run it on xp and 2003 boxes on

> slotid1 and slotid2. How can I direct my jobs for example to machine6 to

> slots 1 and 3, Only. Or to all CPU's on machine6.

 

So rather than give you the requirements expressions why don't I show

you how I'm getting the information to match against? There's nothing

magic about what I'm doing -- you just have to know where the

information is stored in Condor.

 

If you run:

 

   condor_status -long <hostname>

 

You'll get the full class ad for the machine that lists all the slots

and all the attributes for each slot. You can use any of these

attributes when you're writing your requirements statement for your

cluster.

 

For example, here's condor_status -long for a Windows XP SP3 machine in

my pool:

 

D:\computefarm\main\abc\src\bin>condor_status -long ttc-ichesal

MyType = "Machine"

TargetType = "Job"

Name = "TTC-ICHESAL.altera.com"

Machine = "TTC-ICHESAL.altera.com"

Rank = ((("godlike" =?= TARGET.AlteraGroup) * 100000000) + ((("software"

!= "") && ("software" =?= TARGET.AlteraGroup)) * 60) + ((("" != "") &&

("" =?= TARGET.AlteraGroup)) * 50) + ((("" != "") && ("" =?=

TARGET.AlteraGroup)) * 40) + ((("" != "") && ("" =?=

TARGET.AlteraGroup)) * 30) + ((("" != "") && ("" =?=

TARGET.AlteraGroup)) * 20) + ((("" != "") && ("" =?=

TARGET.AlteraGroup)) * 10))

CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)

COLLECTOR_HOST_STRING = "ttc-negotiator.altera.com"

STARTER_JOB_ENVIRONMENT = "  LM_LICENSE_FILE=1800@ttc-licsrv;1700@goto

PERL5LIB=d:\abc\perl_site_lib\site\lib;d:\abc\perl\lib;d:\abc\perl\site\lib;.

PATH=d:\abc\perl\bin;d:\abc\condor\bin;C:\WINDOWS;C:\WINDOWS\System32;C:\WINDOWS\System32\Wbem;c:\bin;c:\mks\mksnt; "

AlteraMachineClass = 2200

AlteraIsDesktop = TRUE

AlteraExecuteEnabled = FALSE

AlteraBuildReuseEnabled = FALSE

AlteraPreferredGroup1 = software

AlteraQuotedPreferredGroup1 = "software"

AlteraQuotedPreferredGroup2 = ""

AlteraQuotedPreferredGroup3 = ""

AlteraQuotedPreferredGroup4 = ""

AlteraQuotedPreferredGroup5 = ""

AlteraQuotedPreferredGroup6 = ""

AlteraPreferredGroupRank1 = 60

AlteraPreferredGroupRank2 = 50

AlteraPreferredGroupRank3 = 40

AlteraPreferredGroupRank4 = 30

AlteraPreferredGroupRank5 = 20

AlteraPreferredGroupRank6 = 10

AlteraOperatingSystem = "winxp"

AlteraArchitecture = "intel32"

ACEGroup1 = ""

ACEGroup2 = ""

ACEGroup3 = ""

ACEGroup4 = ""

ACEGroup5 = ""

condor = TRUE

perl = TRUE

perl_site_lib = TRUE

acewin = TRUE

abctools = TRUE

AlteraBatchClientVersion = 1.000000

CondorVersion = "$CondorVersion: 6.8.6 Sep 13 2007 $"

CondorPlatform = "$CondorPlatform: INTEL-WINNT50 $"

VirtualMachineID = 1

VirtualMemory = 3364684

Disk = 39241304

CondorLoadAvg = 0.000000

LoadAvg = 0.860000

KeyboardIdle = 0

ConsoleIdle = 0

Memory = 2037

Cpus = 1

StartdIpAddr = "<137.57.182.107:1083>"

Arch = "INTEL"

OpSys = "WINNT51"

UidDomain = "altera.com"

FileSystemDomain = "altera.com"

Subnet = "137.57.182"

HasIOProxy = TRUE

CheckpointPlatform = "WINNT51 INTEL Unknown normal"

TotalVirtualMemory = 3364684

TotalDisk = 39241304

TotalCpus = 1

TotalMemory = 2037

KFlops = 1202568

Mips = 4037

LastBenchmark = 1232389717

TotalLoadAvg = 0.860000

TotalCondorLoadAvg = 0.000000

ClockMin = 1065

ClockDay = 3

TotalVirtualMachines = 1

HasFileTransfer = TRUE

HasPerFileEncryption = TRUE

HasReconnect = TRUE

HasMPI = TRUE

HasTDP = TRUE

HasJobDeferral = TRUE

HasJICLocalConfig = TRUE

HasJICLocalStdin = TRUE

HasWindowsRunAsOwner = TRUE

StarterAbilityList = "HasFileTransfer,HasPerFileEncryption,HasReconnect,HasMPI,HasTDP,HasJobDeferral,HasJICLocalConfig,HasJICLocalStdin,HasWindowsRunAsOwner"

CpuBusyTime = 664

CpuIsBusy = TRUE

TimeToLive = 2147483647

State = "Owner"

EnteredCurrentState = 1232389709

Activity = "Idle"

EnteredCurrentActivity = 1232389709

Start = ((FALSE =?= TRUE) && (((TRUE =?= TRUE) && ((((FALSE =!= TRUE) ||

(((TRUE =?= TRUE) && ((((((ClockDay > 0 && ClockDay < 6) && ((474 > 1212

&& (ClockMin >= 1212 && ClockMin <= 474)) || (474 < 1212 && (ClockMin >=

1212 || ClockMin <= 474)))) || ((ClockDay == 0 || ClockDay == 6) &&

((1439 > 0 && (ClockMin >= 0 && ClockMin <= 1439)) || ((1439 < 0 &&

(ClockMin >= 0 || ClockMin <= 1439)))))) && (((LoadAvg - CondorLoadAvg)

<= 0.300000) || (State != "Unclaimed" && State != "Owner")) &&

(KeyboardIdle>= 180)) || ((VirtualMachineID <= 0))) =!= TRUE)) || ((TRUE

=!= TRUE) && FALSE))) =?= FALSE) && (MaxJobRetirementTime =?= UNDEFINED || MaxJobRetirementTime

=?= 0))) ||((TRUE =!= TRUE) && ((((FALSE =!= TRUE) || (((TRUE =?= TRUE)

&& ((((((ClockDay > 0 && ClockDay < 6) && ((474 > 1212 && (ClockMin >=

1212 && ClockMin <= 474)) || (474 <1212 && (ClockMin >= 1212 || ClockMin

<= 474)))) || ((ClockDay == 0 || ClockDay == 6) && ((1439 > 0 &&

(ClockMin >= 0 && ClockMin <= 1439)) || ((1439 < 0 && (ClockMin >= 0 ||

ClockMin <= 1439)))))) && (((LoadAvg - CondorLoadAvg) <= 0.300000) ||

(State!= "Unclaimed" && State != "Owner")) && (KeyboardIdle >= 180)) ||

((VirtualMachineID <= 0))) =!= TRUE)) || ((TRUE =!= TRUE) && FALSE)))

=?= FALSE) && (MaxJobRetirementTime =?= UNDEFINED || MaxJobRetirementTime =?= 0)))))

Requirements = (START) && (IsValidCheckpointPlatform)

IsValidCheckpointPlatform = (((TARGET.JobUniverse == 1) == FALSE) ||

((MY.CheckpointPlatform =!= UNDEFINED) &&

((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||

(TARGET.NumCkpts == 0))))

MaxJobRetirementTime = ((TRUE =?= TRUE) * (2147483640 *

((((VirtualMachineID <= 0) == FALSE) && ((FALSE && (((LoadAvg -

CondorLoadAvg) >= 0.500000) || (KeyboardIdle < 60))) || ((((ClockDay > 0

&& ClockDay < 6) && ((474 > 1212 && (ClockMin >= 1212 && ClockMin <=

474)) || (474 < 1212 && (ClockMin >= 1212 || ClockMin <= 474)))) ||

((ClockDay == 0 || ClockDay == 6) && ((1439 > 0 && (ClockMin >= 0 &&

ClockMin <= 1439)) || ((1439 < 0 && (ClockMin >= 0 || ClockMin <=

1439)))))) == FALSE))) == FALSE) * ((Activity != "Idle") &&

(((CurrentTime - JobStart) > 300) || (MY.AlteraJobAttributeIsInteractive

=?= TRUE))))) + ((TRUE =!= TRUE) * (2147483640 * ((Activity != "Idle")

&&

(((CurrentTime - JobStart) > 300) || (MY.AlteraJobAttributeIsInteractive

=?= TRUE)))))

CurrentRank = 0.000000

AvailTime = 0.000000

LastAvailInterval = 0

MonitorSelfTime = 1232577882

MonitorSelfCPUUsage = 0.052055

MonitorSelfImageSize = 57560.000000

MonitorSelfResidentSetSize = 12200

MonitorSelfAge = 188179

MonitorSelfRegisteredSocketCount = 2

vm1_State = "Owner"

vm1_Activity = "Idle"

vm1_EnteredCurrentActivity = 1232389709

vm1_Memory = 2037

DaemonStartTime = 1232389709

UpdateSequenceNumber = 3135

MyAddress = "<137.57.182.107:1083>"

LastHeardFrom = 1232577946

UpdatesTotal = 3136

UpdatesSequenced = 3135

UpdatesLost = 12

UpdatesHistory = "0x00000000000000000000000000000000"

 

As you can see it's a 1 slot machine running Windows XP. And there are

*lots* of attributes you can match against. You just need to look at the

machines you want to match and figure out what the best set of

attributes are to use and the correct Boolean expression you need to

write to match the machines.
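To make the attribute list above easier to browse, here is a minimal Python sketch that turns `Name = Value` lines, in the style of the `condor_status -long` output shown earlier, into a dictionary. The function name `parse_classad_text` and the shortened `sample` text are illustrative assumptions, not part of Condor. Long values that wrap across lines are not reassembled, and quoted strings keep their quotes.

```python
def parse_classad_text(text):
    """Collect 'Name = Value' lines into a dict (hypothetical helper)."""
    attrs = {}
    for line in text.splitlines():
        # Split on the first " = " only; values may contain further '=' signs.
        name, sep, value = line.partition(" = ")
        if not sep:
            continue  # blank line or a wrapped continuation without " = "
        attrs[name.strip()] = value.strip()
    return attrs


# A few attributes copied from the machine ad shown in this message.
sample = '''
Machine = "TTC-ICHESAL.altera.com"
Arch = "INTEL"
OpSys = "WINNT51"
Cpus = 1
Memory = 2037
State = "Owner"
'''

ad = parse_classad_text(sample)
for key in ("Machine", "Arch", "OpSys", "Memory"):
    print(key, "->", ad[key])
```

In practice you would save the real `condor_status -long <hostname>` output to a file and pass its contents in, then pick the attributes that distinguish the machines you care about.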

 

I'll let you mull that over. Reply to this message tomorrow with what

you think the requirements expression should be and I'll let you know if

you're hot or cold. Conveniently, you can always try it out on your

farm. You can use the -constraint option on condor_status to test an

expression. For example:

 

condor_status -const 'OpSys == "WINNT52" && (SlotID == 3 || SlotID == 4)'

 

Would show you all machines that are running Windows Server 2003

(OpSys "WINNT52") and have slots numbered 3 or 4.
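The parentheses around the two SlotID tests matter because `&&` binds more tightly than `||` in ClassAd expressions, as in C. The Python sketch below models this with `and` and `or`, which follow the same precedence order. The function names and the list of (OpSys, SlotID) pairs are made up for illustration, and ClassAd UNDEFINED handling is ignored.

```python
def unparenthesized(opsys, slot_id):
    # `and` binds tighter than `or`, so this groups as:
    # (opsys == "WINNT52" and slot_id == 3) or slot_id == 4
    return opsys == "WINNT52" and slot_id == 3 or slot_id == 4


def parenthesized(opsys, slot_id):
    # Explicit grouping: the OpSys test applies to both slot numbers.
    return opsys == "WINNT52" and (slot_id == 3 or slot_id == 4)


# Hypothetical slots: (OpSys, SlotID)
slots = [
    ("WINNT51", 3),
    ("WINNT51", 4),
    ("WINNT52", 1),
    ("WINNT52", 3),
    ("WINNT52", 4),
]

for opsys, slot_id in slots:
    print(opsys, slot_id,
          "unparenthesized:", unparenthesized(opsys, slot_id),
          "parenthesized:", parenthesized(opsys, slot_id))
```

The two forms differ only for the ("WINNT51", 4) pair: without the inner parentheses, slot 4 matches regardless of operating system. That would be consistent with the behaviour described in the quoted message, where using slots 1 and 2 matched both XP and 2003 machines.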

 

And once your jobs are queued you can use:

 

condor_q -better-analyze <cluster>.<proc>

 

And it'll show you which parts of your job's requirements expressions

are preventing you from matching machines.

 

Hope that helps take you to the next level with Condor!

 

- Ian

 


 
