
Re: [Condor-users] Windows jobs not running after upgrade 7.2.4 to 7.4.3



Some further info that might be relevant.
 
Following are the different requirements generated by Condor for the two versions:
 
7.2.4
 
RequestMemory = ceiling(ImageSize / 1024.000000)
Requirements = (Arch == "INTEL") && (OpSys == "WINNT51") && (Disk >= DiskUsage)
&& ((Memory * 1024) >= ImageSize) && (HasFileTransfer)
 
7.4.3
 
RequestMemory = ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory,
ImageSize / 1024.000000))
Requirements = (Arch == "INTEL") && (OpSys == "WINNT51") && (Disk >= DiskUsage)
&& (((Memory * 1024) >= ImageSize) && ((RequestMemory * 1024) >= ImageSize)) &&
(HasFileTransfer)
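
To compare the two RequestMemory definitions side by side, here is a rough sketch in Python. It is plain arithmetic, not Condor's ClassAd evaluator. The function names are made up for illustration, `None` stands in for an UNDEFINED JobVMMemory, and the value 1250 is the ImageSize from the job ad quoted further down.

```python
import math
from typing import Optional


def request_memory_724(image_size_kb: float) -> int:
    """7.2.4 form: ceiling(ImageSize / 1024.0), giving a value in MB."""
    return math.ceil(image_size_kb / 1024.0)


def request_memory_743(image_size_kb: float,
                       job_vm_memory_mb: Optional[float] = None) -> int:
    """7.4.3 form: use JobVMMemory when it is defined, else ImageSize / 1024.

    None is used here as a stand-in for an UNDEFINED JobVMMemory.
    """
    if job_vm_memory_mb is not None:
        return math.ceil(job_vm_memory_mb)
    return math.ceil(image_size_kb / 1024.0)


image_size_kb = 1250  # ImageSize from the job ad

old = request_memory_724(image_size_kb)
new = request_memory_743(image_size_kb)
print(old, new)
```

With JobVMMemory undefined, both forms should give the same number, so on paper the extra ifThenElse does not change the result for this job.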
 
Cheers
 
Greg


From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Greg.Hitchen@xxxxxxxx
Sent: Tuesday, 31 August 2010 9:58 AM
To: condor-users@xxxxxxxxxxx
Subject: [ExternalEmail] [Condor-users] Windows jobs not running after upgrade 7.2.4 to 7.4.3

We have a user who is submitting the same jobs with the same requirements expression
that all worked with 7.2.4 but apparently do not with 7.4.3:
 
requirements = (POOL == "VIC") && (Kflops > 1200000)
 
The jobs sit idle and never execute because no machines match, as shown by condor_q -better-analyze:
 
013.006:  Run analysis summary.  Of 1229 machines,
    371 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
    839 match but reject the job for unknown reasons
     19 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
 
The Requirements expression for your job is:
 
( ( target.POOL == "VIC" ) && ( target.Kflops > 1200000 ) &&
( target.OpSys == "WINNT51" ) ) && ( target.Arch == "INTEL" ) &&
( target.Disk >= DiskUsage ) && ( ( ( target.Memory * 1024 ) >= ImageSize ) &&
( ( RequestMemory * 1024 ) >= ImageSize ) ) && ( target.HasFileTransfer )
 
    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( ( ( 1024 * target.Memory ) >= 1250 ) && ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,1.220703125000000E+000)) ) >= 1250 ) )
                                      0                   REMOVE
2   ( target.Kflops > 1200000 )       988
3   ( target.POOL == "VIC" )          1071
4   ( target.OpSys == "WINNT51" )     1214
5   ( target.HasFileTransfer )        1223
6   ( target.Arch == "INTEL" )        1228
7   ( target.Disk >= 1500 )           1229
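
As a sanity check on condition 1, the numbers in it can be worked through by hand. The sketch below is ordinary Python arithmetic, not the matchmaker's own evaluation, and it assumes JobVMMemory is undefined so that the ifThenElse falls through to the 1.220703125 literal. The helper name `machine_side_ok` is hypothetical.

```python
import math

image_size_kb = 1250  # the literal on the right-hand side of condition 1

# Job-side half: 1024 * ceiling(1.220703125) >= 1250
request_memory_mb = math.ceil(1.220703125)
job_side_ok = 1024 * request_memory_mb >= image_size_kb


def machine_side_ok(memory_mb: int) -> bool:
    """Machine-side half: 1024 * target.Memory >= 1250."""
    return 1024 * memory_mb >= image_size_kb


print(request_memory_mb)   # 2
print(job_side_ok)         # True
print(machine_side_ok(1))  # False
print(machine_side_ok(2))  # True
```

So on the raw numbers the job-side half holds, and any machine advertising 2 MB or more of Memory would satisfy the other half. This does not explain why the analyzer reports 0 matches for the condition; it only shows the values themselves are satisfiable.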
 
I've not taken a lot of notice of the "extra" requirements that Condor adds itself before, but am
wondering about the "RequestMemory" requirement, as googling seems to show that a bug
was fixed in 7.2 that made it 1024 times too big due to a mix-up between MB and KB.
Could this still be a problem?
 
I've also added below the results from a condor_q -l command if that's at all relevant.
I'll keep looking into it but thought I'd try the users-list as well.
 
Thanks for any info
 
Cheers
 
Greg
 
Err = "cvferr_6.txt"
LastJobStatus = 0
Out = "cvfout_6.txt"
ProcId = 6
Shortjob = TRUE
UserLog = "C:\\Users\\pok008\\TUCA\\PredSel\\jfm\\cvflog_6.txt"
JobStatus = 1
GlobalJobId = "wan110a-hr.nexus.csiro.au#13.6#1283211977"
Args = "CRNTUL1_RF.txt climate.txt pred.txt TUCjfm006 6 0 1"
ServerTime = 1283219235
ClusterId = 13
CompletionDate = 0
NTDomain = "NEXUS"
WindowsMajorVersion = 6
WindowsMinorVersion = 0
WindowsBuildNumber = 6002
WindowsServicePackMajorVersion = 2
WindowsServicePackMinorVersion = 0
WindowsProductType = 3
LocalUserCpu = 0.000000
LocalSysCpu = 0.000000
RemoteSysCpu = 0.000000
ExitStatus = 0
NumCkpts_RAW = 0
NumCkpts = 0
NumJobStarts = 0
NumRestarts = 0
NumSystemHolds = 0
CommittedTime = 0
TotalSuspensions = 0
LastSuspensionTime = 0
CumulativeSuspensionTime = 0
ExitBySignal = FALSE
CondorVersion = "$CondorVersion: 7.4.3 Aug  4 2010 BuildID: 261829 $"
CondorPlatform = "$CondorPlatform: INTEL-WINNT50 $"
Iwd = "C:\\Users\\pok008\\TUCA\\PredSel\\jfm"
MinHosts = 1
MaxHosts = 1
CurrentHosts = 0
WantRemoteSyscalls = FALSE
WantCheckpoint = FALSE
RequestCpus = 1
EnteredCurrentStatus = 1283211977
User = "pok008@xxxxxxxx"
NiceUser = FALSE
Environment = ""
JobNotification = 3
WantRemoteIO = TRUE
CoreSize = 0
Rank = ConsoleIdle
In = "/dev/null"
TransferIn = FALSE
StreamOut = FALSE
StreamErr = FALSE
BufferSize = 524288
BufferBlockSize = 32768
ShouldTransferFiles = "YES"
WhenToTransferOutput = "ON_EXIT_OR_EVICT"
TransferFiles = "ALWAYS"
TransferInput = "CRNTUL1_RF.txt,climate.txt,pred.txt"
ImageSize_RAW = 1204
ExecutableSize_RAW = 1204
ExecutableSize = 1250
DiskUsage_RAW = 1374
DiskUsage = 1500
RequestMemory = ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory,
ImageSize / 1024.000000))
RequestDisk = DiskUsage
Requirements = ((POOL == "VIC") && (Kflops > 1200000) && (OpSys == "WINNT51")) &&
(Arch == "INTEL") && (Disk >= DiskUsage) && (((Memory * 1024) >= ImageSize) &&
((RequestMemory * 1024) >= ImageSize)) && (HasFileTransfer)
JobLeaseDuration = 300
PeriodicHold = FALSE
PeriodicRelease = FALSE
PeriodicRemove = FALSE
> == 0)
LeaveJobInQueue = FALSE
Owner = "pok008"
JobPrio = 3000
ImageSize = 1250
QDate = 1283211977
RemoteUserCpu = 0
RemoteWallClockTime = 0
Cmd = "C:\\Users\\pok008\\cvcluster_select.exe"
JobUniverse = 5