
Re: [Condor-users] Fix for the possibly bogus "ceiling(ifThenElse(JobVMMemory ..." requirement issue



On 10/19/2011 11:34 PM, Kevin.Buckley@xxxxxxxxxxxxx wrote:
Whilst the Condor compute nodes seem to start up fine, removing
some problems that had been seen before on the win7 platform,
even the simplest of jobs (pass a .BAT file over and run it)
seem to fail to find a host because, so condor_q -better_analyze
suggests, of this:

     Condition                         Machines Matched    Suggestion
     ---------                         ----------------    ----------
1   ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,9.765625000000000E-04)) )>= 1 )
                                       0                   REMOVE

It turns out that the above "Condition" was not what was stopping
tasks from running on the node in question, so in case it sheds
(scheds?) any light on why that condition appears in the first
place, the issue here was that the /master/submit host/ had been
left out of the ALLOW_WRITE specification and was thus failing to
achieve the

   command 442 (REQUEST_CLAIM)

on the compute node as it tried to move from "Matched" to "Claimed".

Is that likely to be one of the causes of the "Condition" appearing ?

That's the thing, it's always a false alarm. This also only appears to be an issue on Windows. Will you run the following and report the output?

0) condor_status -collector -format "%d\n" 'ceiling(0.0009765625)'
1) condor_status -collector -format "%d\n" 'ceiling(9.765625E-04)'
2) condor_status -collector -format "%d\n" 'ceiling(0.1)'
3) condor_q -bet -jobads job.ad -machineads slot.ad

slot.ad:
Requirements = True

job.ad:
Owner = "matt"
JobUniverse = 5
LastRejMatchTime = time()
ImageSize = 31
Requirements = ( ( ceiling(ImageSize / 1024.000000) * 1024 ) >= ImageSize )

FYI, 1 / 1024 == 9.765625e-4. My long-standing theory is that this is some sort of precision issue, or a difference in ceiling() between platforms, but I haven't tested it yet.
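
For reference, the arithmetic behind both expressions can be checked outside Condor. The following is only a minimal Python sketch, assuming IEEE 754 doubles and the usual ceiling semantics; the function names are made up for illustration, and it does not exercise Condor's own ClassAd ceiling() implementation, which is the part under suspicion:

```python
import math

# 1/1024 is a power of two, so it is exactly representable as a double.
DEFAULT_VM_MEMORY = 9.765625e-04
assert DEFAULT_VM_MEMORY == 1 / 1024


def vm_memory_condition(job_vm_memory=None):
    """Mirror of the reported condition:
    ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,
                                JobVMMemory, 9.765625E-04)) ) >= 1
    None stands in for an undefined JobVMMemory (an assumption of this sketch).
    """
    value = DEFAULT_VM_MEMORY if job_vm_memory is None else job_vm_memory
    return 1024 * math.ceil(value) >= 1


def image_size_requirement(image_size):
    """Mirror of the job.ad Requirements expression:
    ( ceiling(ImageSize / 1024.000000) * 1024 ) >= ImageSize
    """
    return math.ceil(image_size / 1024.0) * 1024 >= image_size


# Same inputs as tests 0-2: each ceiling should come out as 1.
print(math.ceil(0.0009765625), math.ceil(9.765625e-04), math.ceil(0.1))
print(vm_memory_condition())       # JobVMMemory undefined
print(image_size_requirement(31))  # ImageSize = 31, as in job.ad
```

If ceiling() behaves this way on a platform, both conditions hold, so a different result from tests 0-2 on Windows would point at ceiling() itself.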

Best,


matt