
Re: [Condor-users] 7.8.3: "rejected by your job requirements" again



Dimitri,

See the condor_submit manual page:
http://research.cs.wisc.edu/condor/manual/v7.8/condor_submit.html
The request_memory entry, quoted here, should help with your
requirements issue.

request_memory = quantity
The required amount of memory in Mbytes that this job needs to avoid
excessive swapping. If not specified and the submit command vm_memory is
specified, then the value specified for vm_memory defines
request_memory. If neither request_memory nor vm_memory is specified,
the value is set by the configuration variable JOB_DEFAULT_REQUESTMEMORY.
The actual amount of memory used by a job is represented by the job
ClassAd attribute MemoryUsage.
For pools that enable dynamic condor_startd provisioning (see section
3.12.8), a dynamic slot will be created with at least this much RAM.

The expression

  && (RequestMemory <= Target.Memory)

is appended to the requirements expression for the job.
Characters may be appended to a numerical value to indicate units. K or
KB indicates Kbytes. M or MB indicates Mbytes. G or GB indicates Gbytes.
T or TB indicates Tbytes.
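
As a minimal sketch of how this could look in a submit description file
(the executable name is a placeholder I made up, and 1968 is simply the
value the analyzer suggested in your output, so this assumes the jobs
really do fit in that much memory):

```
# Hypothetical fragment: run_blast.sh is a placeholder name.
universe       = vanilla
executable     = run_blast.sh

# Explicit request in Mbytes, so the value does not come from
# JOB_DEFAULT_REQUESTMEMORY.
request_memory = 1968

queue
```

With an explicit value like this, the clause appended to the job's
requirements compares 1968, rather than 7325, against the machine's
Memory.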


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dimitri Maziuk
Sent: Wednesday, September 19, 2012 1:05 PM
To: Condor-Users Mail List
Subject: [Condor-users] 7.8.3: "rejected by your job requirements" again

Hi all,

After the 7.8.3 update I have 4 jobs out of a DAG of 8000+ stuck with:
---------------------------------------------
The Requirements expression for your job is:

( ( TARGET.Memory > 0 ) && ( .RIGHT.Memory > 0 ) ) && ( TARGET.Arch ==
"X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >=
RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && (
TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   (
    [
    ].Memory > 0 )       0                   REMOVE
2   ( TARGET.Memory >= 7325 )         0                   MODIFY TO 1968
3   ( TARGET.Memory > 0 )             32
...
---------------------------------------------

Last time condor pulled this TARGET.Memory requirement out of the ether
I added "( TARGET.Memory > 0 ) && ( .RIGHT.Memory > 0 )" to the job's
submit file. That worked until now.

The other change is that I added another machine to the pool in the
middle of the run -- a 2x2 AMD, but the stuck jobs are not on it.

What's curious is that this time all 4 jobs are stuck on one node, and
before they got stuck a whole lot of jobs successfully ran to completion
on that node.

The jobs are BLAST sequence searches, the execute nodes are all CentOS 6.3
x86_64 AMDs (2..8-core), and the whole setup's been running weekly for
years.

Any suggestions?

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu