
Re: [HTCondor-users] DAGman jobs failing custom requirements



Yes, 'TARGET.Site == MY.Site' is set in APPEND_REQUIREMENTS in condor_config.
I'm going to try moving it to APPEND_REQ_VANILLA and APPEND_REQ_STANDARD to see if that helps. That way it shouldn't affect jobs running in the scheduler universe, which is where condor_dagman itself runs (JobUniverse = 7 in the output below).
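A minimal sketch of that change, assuming the Site clause (copied from the job's Requirements below) is the only thing APPEND_REQUIREMENTS currently holds. These settings are applied by condor_submit when it builds the job ad, so jobs already in the queue keep their old Requirements until they are resubmitted or edited with condor_qedit.

```
# Assumption: this clause is the whole of the current APPEND_REQUIREMENTS.
# As an all-universe setting it also reaches scheduler universe jobs such as
# condor_dagman itself (JobUniverse = 7), so it is commented out here:
#APPEND_REQUIREMENTS = ( TARGET.Site == MY.Site )

# Append the Site clause only to vanilla and standard universe jobs.
APPEND_REQ_VANILLA  = ( TARGET.Site == MY.Site )
APPEND_REQ_STANDARD = ( TARGET.Site == MY.Site )
```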

I saw a post from 2 years ago that said, "In the scheduler universe there is no way for any daemon to evaluate those requirements anyway as far as I know because there is no matchmaking that goes on":
https://lists.cs.wisc.edu/archive/htcondor-users/2011-February/msg00005.shtml
But according to the Sched log, the Requirements are failing, so they must be getting checked.

demo.dag.lib.out and demo.dag.lib.err are empty because the job never runs.
Here's the 'condor_q -l' output for the DAGMan job itself (7920.0):

illustrious$ condor_q -l 7920.0

-- Submitter: inbfop05.agresearch.co.nz : <147.158.130.213:51103> : inbfop05.agresearch.co.nz
BufferSize = 524288
NiceUser = false
CoreSize = 0
CumulativeSlotTime = 0
OnExitHold = false
GlobalJobId = "inbfop05.agresearch.co.nz#7920.0#1359065012"
RequestCpus = 1
Err = "demo.dag.lib.err"
BufferBlockSize = 32768
ImageSize = 275
CurrentTime = time()
EnvDelim = ";"
WantCheckpoint = false
CommittedTime = 0
TargetType = "Machine"
WhenToTransferOutput = "ON_EXIT"
ServerTime = 1359065122
Cmd = "/usr/bin/condor_dagman"
JobUniverse = 7
ExitBySignal = false
TransferIn = false
Iwd = "/home/smithiesr/condor/dag_test"
NumRestarts = 0
CommittedSuspensionTime = 0
Site = "invermay"
Owner = "smithiesr"
NumSystemHolds = 0
CumulativeSuspensionTime = 0
RequestDisk = DiskUsage
Requirements = ( ( TARGET.Site == MY.Site ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory )
MinHosts = 1
JobNotification = 2
NumCkpts = 0
LastSuspensionTime = 0
NumJobStarts = 0
WantRemoteSyscalls = false
ImageSize_RAW = 274
JobPrio = 0
RootDir = "/"
CurrentHosts = 0
StreamOut = false
WantRemoteIO = true
DiskUsage_RAW = 274
OnExitRemove = ( ExitSignal =?= 11 || ( ExitCode =!= undefined && ExitCode >= 0 && ExitCode <= 2 ) )
OtherJobRemoveRequirements = "DAGManJobId == 7920"
DiskUsage = 275
In = "/dev/null"
PeriodicRemove = false
RemoteUserCpu = 0.0
LocalUserCpu = 0.0
ExecutableSize = 275
RemoteSysCpu = 0.0
LocalSysCpu = 0.0
ClusterId = 7920
CompletionDate = 0
RemoteWallClockTime = 0.0
Rank = ( 64 / CPUs )
LeaveJobInQueue = false
RemoveKillSig = "SIGUSR1"
MyType = "Job"
KillSig = "SIGTERM"
CondorVersion = "$CondorVersion: 7.8.2 Aug 08 2012 $"
NumCkpts_RAW = 0
StreamErr = false
ProcId = 0
PeriodicHold = false
User = "smithiesr@xxxxxxxxxxxxxxxx"
FileSystemDomain = "agresearch.co.nz"
LastJobStatus = 0
Arguments = "-f -l . -Lockfile demo.dag.lock -AutoRescue 1 -DoRescueFrom 0 -Dag demo.dag -CsdVersion $CondorVersion:' '7.8.2' 'Aug' '08' '2012' '$ -Dagman /usr/bin/condor_dagman"
Out = "demo.dag.lib.out"
JobStatus = 1
UserLog = "/home/smithiesr/condor/dag_test/demo.dag.dagman.log"
ExecutableSize_RAW = 274
PeriodicRelease = false
RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)
MaxHosts = 1
TotalSuspensions = 0
CommittedSlotTime = 0
Env = "PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/agr-scripts:/home/smithiesr/bin;_CONDOR_DAGMAN_LOG=demo.dag.dagman.out;MAIL=/var/spool/mail/smithiesr;CVS_RSH=ssh;LANG=en_US.UTF-8;HISTFILESIZE=1000;SSH_CONNECTION=147.158.129.160 53072 147.158.130.213 22;MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles;XAUTHORITY=/home/smithiesr/.Xauthority;SSH_CLIENT=147.158.129.160 53072 22;SHELL=/bin/bash;_=/usr/bin/condor_submit_dag;PWD=/home/smithiesr/condor/dag_test;SSH_TTY=/dev/pts/0;SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass;HISTSIZE=1000;USER=smithiesr;LOADEDMODULES=;G_BROKEN_FILENAMES=1;LESSOPEN=|/usr/bin/lesspipe.sh %s;MODULESHOME=/usr/share/Modules;_CONDOR_MAX_DAGMAN_LOG=0;HISTCONTROL=ignoredups;KRB5CCNAME=FILE:/tmp/krb5cc_509_ERqeyh;SHLVL=1;HOSTNAME=inbfop05.agresearch.co.nz;GREP_OPTIONS=--color=auto;CEGMA=/usr/lib64/cegma;WISECONFIGDIR=/usr/share/wise2/;HOME=/home/smithiesr;HISTTIMEFORMAT=%h/%d - %H:%M:%S ;TERM=xterm;OLDPWD=/home/smithiesr/hg/puppet/modules/condor/templates;VISUAL=nano;LOGNAME=smithiesr"
CondorPlatform = "$CondorPlatform: x86_64_rhap_6.3 $"
ShouldTransferFiles = "IF_NEEDED"
ExitStatus = 0
QDate = 1359065012
EnteredCurrentStatus = 1359065012




--Russell

-----Original Message-----
From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of R. Kent Wenger
Sent: Friday, 25 January 2013 10:50 a.m.
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] DAGman jobs failing custom requirements

On Fri, 25 Jan 2013, Smithies, Russell wrote:

> We're just starting out getting DAGMan jobs working and have run into a small problem.
> Our normal condor_submit jobs work OK, and when I run each individual job in the DAG it works OK, but when I submit the whole DAG the job doesn't run. The Sched log says it failed requirements, e.g. "The Requirements attribute for job 7918.0 did not evaluate. Unable to start job", and the job just sits idle in the queue.
> If I condor_qedit the requirements and remove the first clause (TARGET.Site == MY.Site), then the DAG runs to completion.
> We have this extra 'Site' attribute because we're geographically distributed and it's best to have users run their jobs locally for better file I/O. This is set in each server's condor_config.
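For comparison, one common way to give every server its own Site value is sketched below. This is hypothetical: the actual settings in this pool's condor_config are not shown in the thread, and "invermay" is simply the value seen in the job ad above. STARTD_ATTRS publishes Site in the machine ads (so TARGET.Site is defined during matchmaking), and SUBMIT_EXPRS copies it into the job ads that condor_submit creates on that host (so MY.Site is defined).

```
# Hypothetical sketch: the real per-server settings are not shown in this thread.
# Value taken from the job ad (Site = "invermay"); each server sets its own.
Site = "invermay"

# Advertise Site in the machine (slot) ads so TARGET.Site is defined there.
STARTD_ATTRS = $(STARTD_ATTRS) Site

# Copy Site into each job ad that condor_submit creates on this host,
# which is how MY.Site gets a value in the job's Requirements.
SUBMIT_EXPRS = $(SUBMIT_EXPRS) Site
```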

Is 'TARGET.Site == MY.Site' set in APPEND_REQUIREMENTS in your configuration?

I don't know offhand why things should be any different with DAGMan; DAGMan just runs condor_submit to submit the job.

A couple of things that should help diagnose it:

* Your dagman.out file: I'm interested to see what arguments DAGMan is passing to condor_submit, and whether anything strange is going on there.

* The output of 'condor_q -l' for a job inside and outside of the DAG.
(Or 'condor_history -l' if the job finished before you got a chance to run condor_q.)

Kent Wenger
CHTC Team
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================