
[Condor-users] Why isn't the negotiation cycle finding this job?



Ever since we upgraded from Condor 7.4 to 7.6, the new negotiation algorithm used when groups are enabled has been biting us. I have this ticket open and am hoping for a response:

http://www.cs.wisc.edu/condor/fermi-tickets/22715.html

That ticket deals only with the pool not running the jobs we expect it to run; at least there the slots stay full.

Now I see we also aren't running our monitoring jobs, which use the glideinWMS monitoring slot. Here is the monitoring job, which targets the monitoring slot for a specific glidein (classads below):

Output from condor_q

1543992.0   willis         11/9  11:00   0+00:00:00 I  0   0.0  mon.sh

From the negotiator log, with D_FULLDEBUG enabled:

11/09/11 11:03:00 ---------- Started Negotiation Cycle ----------
11/09/11 11:03:00 Phase 1:  Obtaining ads from collector ...
11/09/11 11:03:00   Getting all public ads ...
11/09/11 11:03:00 Trying to query collector <131.225.240.215:9618>
11/09/11 11:03:08   Sorting 8584 ads ...
<snip>
11/09/11 11:03:08 Ignoring submitter willis@xxxxxxxx with no requested jobs

The classad of the job

[cdfcaf@fcdfhead10 /export/condor_local/log] condor_q -name schedd_3@xxxxxxxxxxxxxxxxxxx -l 1543992.0


-- Schedd: schedd_3@xxxxxxxxxxxxxxxxxxx : <131.225.240.215:50394>
PeriodicRemove = ( CurrentTime > 1320858524 )
CommittedSlotTime = 0
Out = "_condor_stdout"
ImageSize_RAW = 1
NumCkpts_RAW = 0
AutoClusterAttrs = "CAFGroup,CAFAcctGroup,CAF_DEFAULT_START,GLIDEIN_Is_Monitor,CAFDH"
EnteredCurrentStatus = 1320858014
CommittedSuspensionTime = 0
WhenToTransferOutput = "ON_EXIT"
NumSystemHolds = 0
StreamOut = false
NumRestarts = 0
ImageSize = 1
Cmd = "/tmp/glidein_intmon_HzIdSU/mon.sh"
x509UserProxyVOName = "cdf"
CurrentHosts = 0
Iwd = "/tmp/glidein_intmon_HzIdSU"
CumulativeSlotTime = 0
ExecutableSize_RAW = 1
CondorVersion = "$CondorVersion: 7.6.2 Jul 14 2011 BuildID: 351672 $"
RemoteUserCpu = 0.0
NumCkpts = 0
JobStatus = 1
Arguments = ""
RemoteSysCpu = 0.0
OnExitRemove = true
BufferBlockSize = 32768
ClusterId = 1543992
In = "/dev/null"
LocalUserCpu = 0.0
x509UserProxyFQAN = "/DC=gov/DC=fnal/O=Fermilab/OU=Robots/CN=glidecaf/CN=cdf/CN=Willis K. Sakumoto/CN=UID:willis,/cdf/Role=NULL/Capability=NULL"
MinHosts = 1
Environment = ""
JobUniverse = 5
RequestDisk = DiskUsage
RootDir = "/"
NumJobStarts = 0
WantRemoteIO = true
RequestMemory = ceiling(ifThenElse(JobVMMemory =!= undefined,JobVMMemory,ImageSize / 1024.000000))
GlobalJobId = "schedd_3@xxxxxxxxxxxxxxxxxxx#1543992.0#1320858014"
x509UserProxyFirstFQAN = "/cdf/Role=NULL/Capability=NULL"
LocalSysCpu = 0.0
PeriodicRelease = false
DiskUsage = 1
CumulativeSuspensionTime = 0
JobLeaseDuration = 1200
UserLog = "/tmp/glidein_intmon_HzIdSU/mon.log"
GLIDEIN_Is_Monitor = true
ExecutableSize = 1
MaxHosts = 1
ServerTime = 1320858260
CoreSize = 0
DiskUsage_RAW = 1
ProcId = 0
TransferFiles = "ONEXIT"
ShouldTransferFiles = "YES"
CommittedTime = 0
TotalSuspensions = 0
Err = "_condor_stderr"
x509userproxysubject = "/DC=gov/DC=fnal/O=Fermilab/OU=Robots/CN=glidecaf/CN=cdf/CN=Willis K. Sakumoto/CN=UID:willis"
AutoClusterId = 496
RequestCpus = 1
StreamErr = false
x509UserProxyExpiration = 1321256898
NiceUser = false
RemoteWallClockTime = 0.0
TargetType = "Machine"
TransferOutputRemaps = "_condor_stdout=/tmp/glidein_intmon_HzIdSU/mon.out;_condor_stderr=/tmp/glidein_intmon_HzIdSU/mon.err"
PeriodicHold = false
QDate = 1320858014
OnExitHold = false
Rank = 0.0
ExitBySignal = false
CondorPlatform = "$CondorPlatform: x86_64_rhap_5 $"
JobPrio = 0
LastSuspensionTime = 0
CurrentTime = time()
User = "willis@xxxxxxxx"
x509userproxy = "/export/CafCondor/tickets/x509cc_willis"
JobNotification = 0
BufferSize = 524288
WantRemoteSyscalls = false
LeaveJobInQueue = false
ExitStatus = 0
CompletionDate = 0
MyType = "Job"
Requirements = ( ( Name =?= "monitor_30769@xxxxxxxxxxxxxxxxxxxx" ) && ( Arch =!= "Absurd" ) ) && ( ( Memory >= 1 ) ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= DiskUsage ) && ( ( RequestMemory * 1024 ) >= ImageSize ) && ( TARGET.HasFileTransfer )
WantCheckpoint = false
Owner = "willis"
LastJobStatus = 0
TransferIn = false


The slot it wants is there:

[cdfcaf@fcdfhead10 /export/condor_local/log] condor_status -constraint 'name == "monitor_30769@xxxxxxxxxxxxxxxxxxxx"'

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

monitor_30769@fcdf LINUX      X86_64 Owner     Idle     5.870   393  0+23:01:13
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX     1     1       0         0       0          0        0

               Total     1     1       0         0       0          0        0

The slot is free and not usable by anything else, yet this job won't run within the 8 minutes allowed. Under 7.4 it would run on the next negotiation cycle, since the slot is sitting there free for it. Why does the negotiator say "with no requested jobs" for user "willis" when there is clearly one job in the queue?
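To double-check that the ads themselves really do match, here is a hand evaluation of each clause of the job's Requirements against the slot. This is a plain-Python sketch, not real ClassAd semantics: =?= is modelled as ordinary equality, and the slot's Disk and HasFileTransfer values are assumed, since condor_status doesn't show them.

```python
slot = {
    "Name": "monitor_30769@xxxxxxxxxxxxxxxxxxxx",
    "Arch": "X86_64",
    "OpSys": "LINUX",
    "Memory": 393,            # from the condor_status output above
    "Disk": 1_000_000,        # assumed; only needs to exceed DiskUsage = 1
    "HasFileTransfer": True,  # assumed; standard on glidein startds
}
job = {"DiskUsage": 1, "ImageSize": 1, "RequestMemory": 1}

clauses = [
    slot["Name"] == "monitor_30769@xxxxxxxxxxxxxxxxxxxx",  # Name =?= ...
    slot["Arch"] != "Absurd",                               # Arch =!= "Absurd"
    slot["Memory"] >= 1,
    slot["OpSys"] == "LINUX",
    slot["Disk"] >= job["DiskUsage"],
    job["RequestMemory"] * 1024 >= job["ImageSize"],
    slot["HasFileTransfer"],
]
print(all(clauses))  # True: every clause is satisfied, so the ads do match
```

So matchmaking itself can't be the problem; the job is being dropped before Phase 2 ever looks at it.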

I believe it has to do with the way all the slots are now parcelled out to groups (even jobs that aren't in any group are handled this way, because they get added to a <none> group), combined with having this set:

GROUP_ACCEPT_SURPLUS = True
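For reference, the group-quota knobs in play look roughly like this. The macro names are the standard 7.6 ones, but the group names and quota numbers below are invented for illustration; only GROUP_ACCEPT_SURPLUS reflects our actual config:

```
# Negotiator group-quota config sketch (hypothetical group names/quotas)
GROUP_NAMES = group_cdf, group_monitor
GROUP_QUOTA_group_cdf = 800
GROUP_QUOTA_group_monitor = 10
# Let a group that exhausts its quota borrow unused slots from other groups
GROUP_ACCEPT_SURPLUS = True
# Jobs with no AccountingGroup attribute fall into the "<none>" group
```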

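Here is a toy sketch of my reading of the 7.6 behavior (NOT the actual negotiator code, just an illustration of the hypothesis): idle jobs are bucketed per accounting group before negotiation, and each submitter is considered once per group bucket. A submitter whose only idle job sits under some group then appears to have zero requested jobs when the negotiator walks a different bucket, such as "<none>". The group name below is hypothetical.

```python
from collections import defaultdict

idle_jobs = [
    # (submitter, accounting group of the job) -- group name is made up
    ("willis@xxxxxxxx", "group_monitor"),
]

# Tally requested jobs per (group, submitter), as I suspect 7.6 does.
requested = defaultdict(lambda: defaultdict(int))
for submitter, group in idle_jobs:
    requested[group][submitter] += 1

def negotiate(group, submitters):
    """Return the log lines this toy negotiator emits for one group bucket."""
    return [
        f"Ignoring submitter {s} with no requested jobs"
        for s in submitters
        if requested[group][s] == 0
    ]

# Walking the "<none>" bucket, willis appears to have no jobs at all:
print(negotiate("<none>", ["willis@xxxxxxxx"]))
# But the job is there -- it is simply counted under its own group's bucket:
print(negotiate("group_monitor", ["willis@xxxxxxxx"]))  # no "Ignoring" line
```

If that reading is right, it would explain the log line even though the queue plainly holds an idle job for willis.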
I'll keep digging but I'm hoping someone has advice.

Thanks,

joe