
Re: [HTCondor-users] CE: jobs without owner according to LRMS Condor



Hi Brian,

you are right - the 'no owner' message was a red herring.

The issue seems to have been a double group matching. I had been looking
into ops jobs that came in under a specific VO DN (CMS) and are
nominally remapped by a CE route into a prioritized accounting group, so
that they take precedence over all other prod jobs etc.
Unfortunately, there was still a Condor config deployed that applies an
AssignAccountingGroup map to the previously assigned user's group and
effectively resets the CE's prio route. So these supposedly prioritized
jobs then had to compete again with their VO's production jobs :-/
I.e., the assumed 'non-matching' was actually a userprio issue within
the group's shares...
I am going to fix the group assignment.
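
To illustrate the double matching, here is a toy Python model. It is not
HTCondor configuration or API code: the group names, the two functions
and the overwrite switch are all made up for the sketch. Only the owner
sgmcms is taken from the job above. It just shows how a second mapping
step that runs after the CE route can silently replace the prioritized
group, and one possible way such a step could be made to respect an
existing assignment.

```python
# Toy model of the two mapping stages; all names below are hypothetical.
PRIO_GROUP = "group_cms.ops"         # assumed prioritized accounting group
PROD_GROUP = "group_cms.production"  # assumed regular production group


def ce_route(job):
    """Stage 1 (CE route): put the ops user's jobs into the prio group."""
    routed = dict(job)
    if routed["Owner"] == "sgmcms":
        routed["AcctGroup"] = PRIO_GROUP
    return routed


def schedd_group_map(job, overwrite=True):
    """Stage 2 (batch-side map): derive the group from the owner.

    With overwrite=True this replaces whatever stage 1 assigned, which
    is the behaviour described above. With overwrite=False an already
    assigned group is left alone.
    """
    mapped = dict(job)
    group_by_owner = {"sgmcms": PROD_GROUP}
    target = group_by_owner.get(mapped["Owner"])
    if target is not None and (overwrite or "AcctGroup" not in mapped):
        mapped["AcctGroup"] = target
    return mapped


job = {"Owner": "sgmcms"}

# Both stages active: the prio group from the CE route is lost.
after_both = schedd_group_map(ce_route(job))

# Second stage respects an existing assignment: the prio group survives.
after_fix = schedd_group_map(ce_route(job), overwrite=False)

print(after_both["AcctGroup"])
print(after_fix["AcctGroup"])
```

In the first case the job ends up in the production group and competes
on user priority with everything else there, which from the outside
looks like the job is simply not matching.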

Cheers and sorry for the noise,
  Thomas

On 26/05/2022 15.46, Brian Lin wrote:
> Hi Thomas,
> 
> TJ and I were troubleshooting a similar issue a few weeks ago and it
> turns out that the "Cluster XXX has no Owner attribute. Ignoring..." is
> actually spurious. The Gridmanager sets the proc ad's Owner attribute
> but doesn't actually set it in the cluster ad (you can see this in the
> job queue log): the dev team will be fixing this but it shouldn't
> actually have any effect on your jobs matching.
> 
> If jobs still aren't matching, then you'll need to start digging with
> 'condor_q -better' and in the NegotiatorLog.
> 
> - Brian
> 
> On 5/24/22 11:06, Thomas Hartmann wrote:
>> Hi all,
>>
>> has anybody else observed jobs that allegedly(?) have no owner and are
>> thus skipped by the schedd?
>>
>> We currently have two jobs that arrived this morning on one of our CEs
>> and got routed to Condor [1]. The Condor scheduler however has been
>> ignoring these jobs insisting that they do not have an owner [2]. Oddly,
>> when querying the scheduler directly, these jobs look OK regarding the
>> owner.
>>
>> Installed packages look like [3] on the CE.
>>
>> Maybe somebody has an idea, what might be happening here?
>>
>> Cheers,
>>   Thomas
>>
>> [1]
>>> condor_ce_q 2594831 -af Owner RoutedToJobID
>> sgmcms 6828787.0
>>> condor_q 6828787.0 -af Owner RoutedFromJobId
>> sgmcms 2594831.0
>>
>> [2]
>>> grep 6828787 condor/*Log
>> condor/SchedLog:05/24/22 17:37:43 (pid:2172) Cluster 6828787 has no
>> Owner attribute. Ignoring...
>> condor/SchedLog:05/24/22 17:37:49 (pid:2172) Cluster 6828787 has no
>> Owner attribute. Ignoring...
>> condor/SchedLog:05/24/22 17:38:24 (pid:2172) Cluster 6828787 has no
>> Owner attribute. Ignoring...
>> condor/SchedLog:05/24/22 17:38:30 (pid:2172) Cluster 6828787 has no
>> Owner attribute. Ignoring...
>>
>>
>> [3]
>> condor-9.0.11-1.el7.x86_64
>> condor-classads-9.0.11-1.el7.x86_64
>> condor-externals-9.0.11-1.el7.x86_64
>> condor-procd-9.0.11-1.el7.x86_64
>> htcondor-ce-5.1.3-1.el7.noarch
>> htcondor-ce-apel-5.1.3-1.el7.noarch
>> htcondor-ce-bdii-5.1.3-1.el7.noarch
>> htcondor-ce-client-5.1.3-1.el7.noarch
>> htcondor-ce-condor-5.1.3-1.el7.noarch
>> htcondor-ce-view-5.1.3-1.el7.noarch
>> python2-condor-9.0.11-1.el7.x86_64
>> python3-condor-9.0.11-1.el7.x86_64
>>
>>
>> on CentOS Linux release 7.9.2009 (Core) with 3.10.0-1160.62.1.el7.x86_64
>>
> 
