
Re: [HTCondor-users] "Failed to receive remote ad" runtime error when querying history with the python api



Yes, that seems likely.

 

 

From: Biruk Mammo [mailto:birukw@xxxxxxxxxx]
Sent: Monday, March 12, 2018 1:07 PM
To: John M Knoeller <johnkn@xxxxxxxxxxx>
Cc: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] "Failed to receive remote ad" runtime error when querying history with the python api

 

Aha, thanks John!

 

I have no map file configured. The scheduler's configuration is as follows:

 

ALLOW_WRITE = $(ALLOW_WRITE), $(CONDOR_HOST)

CONDOR_HOST = condor-master

DAEMON_LIST = MASTER, SCHEDD

DISCARD_SESSION_KEYRING_ON_STARTUP = False

UID_DOMAIN = *

TRUST_UID_DOMAIN = True

 

Is the UID_DOMAIN setting the culprit?
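A minimal sanity check for the setting in question, assuming only what John explains below: '*' is used as a field separator when socket information is serialized, so a UID_DOMAIN that contains '*' can break parsing. The function name `uid_domain_warning` is hypothetical and not part of HTCondor.

```python
from typing import Optional

# Per the thread, '*' separates fields in the serialized socket string.
SEPARATOR = "*"


def uid_domain_warning(uid_domain: Optional[str]) -> Optional[str]:
    """Return a warning if UID_DOMAIN contains the '*' separator, else None.

    Hypothetical helper: it only checks for the collision described in the
    thread and does not validate the value any further.
    """
    if uid_domain is None:
        return None  # not set in the config being inspected; nothing to check
    value = uid_domain.strip()
    if SEPARATOR in value:
        return (f"UID_DOMAIN = {value!r} contains '*', which is also the field "
                "separator in serialized socket info; use a real domain name")
    return None
```

On the schedd host, the value the Python bindings see can be passed in as `uid_domain_warning(htcondor.param.get("UID_DOMAIN"))`, or checked from the shell with `condor_config_val UID_DOMAIN`.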

 

On Mon, Mar 12, 2018 at 9:56 AM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:

Yes, the problem is here:

myusername@*

 

The * here should be a domain name. Because it is a * instead, and * is used as a token separator, the remainder isn't being parsed correctly.

(More specifically, there should be only one * between the username and the Condor version string.)
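To make this concrete, here is a hypothetical sketch that splits the CONDOR_INHERIT fragment from the ToolLog line in this thread on '*' and checks whether the token after `user@domain` is the `$CondorVersion` string. It mirrors the symptom only; it is not the real parser in sock.cpp and assumes nothing about what the other fields mean. `example.com` is a placeholder domain.

```python
def split_fields(serialized: str) -> list:
    """Split a serialized socket string on the '*' separator."""
    return serialized.split("*")


def user_followed_by_version(serialized: str) -> bool:
    """True if exactly one '*' separates the 'user@domain' token from the
    '$CondorVersion...' token, as described in the thread."""
    tokens = split_fields(serialized)
    for i, token in enumerate(tokens):
        if "@" in token:
            # The very next token should be the version string.
            return i + 1 < len(tokens) and tokens[i + 1].startswith("$CondorVersion")
    return False  # no user@domain token at all


# Fragment from the ToolLog line in this thread (UID_DOMAIN = *).
BAD = ("17*3*15*1*8*51*myusername@**$CondorVersion:_8.7.6_Jan_04_2018_"
       "BuildID:_428319_$*0*<10.2.0.9:25777>*48*2*0*"
       "9CEBCCEB79FAB9851039EDEAF169AC16C98AC4C827A7CA5A*0*")

# Same fragment with a hypothetical real domain in place of '*'.
GOOD = BAD.replace("myusername@*", "myusername@example.com")
```

With `UID_DOMAIN = *`, the extra separator produces an empty token: `split_fields(BAD)[6:9]` gives `['myusername@', '', '$CondorVersion:_8.7.6_Jan_04_2018_BuildID:_428319_$']`, so every later field is shifted by one position.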

 

So, something odd is going on in the SCHEDD when it authenticates.  Do you have a map file?

 

-tj

 

 

From: Biruk Mammo [mailto:birukw@xxxxxxxxxx]
Sent: Saturday, March 10, 2018 10:00 PM
To: htcondor-users@xxxxxxxxxxx; John M Knoeller <johnkn@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] "Failed to receive remote ad" runtime error when querying history with the python api

 

Hi John, I hope you had a chance to look at this.

 

On Wed, Feb 28, 2018 at 1:26 PM Biruk Mammo <birukw@xxxxxxxxxx> wrote:

Here is the full log line:

condor_history: getInheritedSocks from CONDOR_INHERIT is '60562 <10.2.0.9:18316> 1 17*3*15*1*8*51*myusername@**$CondorVersion:_8.7.6_Jan_04_2018_BuildID:_428319_$*0*<10.2.0.9:25777>*48*2*0*9CEBCCEB79FAB9851039EDEAF169AC16C98AC4C827A7CA5A*0* 0 0'

 

 

On Wed, Feb 28, 2018 at 8:58 AM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:

Could you please send me the [REDACTED] part of this ToolLog message?

condor_history: getInheritedSocks from CONDOR_INHERIT is ... [REDACTED]

 

The error indicates that the actual contents of that string are incorrectly formatted.

 

thanks

-tj

 

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Biruk Mammo via HTCondor-users
Sent: Tuesday, February 27, 2018 5:11 PM
To: htcondor-users@xxxxxxxxxxx
Cc: Biruk Mammo <birukw@xxxxxxxxxx>
Subject: [HTCondor-users] "Failed to receive remote ad" runtime error when querying history with the python api

 

Hello HTCondor users,

 

I get a "Failed to receive remote ad" error when using the Python bindings to query history immediately after submitting a job. Looking into the HTCondor logs, I see the following error in ToolLog:

 

[Timestamp] condor_history: getInheritedSocks from CONDOR_INHERIT is ... [REDACTED]

[Timestamp] ERROR "Assertion ERROR on (*ptmp == '*')" at line 2244 in file /slots/10/dir_3701941/userdir/.tmplMkQ9O/BUILD/condor-8.7.6/src/condor_io/sock.cpp

 

I also see a core dump in the log directory.

 

This error does not occur if I wait a few seconds before invoking schedd.history. Also, there is no error if I run the history query without submitting a job.

 

Below is the Python code that triggers the problem.

 

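import htcondor

submit = htcondor.Submit({'executable': '/usr/bin/sleep', 'arguments': '300'})
schedd = htcondor.Schedd()

# Submit one job in a transaction and print the new cluster id.
with schedd.transaction() as txn:
    print(submit.queue(txn))

# Query history immediately afterwards (requirements, projection, match limit).
print(list(schedd.history('true', ['ClusterId'], 10)))
# RuntimeError: Failed to receive remote ad.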
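Since the error goes away after waiting a few seconds, a stopgap is to retry the history query with a growing delay. This is a minimal sketch, not an HTCondor API: `retry_on_runtime_error` is a hypothetical helper, and it only masks the timing symptom. The replies above point to `UID_DOMAIN = *` as the actual cause.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_runtime_error(fn: Callable[[], T], attempts: int = 5,
                           delay: float = 1.0, backoff: float = 2.0,
                           sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on RuntimeError wait `delay` seconds (multiplied by
    `backoff` after each failure) and retry, up to `attempts` calls total.
    The last RuntimeError is re-raised if every attempt fails."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except RuntimeError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= backoff
    raise AssertionError("unreachable")
```

Usage with the snippet above: `rows = retry_on_runtime_error(lambda: list(schedd.history('true', ['ClusterId'], 10)))`. The `list(...)` sits inside the lambda because `history` returns an iterator and, as in the original code, the error surfaces while it is being consumed.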

 

Is there something I am missing? Thanks in advance for your help!

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/