
Re: [HTCondor-users] condor_q -better-analyze: "Could not fetch startd ads"



Hi Wes.

So the condor_schedd never fetches startd ads.   It gets handed them as match ads by the negotiator. 

condor_q gets job classads by querying the schedd, and it gets startd ads by querying the collector,
so this failure suggests that the collector is not accepting read connections from tools.

Try adding this to the configuration on the machine where you are running the condor_q command:

    TOOL_DEBUG = D_COMMAND:1 D_FULLDEBUG D_HOSTNAME D_CAT $(TOOL_DEBUG)

Then run

   condor_status -debug

Then look in the CollectorLog on the central manager for messages that have the same timestamp as the condor_status output.

You should see messages relating to the condor_status query. 
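
If the CollectorLog is busy, matching timestamps by eye gets tedious. Here is a minimal sketch of a filter for that step. It assumes each log line starts with a timestamp in the form MM/DD/YY HH:MM:SS (check your own log, since the format can be changed in the configuration), and the function name lines_near is just an illustrative helper, not part of HTCondor.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each log line, e.g. "07/31/20 18:21:05".
STAMP_FORMAT = "%m/%d/%y %H:%M:%S"
STAMP_LENGTH = 17  # number of characters in a timestamp of that layout


def lines_near(log_lines, target, window_seconds=2):
    """Return log lines whose leading timestamp is within window_seconds of target."""
    wanted = datetime.strptime(target, STAMP_FORMAT)
    window = timedelta(seconds=window_seconds)
    matches = []
    for line in log_lines:
        try:
            stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
        except ValueError:
            continue  # continuation line or a different format: skip it
        if abs(stamp - wanted) <= window:
            matches.append(line.rstrip("\n"))
    return matches
```

To use it, open the CollectorLog, pass the file object as log_lines, and pass the timestamp printed by condor_status -debug as target. The small window allows for clock skew between the two machines.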

If you run

   condor_q -analyze

and then look in the CollectorLog again, you should see essentially the same messages as you saw with condor_status.

Most likely you will see something about the command being rejected. Once we know what is happening with the collector, we can go from there. If you don't see anything in the CollectorLog from the condor_status and/or condor_q commands, then the problem is most likely outside of HTCondor - usually a firewall issue.
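
For the firewall case, a quick way to check is to see whether a plain TCP connection from the submit machine to the collector succeeds at all. A rough sketch, assuming the collector listens on the default port 9618 (your pool may use a different one); can_connect is a hypothetical helper name, not an HTCondor tool:

```python
import socket

DEFAULT_COLLECTOR_PORT = 9618  # HTCondor's default collector port; yours may differ


def can_connect(host, port=DEFAULT_COLLECTOR_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Call it with the hostname of your central manager. A False result points at the network or firewall. A True result only shows the port is reachable; the collector can still reject the query on authorization grounds, which is what the CollectorLog messages would tell you.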

-tj

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Wesley Taylor
Sent: Friday, July 31, 2020 6:21 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] condor_q -better-analyze: "Could not fetch startd ads"

Hey all, it's me again.

Finally got HTCondor ready for preliminary smoke testing on the production network, and have been debugging issues as we go. There has been one really weird error I haven't been able to figure out on my own. When I run "condor_q" it runs fine. However, if I run "condor_q -analyze" or "condor_q -better-analyze" I get back "Error: Could not fetch startd ads".

I would think this is a permissions issue, but what I don't understand is if the condor_schedd cannot fetch startd ads, how is it able to matchmake and run jobs? I have been looking through the logs and don't even know where to look for this error happening so I can try to address it.

Has anyone seen this behavior before?

Thank you,
Wes

