
Re: [HTCondor-users] Failed to send REQUEST_CLAIM to startd



Hi Zach,

Yes, it is a firewall: the schedd has no direct network access to the startd.
I use the shared port and CCB mechanisms, so both the startd and the schedd connect to the collector successfully. But jobs still don't start. My settings are:

USE_SHARED_PORT = TRUE
SHARED_PORT_PORT = 9618
UPDATE_COLLECTOR_WITH_TCP = TRUE
BIND_ALL_INTERFACES = TRUE
CCB_ADDRESS = $(COLLECTOR_HOST)
PRIVATE_NETWORK_NAME = htcondor
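The CCB_ADDRESS line refers to another setting through the $(NAME) macro syntax. As a rough illustration of how such a reference resolves, here is a minimal Python sketch. It assumes a simplified model of the config language (plain NAME = value lines and $(NAME) references only); the helper names parse_config and expand, and the COLLECTOR_HOST value, are hypothetical.

```python
import re

# Matches a plain $(NAME) reference. Assumption: simplified model only; real
# HTCondor expansion also handles forms such as $(NAME:default) and $ENV(NAME).
MACRO_REF = re.compile(r"\$\((\w+)\)")


def parse_config(text):
    """Parse 'NAME = value' lines into a dict; names are case-insensitive."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and anything that is not an assignment
        name, value = line.split("=", 1)
        config[name.strip().upper()] = value.strip()
    return config


def expand(config, name, depth=0):
    """Return NAME's value with every $(OTHER) reference substituted.

    Undefined names expand to an empty string in this sketch.
    """
    if depth > 32:
        raise ValueError("macro nesting too deep; is there a cycle?")
    value = config.get(name.upper(), "")
    return MACRO_REF.sub(lambda m: expand(config, m.group(1), depth + 1), value)


# The settings from this message, plus a hypothetical COLLECTOR_HOST value.
SETTINGS = """
COLLECTOR_HOST = collector.example.org
USE_SHARED_PORT = TRUE
SHARED_PORT_PORT = 9618
UPDATE_COLLECTOR_WITH_TCP = TRUE
BIND_ALL_INTERFACES = TRUE
CCB_ADDRESS = $(COLLECTOR_HOST)
PRIVATE_NETWORK_NAME = htcondor
"""

cfg = parse_config(SETTINGS)
print(expand(cfg, "CCB_ADDRESS"))  # collector.example.org
```

On a real installation, running condor_config_val CCB_ADDRESS on each machine reports the value HTCondor actually uses, which is more reliable than re-parsing the files.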

Maybe I haven't applied some setting on the SUBMIT node. How should I configure the schedd? I want the schedd not to connect to the startd directly, but to reach it through the collector and CCB when running jobs.
Could you tell me whether this is possible?

Thanks in advance!

On Sat, 21 Dec 2019 at 00:05, Zach Miller <zmiller@xxxxxxxxxxx> wrote:
Ivan,

This line from your log seems to be the key:
  condor_schedd[8501]: attempt to connect to <10.7.128.15:49430> failed: Connection refused (connect errno = 111).

The network did not allow a connection to that IP address and port. Could it be a firewall or iptables issue?
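One way to check this suggestion is to repeat the same TCP connection by hand from the submit machine. Below is a minimal Python sketch using only the standard socket module; probe_tcp is a hypothetical helper, not part of HTCondor.

```python
import errno
import socket


def probe_tcp(host, port, timeout=5.0):
    """Attempt one TCP connection to host:port and describe the outcome.

    Returns "open" if the connection succeeds, "timeout" if nothing answers
    in time, or the symbolic errno name (e.g. "ECONNREFUSED") otherwise.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:  # must come before OSError, of which it is a subclass
        return "timeout"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
```

For example, running probe_tcp("10.7.128.15", 49430) on the submit host repeats the schedd's failed attempt. On Linux, ECONNREFUSED is errno 111, the value in the log. A refusal usually means something actively answered "no" (nothing listening on that port, or a firewall rule that rejects packets), whereas a rule that silently drops packets typically shows up as "timeout".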


Cheers,
-zach


On 12/20/19, 6:34 AM, "HTCondor-users on behalf of don_vanchos" <htcondor-users-bounces@xxxxxxxxxxx on behalf of hozblok@xxxxxxxxx> wrote:

  Hello!


  I'm trying to submit a simple vanilla universe job, but it does not start. Could you explain this error to me?



  condor_schedd[8501]: Finished negotiating for s_user in local pool: 1 matched, 1 rejected
  condor_schedd[8501]: attempt to connect to <10.7.128.15:49430> failed: Connection refused (connect errno = 111).
  condor_schedd[8501]: Failed to send REQUEST_CLAIM to startd slot1@w7-demo15 <10.7.128.15:49430?addrs=10.7.128.15-49430&alias=htcondor-remote> for s_user: SECMAN:2003:TCP connection to startd slot1@w7-demo15 <10.7.128.15:49430?addrs=10.7.128.15-49430&alias=htcondor-remote> for s_user failed.
  condor_schedd[8501]: Match record (slot1@w7-demo15 <10.7.128.15:49430?addrs=10.7.128.15-49430&alias=htcondor-remote> for s_user, 8.0) deleted
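A first step in investigating is to pull out exactly which address the schedd tried. The addresses in these lines have the form <host:port?parameters>. The Python sketch below, with hypothetical helper names, extracts failed connection attempts from schedd log text and splits each address into host, port and parameters. The regular expression is modelled only on the log lines quoted here, so other HTCondor versions may word the message differently.

```python
import re
from urllib.parse import parse_qs

# Pattern modelled on the schedd log lines quoted in this thread.
CONNECT_FAILED = re.compile(
    r"attempt to connect to <(?P<addr>[^>]+)> failed: .*"
    r"\(connect errno = (?P<errno>\d+)\)"
)


def parse_sinful(addr):
    """Split 'host:port?key=value&...' into (host, port, params dict)."""
    hostport, _, query = addr.strip("<>").partition("?")
    host, _, port = hostport.rpartition(":")
    # keep_blank_values keeps bare flags (parameters without '=') as ''.
    params = {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}
    return host, int(port), params


def failed_connects(log_text):
    """Yield (host, port, errno, params) for each failed connect in the log."""
    for m in CONNECT_FAILED.finditer(log_text):
        host, port, params = parse_sinful(m.group("addr"))
        yield host, port, int(m.group("errno")), params


def looks_ccb_routed(params):
    # Assumption: an address reachable through CCB carries a CCBID parameter.
    return "CCBID" in params
```

The looks_ccb_routed check is a hedged heuristic, based on our understanding that a daemon registered with a CCB server advertises a CCBID parameter in its address. If the startd's address has none, the schedd may be trying to connect to it directly, which is exactly the path the firewall blocks.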


  How can I investigate the cause of this problem?



  Thanks in advance!



  P.S. Output of condor_q -better-analyze -verbose -allusers:


  The Requirements expression for job 8.000 reduces to these conditions:

          Slots
  Step  Matched  Condition
  ----- -------- ---------
  [0]          8 OpSys == "WINDOWS"
  [1]          8 TARGET.Arch == "X86_64"
  [3]          8 TARGET.Disk >= RequestDisk
  [5]          8 TARGET.Memory >= RequestMemory
  [8]          8 TARGET.HasFileTransfer


  008.000: Job has not yet been considered by the matchmaker.


  008.000: Run analysis summary ignoring user priority. Of 8 machines,
     0 are rejected by your job's requirements
     0 reject your job because of their own requirements
     0 match and are already running your jobs
     0 match but are serving other users
     8 are able to run your job
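This analysis suggests matchmaking itself is not the problem: every slot satisfies every condition, and the log shows the failure happening afterwards, when the schedd sends REQUEST_CLAIM to the matched startd. As an illustration of what the table checks, here is a Python sketch that evaluates the same five conditions against a slot described as a dict. It is a simplification, not the ClassAd language; the job and slot values are hypothetical, and only the attribute names come from the output above.

```python
def slot_matches(job, slot):
    """Evaluate the five conditions from the -better-analyze table for one slot."""
    try:
        return bool(
            # ClassAd "==" compares strings case-insensitively, hence upper().
            str(slot["OpSys"]).upper() == "WINDOWS"
            and str(slot["Arch"]).upper() == "X86_64"
            and slot["Disk"] >= job["RequestDisk"]
            and slot["Memory"] >= job["RequestMemory"]
            and slot["HasFileTransfer"] is True
        )
    except KeyError:
        # A missing attribute is UNDEFINED in ClassAds, so the slot does not match.
        return False


# Hypothetical values for illustration only.
job = {"RequestDisk": 1024, "RequestMemory": 128}
slots = [
    {"OpSys": "WINDOWS", "Arch": "X86_64", "Disk": 500000,
     "Memory": 2048, "HasFileTransfer": True}
    for _ in range(8)
]
print(sum(slot_matches(job, s) for s in slots))  # 8
```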



  --
  Sincerely yours,
  Ivan Ergunov                        mailto:hozblok@xxxxxxxxx




_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

