
Re: [HTCondor-users] Failed to send REQUEST_CLAIM to startd



Hello,

In your configuration with CCB, it sounds like you have the StartDs behind a firewall.  So then the Collector and the SchedD must be running on "public" IPs.  That is, the StartD can connect directly to the Collector and also to the SchedD.  Is that correct?

If so, you want your settings for CCB:
    CCB_ADDRESS = $(COLLECTOR_HOST)    
    PRIVATE_NETWORK_NAME = htcondor

to be configured on the StartD machines.  But you do not want to set those on the SchedD machine.  If two machines have the same PRIVATE_NETWORK_NAME, then they will try to connect directly without CCB.  So perhaps that is something to double check.
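
As a rough sketch of that split (assuming the layout described above, with only the StartDs behind the firewall), the two sides would look something like this. The comments are mine, and the values are the ones already shown:

    # On each StartD (execute) machine, behind the firewall:
    CCB_ADDRESS = $(COLLECTOR_HOST)
    PRIVATE_NETWORK_NAME = htcondor

    # On the SchedD (submit) machine: leave both of these unset.
    # CCB_ADDRESS =
    # PRIVATE_NETWORK_NAME =

You can check what each machine actually ends up with by running "condor_config_val CCB_ADDRESS" and "condor_config_val PRIVATE_NETWORK_NAME" on it.  As far as I know, a StartD that has registered with CCB advertises an address that carries a CCB ID field, and the address in your log (10.7.128.15:49430?addrs=...&alias=htcondor-remote) does not seem to have one, so it may be worth confirming that the StartD picked up the setting (a restart of the daemons may be needed rather than just a reconfig).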

See also this section on "Troubleshooting CCB" to learn more about logging and what to look for:
    https://htcondor.readthedocs.io/en/latest/admin-manual/networking.html#troubleshooting-ccb
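
Separately, if you want to see what the SchedD machine gets when it tries the StartD address directly, a plain TCP connect attempt can help tell "refused" apart from "silently dropped".  This is only a minimal sketch in Python, not anything HTCondor ships; the function name check_tcp is made up for illustration, and the host and port to pass in are whatever appears in your own log:

```python
import errno
import socket


def check_tcp(host, port, timeout=5.0):
    """Try a plain TCP connect and return (ok, detail).

    Hypothetical helper for illustration only; it is not part of HTCondor.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No answer at all: packets are most likely being dropped on the way.
        return False, "timed out (packets are probably being dropped)"
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            # Same condition the schedd log reports as "Connection refused".
            return False, "connection refused (nothing listening, or a reject rule)"
        return False, f"failed: {exc}"
```

Run from the SchedD machine, for example:

    ok, detail = check_tcp("10.7.128.15", 49430)
    print(ok, detail)

With the StartDs behind a firewall, a failure here is expected and is exactly the case CCB is meant to handle, so this only tells you about the direct path, not about CCB itself.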


Cheers,
-zach

    
    
    Maybe I missed some setting on the SUBMIT node. How can I configure the schedd daemon? I do not want the schedd to connect directly to the startd daemon; I want it to use the collector and CCB to submit jobs.
    Could you please tell me whether this is possible?
    
    
    
    Thanks in advance!
    
    
    On Sat, 21 Dec 2019 at 00:05, Zach Miller <zmiller@xxxxxxxxxxx> wrote:
    
    
    Ivan,
    
    This line from your log seems to be the key:
        condor_schedd[8501]: attempt to connect to <10.7.128.15:49430> failed: Connection refused (connect errno = 111).
    
    The network would not allow the connection to happen to that IP and port.  Could it be a firewall/iptables type of issue?
    
    
    Cheers,
    -zach
    
    
    On 12/20/19, 6:34 AM, "HTCondor-users on behalf of don_vanchos" <htcondor-users-bounces@xxxxxxxxxxx on behalf of hozblok@xxxxxxxxx> wrote:
    
        Hello!
    
    
        I'm trying to submit a simple vanilla task. But the task does not start. Please, could you explain this error to me?
    
    
    
        condor_schedd[8501]: Finished negotiating for s_user in local pool: 1 matched, 1 rejected
        condor_schedd[8501]: attempt to connect to <10.7.128.15:49430> failed: Connection refused (connect errno = 111).
        condor_schedd[8501]: Failed to send REQUEST_CLAIM to startd slot1@w7-demo15 <10.7.128.15:49430?addrs=10.7.128.15-49430&alias=htcondor-remote> for s_user: SECMAN:2003:TCP connection to startd slot1@w7-demo15 <10.7.128.15:49430?addrs=10.7.128.15-49430&alias=htcondor-remote> for s_user failed.
        condor_schedd[8501]: Match record (slot1@w7-demo15 <10.7.128.15:49430?addrs=10.7.128.15-49430&alias=htcondor-remote> for s_user, 8.0) deleted
    
    
        How can I investigate the cause of the problem?
    
    
    
        Thanks in advance!
    
    
    
        P.S. #condor_q -better-analyze -verbose -allusers
    
    
        The Requirements expression for job 8.000 reduces to these conditions:
    
                 Slots
        Step    Matched  Condition
        -----  --------  ---------
        [0]           8  OpSys == "WINDOWS"
        [1]           8  TARGET.Arch == "X86_64"
        [3]           8  TARGET.Disk >= RequestDisk
        [5]           8  TARGET.Memory >= RequestMemory
        [8]           8  TARGET.HasFileTransfer
    
    
        008.000:  Job has not yet been considered by the matchmaker.
    
    
        008.000:  Run analysis summary ignoring user priority.  Of 8 machines,
              0 are rejected by your job's requirements
              0 reject your job because of their own requirements
              0 match and are already running your jobs
              0 match but are serving other users
              8 are able to run your job
    
    
    
        -- 
        Sincerely yours,
        Ivan Ergunov                                                 mailto:hozblok@xxxxxxxxx
    
    
    
    
    _______________________________________________
    HTCondor-users mailing list
    To unsubscribe, send a message to 
    htcondor-users-request@xxxxxxxxxxx with a
    subject: Unsubscribe
    You can also unsubscribe by visiting
    https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
    
    The archives can be found at:
    https://lists.cs.wisc.edu/archive/htcondor-users/
    
    
    
    
    
    -- 
    Sincerely yours,
    Ivan Ergunov                                                 mailto:hozblok@xxxxxxxxx