[HTCondor-users] Why condor does not assign task to idle workers?



condor_q:
wukan@iZbp10rdm0o11ggqepd6fjZ:~/cloudwms$ condor_q


-- Schedd:  : <172.11.0.1:9618?... @ 01/07/20 09:59:09
OWNER BATCH_NAME                                                    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
wukan Administrator_test_pro_pipline_1578361131-0.dag+1   1/7  09:38      4      _       1     12 6.0

Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended 
Total for wukan: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended 
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
condor_status:
wukan@iZbp10rdm0o11ggqepd6fjZ:~/cloudwms$ condor_status
Name                    OpSys      Arch   State     Activity LoadAv Mem    ActvtyTime

iZbp1beg1afvh6u4ix5k9iZ LINUX      X86_64 Unclaimed Idle      0.000 64427  0+00:14:43

                     Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain
-----------------------------------------------------------------------------------------------------------------------------------
There is one idle worker, but Condor does not assign the job to it.
The master IP is 172.25.50.207.

When I look at the log file /var/log/condor/SchedLog on the master machine, I see the following errors:
01/07/20 09:59:31 (pid:1812) Failed to send REQUEST_CLAIM to startd iZbp1beg1afvh6u4ix5k9iZ <172.11.0.1:9618?addrs=172.11.0.1-9618&noUDP&sock=5712_d975_3> for wukan: SECMAN:2007:Failed to end classad message.
01/07/20 09:59:31 (pid:1812) Match record (iZbp1beg1afvh6u4ix5k9iZ <172.11.0.1:9618?addrs=172.11.0.1-9618&noUDP&sock=5712_d975_3> for wukan, 6.0) deleted
01/07/20 10:00:31 (pid:1812) Activity on stashed negotiator socket: <172.25.50.207:26462>
01/07/20 10:00:31 (pid:1812) Using negotiation protocol: NEGOTIATE
01/07/20 10:00:31 (pid:1812) Negotiating for owner: wukan@xxxxxxxxxxxxx
01/07/20 10:00:31 (pid:1812) SECMAN: removing lingering non-negotiated security session <172.11.0.1:9618>#1578361425#1 because it conflicts with new request
01/07/20 10:00:31 (pid:1812) Finished negotiating for wukan in local pool: 1 matched, 0 rejected
01/07/20 10:00:31 (pid:1812) condor_write(): Socket closed when trying to write 638 bytes to startd iZbp1beg1afvh6u4ix5k9iZ <172.11.0.1:9618?addrs=172.11.0.1-9618&noUDP&sock=5712_d975_3> for wukan, fd is 16
01/07/20 10:00:31 (pid:1812) Buf::write(): condor_write() failed
01/07/20 10:00:31 (pid:1812) SECMAN: failed to end classad message
01/07/20 10:00:31 (pid:1812) Failed to send REQUEST_CLAIM to startd iZbp1beg1afvh6u4ix5k9iZ <172.11.0.1:9618?addrs=172.11.0.1-9618&noUDP&sock=5712_d975_3> for wukan: SECMAN:2007:Failed to end classad message.
01/07/20 10:00:31 (pid:1812) Match record (iZbp1beg1afvh6u4ix5k9iZ <172.11.0.1:9618?addrs=172.11.0.1-9618&noUDP&sock=5712_d975_3> for wukan, 6.0) deleted
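
To make the address in these log lines easier to inspect, here is a minimal sketch that pulls the host, port and parameters out of the bracketed address string. The function name parse_sinful and the regular expression are my own illustration, not part of HTCondor, and the pattern only covers the IPv4 form that appears in this log.

```python
import re
from urllib.parse import parse_qs

# Matches the bracketed address form seen in the SchedLog lines, e.g.
# <172.11.0.1:9618?addrs=172.11.0.1-9618&noUDP&sock=5712_d975_3>
# (hypothetical helper; IPv4-style host only)
SINFUL_RE = re.compile(r"<(?P<host>[^:>]+):(?P<port>\d+)(?:\?(?P<query>[^>]*))?>")


def parse_sinful(text):
    """Return host, port and query parameters of the first address in text."""
    match = SINFUL_RE.search(text)
    if match is None:
        return None
    # keep_blank_values=True keeps flag-style entries such as "noUDP".
    params = parse_qs(match.group("query") or "", keep_blank_values=True)
    return {
        "host": match.group("host"),
        "port": int(match.group("port")),
        "params": params,
    }


if __name__ == "__main__":
    line = (
        "Failed to send REQUEST_CLAIM to startd iZbp1beg1afvh6u4ix5k9iZ "
        "<172.11.0.1:9618?addrs=172.11.0.1-9618&noUDP&sock=5712_d975_3> for wukan"
    )
    print(parse_sinful(line))
```

Running it on the REQUEST_CLAIM line shows that the startd is advertising 172.11.0.1, not an address in the 172.25.x.x network of the master.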

Why does the schedd connect to the IP 172.11.0.1? That is the address of the docker0 bridge in the master's routing table:
wukan@iZbp10rdm0o11ggqepd6fjZ:~/cloudwms$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         172.25.255.253  0.0.0.0         UG    0      0        0 eth0
172.11.0.0      *               255.255.0.0     U     0      0        0 docker0
172.25.0.0      *               255.255.0.0     U     0      0        0 eth0
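
As a quick cross-check of the routing table above, this sketch maps an address to the interface whose subnet contains it. The ROUTES dictionary is typed in by hand from the two non-default routes shown (Genmask 255.255.0.0 is a /16), and the helper name interface_for is just an illustration.

```python
import ipaddress

# Subnets copied by hand from the `route` output above (assumption:
# only these two non-default routes matter for this check).
ROUTES = {
    "docker0": ipaddress.ip_network("172.11.0.0/16"),
    "eth0": ipaddress.ip_network("172.25.0.0/16"),
}


def interface_for(addr):
    """Return the name of the interface whose subnet contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    for iface, network in ROUTES.items():
        if ip in network:
            return iface
    return None


if __name__ == "__main__":
    for addr in ("172.11.0.1", "172.25.50.207"):
        print(addr, "->", interface_for(addr))
```

The address in the SchedLog errors (172.11.0.1) falls in the docker0 subnet, while the master IP (172.25.50.207) falls in the eth0 subnet.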

wukan@iZbp1beg1afvh6u4ix5k9iZ:~$ sudo cat /etc/docker/daemon.json
{
"bip":"172.11.0.1/16"
}
What might the problem be?