
[Condor-users] ERROR PLZZ HELP OUT Could not fetch ads --- can't find collector



Many thanks to Alain Roy. I've followed your suggestion and things have mostly recovered, but I still have some problems.

On both the client and the head node, the startd is not running. Even if I start condor_startd manually, it does not stay running.

Here is the output of the status commands:

IN HEAD NODE


[root@ca1 ~]# ps -ef | egrep condor_
condor    4567     1  0 11:03 ?        00:00:00 condor_master
condor    4569  4567  0 11:03 ?        00:00:00 condor_schedd -f
condor    4594     1  0 11:03 ?        00:00:00 condor_collector
condor    4603     1  0 11:03 ?        00:00:00 condor_negotiator
condor    4619     1  0 11:04 ?        00:00:00 condor_schedd
root      4658  3773  0 11:04 pts/1    00:00:00 egrep condor_
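
One detail in the listing above may be worth checking: condor_collector, condor_negotiator and the second condor_schedd all show a parent PID of 1 rather than the PID of condor_master (4567). A minimal sketch for spotting this from `ps -ef` output follows. The function name list_unmanaged_daemons is made up for illustration, and it assumes the usual `ps -ef` column layout (PID in column 2, PPID in column 3, command in column 8). Whether those daemons were started by hand or left over from an earlier run is not something the listing alone can tell.

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and prints every
# condor_* daemon (other than condor_master itself) whose parent PID is
# not the PID of a running condor_master.
list_unmanaged_daemons() {
    awk '$8 ~ /^condor_/ {
             pid[NR] = $2; ppid[NR] = $3; cmd[NR] = $8
             if ($8 == "condor_master") master[$2] = 1
         }
         END {
             for (i = 1; i <= NR; i++)
                 if ((i in cmd) && cmd[i] != "condor_master" && !(ppid[i] in master))
                     print cmd[i], "pid=" pid[i], "ppid=" ppid[i]
         }'
}
```

Usage would be `ps -ef | list_unmanaged_daemons`; an empty result means every condor daemon found is a child of condor_master.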

[root@ca1 ~]# condor_q
-- Submitter: ca1.cdacgrid : <192.9.200.215:33073> : ca1.cdacgrid
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held


[root@ca1 ~]# condor_status
Error:  Could not fetch ads --- can't find collector

ALSO, ON THE HEAD NODE I AM NOT GETTING THE DETAILS OF THE CLIENT NODE.

MASTERLOG OF HEADNODE

3/8 11:04:12 The STARTD (pid 4622) exited with status 4
3/8 11:04:12 restarting /usr/local/condor/sbin/condor_startd in 25 seconds
3/8 11:04:12 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:04:37 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 4678
3/8 11:04:37 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:04:37 The STARTD (pid 4678) exited with status 4
3/8 11:04:37 restarting /usr/local/condor/sbin/condor_startd in 41 seconds
3/8 11:04:37 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:05:18 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 4681
3/8 11:05:18 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:05:18 The STARTD (pid 4681) exited with status 4
3/8 11:05:18 restarting /usr/local/condor/sbin/condor_startd in 73 seconds
3/8 11:05:18 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:06:31 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 4683
3/8 11:06:31 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:06:31 The STARTD (pid 4683) exited with status 4
3/8 11:06:31 restarting /usr/local/condor/sbin/condor_startd in 137 seconds
3/8 11:06:31 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:08:26 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:08:48 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 4686
3/8 11:08:48 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:08:48 The STARTD (pid 4686) exited with status 4
3/8 11:08:48 restarting /usr/local/condor/sbin/condor_startd in 265 seconds
3/8 11:08:48 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:13:13 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 4721
3/8 11:13:13 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:13:13 The STARTD (pid 4721) exited with status 4
3/8 11:13:13 restarting /usr/local/condor/sbin/condor_startd in 521 seconds
3/8 11:13:13 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:13:26 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:18:26 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:21:54 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 4761
3/8 11:21:54 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:21:54 The STARTD (pid 4761) exited with status 4
3/8 11:21:54 restarting /usr/local/condor/sbin/condor_startd in 1033 seconds
3/8 11:21:54 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:23:26 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:28:26 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:33:26 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
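
The log above shows the same pattern repeating: the master starts condor_startd, the startd exits with status 4, and the restart delay grows (25, 41, 73, ... 1033 seconds). A rough sketch for tallying those exits from a MasterLog-style file is given here. The function name summarize_startd_exits is hypothetical and the pattern is based only on the line format seen in this log. The reason for the exit is more likely to be in the startd's own log (normally StartLog, in the directory printed by `condor_config_val LOG`) than in the MasterLog.

```shell
# Hypothetical helper: count STARTD exits per exit status in a
# MasterLog-style file. Usage: summarize_startd_exits /path/to/MasterLog
summarize_startd_exits() {
    awk '/The STARTD \(pid [0-9]+\) exited with status/ {
             count[$NF]++          # last field on the line is the exit status
         }
         END {
             for (s in count)
                 printf "status %s: %d exits\n", s, count[s]
         }' "$1"
}
```

If every exit carries the same status, as it does here, the startd is probably failing for one consistent reason at startup rather than crashing at random.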


THERE IS NO ERROR IN THE COLLECTORLOG


IN CLIENTNODE


[root@nodeA sbin]# ps -ef | egrep condor_
condor    4169     1  0 11:04 ?        00:00:00 condor_master
condor    4171  4169  0 11:04 ?        00:00:00 condor_schedd -f
root      4282  3924  0 11:06 pts/1    00:00:00 egrep condor_


[root@nodeA sbin]# condor_q

-- Submitter: nodeA.cdacgrid : <192.9.200.90:32774> : nodeA.cdacgrid
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held

[root@nodeA sbin]# condor_status
Error:  Could not fetch ads --- can't find collector
________________________________________________________________________________________________

Should I check condor_status after submitting a job? Also, I am not getting the status of the client node on the head node.

Please help,
thanks,
lash