
[HTCondor-users] Condor Jobs getting stuck in idle state. No valid starters were found!



Hi,

I am facing a strange issue with my worker nodes. My setup is as follows: one machine acts as the central manager and a couple of machines act as worker nodes. We were using a Gluster file system as the shared file system. Due to some technical issues, we migrated from gfs1 to gfs2 (two different servers). Since then, jobs are not getting executed; they stay stuck in the idle state. I am seeing the following in the StartLog file on the worker node:

06/20/15 13:59:11 Detected hibernation states: S3,S4
06/20/15 13:59:11 "/usr/sbin/condor_starter -classad" did not produce any output, ignoring
06/20/15 13:59:11 Failed to execute /usr/sbin/condor_starter.std, ignoring
06/20/15 13:59:11 WARNING WARNING WARNING: No valid starters were found! Is something wrong with your Condor installation? This startd will not be able to run jobs.
06/20/15 13:59:11 No STARTD_HISTORY file specified in config file
06/20/15 13:59:11 History file rotation is enabled.
06/20/15 13:59:11  Maximum history file size is: 20971520 bytes

gfs1_config=/mnt/gfs1/files/workernode.config
gfs2_config=/mnt/gfs2/files/workernode.config

I externalized the worker node configuration so that all worker nodes share the same configuration. If I start with the gfs1_config external configuration, jobs execute fine, but if I start with the gfs2_config external configuration, jobs get stuck and the messages above are printed. Going mad on this.. :( Not sure what is going wrong. I compared both configuration files and they are the same. I actually copied the gfs1 configuration to the gfs2 configuration; nothing was changed.
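
One way to double-check that the two files really are byte-for-byte identical is a quick hash comparison. This is only a rough sketch: the helper names file_digest and files_identical are made up for illustration, and the paths are simply the two listed above.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def files_identical(path_a, path_b):
    """Hypothetical helper: True if both files contain the same bytes."""
    return file_digest(path_a) == file_digest(path_b)


if __name__ == "__main__":
    # Paths taken from the post; adjust if the mounts differ.
    gfs1_config = Path("/mnt/gfs1/files/workernode.config")
    gfs2_config = Path("/mnt/gfs2/files/workernode.config")
    print("identical:", files_identical(gfs1_config, gfs2_config))
```

This only compares file contents (it would catch things like stray whitespace or different line endings); it says nothing about file permissions or how each volume is mounted.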

Does anyone have any idea about this?


