
Re: [Condor-users] condor_master.exe does not start on windows machine reboot



Hi Klaus: 

>>> 
On some machines, when the Condor service tries to start 
the condor_master daemon, it fails to find the 
configuration files in the shared filesystem. 
<<<

Do you know for sure whether the files are available to the 
master when it goes to fetch them?  That is, is the
shared fs available when the Condor service starts
or does it depend on another service to be started
locally?

>>>
I have renamed the service command name to run a
condor_master.bat that waits for 15 secs and then
runs condor_master.exe. 
<<<

Is it always the case that 15 seconds is enough time
for the shared fs to become available?  You could
also have the service control manager restart the
service after the first failure (via the recovery tab
in the services management console extension).  

>>>
The Condor service starts, but the jobs submitted from
this machine run only locally and are not distributed
to the other pool machines. 
Could someone help me with this? 
<<<

You need a central manager to start distributing jobs.
Do all Condor instances pull from the same configuration
file?  If so, you will need one machine to have a slightly
different configuration so that it can be the pool's CM.

>>>
Is there any other way to delay the Condor service start
until the shared filesystems are available? 
<<<

Sure, there are a few ways to do this.  If the shared fs
relies on some service running, you could make the Condor
service list it as a dependency (i.e. the other service
must start before Condor will).  Or make sure that, when 
the Condor service fails to start, it waits for some
period of time before trying again.  Or, as you have tried
with the batch file, you could run a loop in the batch file
that periodically checks for the existence of the
configuration files.  Once it finds them, start the master.
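
Here is a rough sketch of that polling idea, written in Python
rather than as a batch file (this assumes Python is installed on
the machine, which may not be true in your pool).  The function
names, the timeout, and the polling interval are all made up for
illustration; the same loop could be written in a .bat file.

```python
import os
import subprocess
import time


def wait_for_file(path, timeout=300.0, interval=5.0,
                  exists=os.path.exists, sleep=time.sleep):
    """Poll until `path` exists.

    Returns True as soon as the file is seen, or False once roughly
    `timeout` seconds of sleeping have passed without seeing it.
    `exists` and `sleep` are parameters only so the loop can be
    exercised without a real shared filesystem.
    """
    waited = 0.0
    while True:
        if exists(path):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


def start_master_when_ready(config_path, master_path, **wait_options):
    """Start the master only after the configuration file is visible.

    Both paths are supplied by the caller; nothing here knows where
    Condor is installed. Returns the master's exit code, or None if
    the configuration file never appeared.
    """
    if not wait_for_file(config_path, **wait_options):
        return None
    return subprocess.call([master_path])
```

You would call start_master_when_ready() with the path to the
configuration file on the shared fs and the path to
condor_master.exe.  Unlike a fixed 15 second sleep, this starts the
master as soon as the files show up and gives up after a bounded
wait instead of starting with no configuration.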

Hope some of that helps.

Regards,
-B