
Re: [Condor-users] Question about using Condor in a non-Grid environment

On Jul 20, 2012, at 9:56 AM, Sarnath K - ERS, HCLTech wrote:
> That's a good point (applicability of DAGMan).
> However, if the server job is a shell script, the script could just start the server (say, service myservice start) and then exit.
> Condor would think that the shell script is done and would mark the job as complete. (Correct me if I'm wrong here.)
> After that, the client apps could be kick-started.

This is an accurate assessment. The problem with this method, as you note, is that while DAGMan can ensure the correct start order, it cannot ensure that the servers actually started correctly. As far as DAGMan is concerned, the jobs that you call "servers" have completed successfully. If a server hangs or crashes, you won't be able to tell that from the resource manager, which partially defeats the purpose of having a resource manager.
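To make the limitation concrete: DAGMan judges a node only by the exit status of its job. Below is a minimal Python sketch, assuming the server listens on a TCP port, of a wrapper that at least fails the node if the service never comes up after the start command. The command `service myservice start`, the host, and port 8080 are hypothetical placeholders, and `start_and_verify` is an illustrative name, not part of Condor. Note what it cannot do: once the wrapper exits, a later hang or crash is still invisible to DAGMan, which is exactly the gap described above.

```python
import socket
import subprocess
import sys
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def start_and_verify(start_cmd: list[str], host: str, port: int,
                     wait_seconds: float = 30.0,
                     poll_interval: float = 1.0) -> int:
    """Run a start command, then wait for the service's port to accept connections.

    Returns a process exit status: 0 if the service came up in time, 1 otherwise.
    This status is all DAGMan ever sees; anything after exit is invisible to it.
    """
    try:
        result = subprocess.run(start_cmd)
    except OSError:
        # Start command missing or not executable.
        return 1
    if result.returncode != 0:
        return 1
    # Poll the port until it answers or the deadline passes.
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        if port_is_open(host, port):
            return 0
        time.sleep(poll_interval)
    return 1


# Hypothetical use as a DAG node's executable (service name and port are assumptions):
#   sys.exit(start_and_verify(["service", "myservice", "start"], "localhost", 8080))
```

Even with such a check, the job still "completes" as soon as the wrapper exits, so this only narrows the window rather than giving real service monitoring.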

The more I think about it, the more I think that Jan really needs an HA cluster manager like Veritas Cluster Server. HA cluster managers are all about managing services with resource dependencies. It's what they do, and they do it much better than distributed and high-performance computing cluster managers can.

Rich Pieri <ratinox@xxxxxxx>
MIT Laboratory for Nuclear Science