
Re: [Condor-users] Shutdown kills jobs

I have a Linux cluster with both submit and execute capabilities. The head node is also the central manager. All nodes in the cluster share a common NFS partition. My intent is to send jobs to the cluster from submit-only hosts and have them run to completion on the cluster. This works fine in Condor, except when I want to shut down the PC that submitted the job. Whenever I shut down the submitting PC, the jobs are evicted and remain in the queue until the submitting PC is back online. Is there a way to have a job run without depending on the submitting PC after it has been queued?

Condor requires that the submitting machine be active while the job runs.

In Condor 6.7.0, we added a feature that lets the submitting machine be disconnected for a while, but eventually it has to reconnect. It can't stay disconnected permanently.
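
If you're on 6.7.x, here's a rough sketch of what that looks like in a submit description file. I'm assuming the job lease knob (job_lease_duration), which controls how long the execute machine will keep running the job while disconnected from the schedd; the executable and file names below are just placeholders:

    # sketch.sub -- example submit description file (names are placeholders)
    universe           = vanilla
    executable         = my_job
    output             = my_job.out
    error              = my_job.err
    log                = my_job.log
    # Let the execute machine keep running the job for up to 2 hours
    # while the submit machine is disconnected. The schedd must
    # reconnect before this lease expires, or the job is evicted.
    job_lease_duration = 7200
    queue

That buys you a window to reboot or briefly disconnect the submit machine, but it doesn't remove the requirement that the schedd come back.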

If you want to be able to submit jobs from, say, a laptop, then you need another computer that will be on the whole time; you log into it and submit your jobs from there. The computer that runs the "condor_schedd" process must stay up while the job is running. It (and its child processes, the shadows) takes your jobs under its wing and keeps an eye on them. They love your jobs. But in order to do their job, they must be running.
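
For example, from your laptop you might do something like the following (the hostname is hypothetical; condor_submit and condor_q are the standard commands):

    # Log into an always-on machine that runs condor_schedd
    ssh me@submit-host.example.com

    # Submit from there; the schedd on submit-host now owns the job
    condor_submit sketch.sub

    # Later, check on the job from anywhere
    ssh me@submit-host.example.com condor_q

    # Your laptop can be shut down at any point; the job keeps
    # running because condor_schedd on submit-host stays up.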

-alain