
Re: [Condor-users] condor w/o shared filesystem



Sorry if I wasn't clear before.

I saw a post saying you can stream your output and error. If streaming is enabled and the scheduler host reboots, the jobs restart from scratch. I would like to have streaming enabled and have the scheduler die without my jobs restarting from scratch. Is this possible?

Also, for -remote, -name and -spool, what does 'name' mean? Is it a scheduler name?



On Wed, Jul 20, 2011 at 8:52 AM, Matthew Farrellee <matt@xxxxxxxxxx> wrote:
On 07/19/2011 12:32 PM, Ian Chesal wrote:
On Monday, July 18, 2011 at 9:14 PM, Rita wrote:
One feature we would like to see is having condor jobs running
completely independent of the scheduler -- related to file systems.

Currently, the job log needs to be shared between the scheduler and
execution hosts.
This is not true. Only scheduler-side processes require access to the
job log. Perhaps I'm mis-understanding your request?

Condor will try to be smart about using shared filesystems if
FILESYSTEM_DOMAIN is the same between the scheduler and the execute
nodes. You can set this to $(FULL_HOSTNAME) on every machine to let
Condor know that none of your machines share a filesystem.
If the scheduler host dies for any reason, the execution of the jobs halts.
Condor is highly fault tolerant and the disappearance of a scheduler can
be tolerated. The default behaviour can be changed if you don't like it.
Specifically you want to look at MAX_CLAIM_ALIVES_MISSED
(http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#17538)
and ALIVE_INTERVAL
(http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:AliveInterval)
-- these control how long a condor_startd will keep a job running while
the scheduler it came from is no longer able to connect to the startd.
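
To get a feel for how the two knobs interact, here is a rough sketch in Python. It assumes the startd gives up on a claim after roughly ALIVE_INTERVAL * MAX_CLAIM_ALIVES_MISSED seconds without a keepalive, and the defaults in it (300 seconds and 6) are what I recall from the 7.6 manual, so check them against your own version. The function name is made up for illustration, and other settings (the job lease, for one) may change the real number.

```python
# Assumed defaults, recalled from the Condor 7.6 manual; verify for your version.
DEFAULT_ALIVE_INTERVAL = 300          # seconds between schedd keepalives
DEFAULT_MAX_CLAIM_ALIVES_MISSED = 6   # keepalives the startd may miss


def approx_claim_timeout(alive_interval=DEFAULT_ALIVE_INTERVAL,
                         max_alives_missed=DEFAULT_MAX_CLAIM_ALIVES_MISSED):
    """Hypothetical helper: approximate seconds a startd keeps a claim
    alive after it stops hearing from the schedd."""
    if alive_interval <= 0 or max_alives_missed <= 0:
        raise ValueError("both settings must be positive")
    return alive_interval * max_alives_missed


if __name__ == "__main__":
    seconds = approx_claim_timeout()
    print("approximate window: %d s (%.0f min)" % (seconds, seconds / 60))
```

If the window is too short for a scheduler host reboot at your site, raising either value in the configuration should widen it.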

It would be nice to place all files in spool/ (this includes output,
error, and the job itself, if possible). I am not sure if there is a
better way.
You can have this happen if you submit with the -remote option -- Condor
will spool everything in the spool directory and hold it there for you
until you fetch the files with condor_transfer_data.

Regards,
- Ian

If you're not using -remote/-name now, you likely only need -spool.
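
As a minimal sketch of the two variants, the Python below only builds the command lines and does not run anything. The helper names, the submit file name job.sub, the schedd name and the cluster id are all placeholders I made up; the options themselves (-spool, -remote, and condor_transfer_data taking a cluster id) are the ones discussed in this thread.

```python
def submit_command(submit_file, schedd_name=None):
    """Hypothetical helper: build a condor_submit command line.

    With no schedd name, plain -spool is used for the local schedd.
    With a schedd name, -remote is used, which also spools the files.
    """
    argv = ["condor_submit"]
    if schedd_name is not None:
        argv += ["-remote", schedd_name]
    else:
        argv.append("-spool")
    argv.append(submit_file)
    return argv


def fetch_command(cluster_id):
    """Hypothetical helper: build the command that fetches spooled
    output for one cluster once its jobs have finished."""
    return ["condor_transfer_data", str(cluster_id)]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(" ".join(submit_command("job.sub")))
    print(" ".join(submit_command("job.sub", "schedd.example.org")))
    print(" ".join(fetch_command(42)))
```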

Best,


matt



--
--- Get your facts first, then you can distort them as you please.--