
Re: [Condor-users] condor w/o shared filesystem



On 07/19/2011 12:32 PM, Ian Chesal wrote:
On Monday, July 18, 2011 at 9:14 PM, Rita wrote:
One feature we would like to see is having Condor jobs run completely
independently of the scheduler -- specifically with respect to file systems.

Currently, the job log needs to be shared between the scheduler and
execution hosts.
This is not true. Only scheduler-side processes require access to the
job log. Perhaps I'm misunderstanding your request?

Condor will try to be smart about using shared filesystems if
FILESYSTEM_DOMAIN is the same between the scheduler and the execute
nodes. You can set this to $(FULL_HOSTNAME) on every machine to let
Condor know that none of your machines share a filesystem.
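
Roughly, the no-shared-filesystem setup looks like this (a sketch only;
the executable and file names below are placeholders):

    # condor_config.local on the schedd and on every execute node
    FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)

    # job.sub -- use Condor's file transfer instead of a shared filesystem
    executable              = my_job.sh
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = input.dat
    output                  = job.out
    error                   = job.err
    log                     = job.log
    queue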
If the scheduler host dies for any reason, the execution of the jobs halts.
Condor is highly fault tolerant and the disappearance of a scheduler can
be tolerated. The default behaviour can be changed if you don't like it.
Specifically, you want to look at MAX_CLAIM_ALIVES_MISSED
(http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#17538)
and ALIVE_INTERVAL
(http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:AliveInterval)
-- these control how long a condor_startd will keep a job running when
the scheduler it came from is no longer able to connect to the startd.
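
As a rough sketch (the values are illustrative only, not recommendations,
and the daemon that reads each knob may vary between versions):

    # schedd side: how often a keepalive is sent for each claim (seconds)
    ALIVE_INTERVAL = 300

    # startd side: how many keepalives may be missed before the claim is
    # considered broken and the job is vacated
    MAX_CLAIM_ALIVES_MISSED = 6

    # With these values a startd keeps a claimed job running for roughly
    # 300 * 6 = 1800 seconds after last hearing from the schedd.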

It would be nice to place all files in spool/ (this includes output,
error, and the job itself, if possible). I am not sure if there is a
better way.
You can have this happen if you submit with the -remote option -- Condor
will spool everything in the spool directory and hold it there for you
until you fetch the files with condor_transfer_data.
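
Something like the following (the schedd name and the cluster id 42 are
placeholders; condor_submit prints the real cluster id):

    # submit to a remote schedd, spooling all input files onto it
    condor_submit -remote remote.schedd.example.com job.sub

    # ...after the job completes, pull output/error/log back from the spool
    condor_transfer_data -name remote.schedd.example.com 42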

Regards,
- Ian

If you're not using -remote/-name now, you likely only need -spool.
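
For example (42 again stands in for whatever cluster id the submit prints):

    condor_submit -spool job.sub
    # ...when the job finishes:
    condor_transfer_data 42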

Best,


matt