
Re: [Condor-users] Using Condor (Windows) with Linux File Servers



On 12/15/05, Ari Silver <ari_ag@xxxxxxxxx> wrote:
> --- Matt Hope <matthew.hope@xxxxxxxxx> wrote:
>
> > On 12/15/05, Ari Silver <ari_ag@xxxxxxxxx> wrote:
> > > Hello,
> > >
> > > I'm trying to get a Windows 2003 Server pool running
> > > Condor to use linux file servers. The primary reason
> > > for this is that the executables I wish to run require
> > > a number of dlls and I don't want to have to specify
> > > which dlls should be transferred each time a job is
> > > run.
> > <snip>
> > > 007 (008.000.000) 12/15 10:02:58 Shadow exception!
> > >        Error from starter on machine504.mydomain.com: Failed
> > > to open standard output file
> > > '//bonnie/home/silvera/CondorJobs/demo/demo.out':
> > > Invalid argument (errno 22)
> > >        0  -  Run Bytes Sent By Job
> > >        0  -  Run Bytes Received By Job
> > >
> > > I made sure the directories have the necessary
> > > permissions (777). Can anyone shed some light on this?
> >
> > On Windows, jobs execute as a user with no privileges
> > whatsoever. This includes accessing network-level
> > resources such as UNC shares.

Apologies - I misread you as saying you wanted to read/write to these
directories from the execute machines. Ignore the advice in my previous
mail.

> I ran condor_store_cred add on the execute machines
> and I am still experiencing the same problem. How is
> Condor able to write to the .log file in the directory
> in question, but cannot write to the .out file in the
> same directory?

The out file (and the err file) is written once, at the end of the job,
by copying back the files that stdout and stderr were redirected to. The
log file is written to repeatedly while the job is running.
This can sometimes trigger some nasty behaviour with timeouts on sessions.

The job's userlog is written to in a very different manner from the
file transfer mechanism - is there anything special (Active Directory
permissions, security dongles or somesuch) controlling access to the
network?

I *never* recommend that users of my pool use a location remote from
their submit machine (and hence the shadow machine) for files being
copied back from the execute machine.
If, for some reason beyond Condor's control, Windows has an issue
accessing the network at the copy-back point, then the entire job will
restart.

Painful in the extreme and nigh on impossible to really debug - I
wouldn't go there if I were you.
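
For illustration, a minimal sketch of a submit description file that keeps
everything copied back on the submit machine's local disk might look like
the one below. The executable and DLL names are made-up placeholders, and
it assumes the job is submitted from a directory on a local disk rather
than from the share. The DLL list does have to be written out, but only
once, in a submit file that can be reused.

```
# Hypothetical example: demo.exe, first.dll and second.dll are placeholders.
universe                = vanilla
executable              = demo.exe

# Let Condor ship the DLLs to the execute machine with the job.
transfer_input_files    = first.dll, second.dll
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

# Relative paths, so these land in the local directory the job was
# submitted from, not on a remote file server.
output                  = demo.out
error                   = demo.err
log                     = demo.log

queue
```

If the results are needed on the file server, they can be copied there
from the submit machine once the job has finished.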

Matt