
Re: [Condor-users] Initial installation - I don't understand why the job cannot write its own file?



On Mon, Jul 31, 2006 at 08:17:41PM +0100, Atwood, Robert C wrote:
>  Hi,
> I have installed Condor on a small cluster on its own private network.
> The master has 2 interfaces (outside network , cluster network). I've
> got it configured so that jobs can be submitted and they run on the
> nodes, with minimal changes to the default configuration.
> 
> However, there is a peculiar problem that I cannot figure out.
> 
> When the (vanilla) job starts, the output file is created, belonging to
> the submitting user, with permissions -rw-r--r--. Then the job gets
> held, with the log message: 
> 
>  "Error from starter on vm2@xxxxxxx : Failed to open
> '/home/myuser/q/loop.out' as standard output: Permission denied (errno
> 13)"
> 
> 
> The job runs as 'nobody', but the file is created with ownership of
> the submitting user. This doesn't seem right. 
> 
> I tried altering the UID_DOMAIN to all different things that I could
> think of (domain of the master's outside, domain of the private network,
> * ) with no difference in this behaviour. 
> 
> I thought this file should be created in /local/condor/execute , where
> /local/condor is defined in the configuration file by LOCAL_DIR, not in
> the submitting working directory, anyways? That is what I would like, I
> thought that was the default for vanilla jobs? 
> 

Only if you're using file transfer with your job, which is not used unless
your job asks for it.

http://www.cs.wisc.edu/condor/manual/v6.8.0/2_5Submitting_Job.html#SECTION00354000000000000000
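A minimal sketch of what asking for file transfer might look like for loop.submit follows. It assumes the should_transfer_files and when_to_transfer_output submit commands described in that manual section, and that loopit reads no input files. The TARGET line is dropped for the reasons given below, and a Queue line (missing from the posted file, perhaps only trimmed) is added. With transfer on, the job should run in a scratch directory under the execute node's EXECUTE directory (/local/condor/execute here), and loop.out should be copied back to the submit directory when the job exits, so the job never has to open /home/myuser/q as 'nobody'.

```text
########################
# Submit description file for loop program (sketch, file transfer on)
########################
Executable              = loopit
Universe                = vanilla
Output                  = loop.out
Log                     = loop.log
# Copy the executable to the execute node and bring loop.out back
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
Queue
```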


> 
> Any suggestions appreciated,
> 
> Robert
> 
> 
> loopit.c:
> #include <stdio.h>
> #include <unistd.h>
> int main(void){
>   int i;
>   /* print 0..99, one number per second */
>   for(i=0;i<100;i++){
>     sleep(1);
>     printf("%i\n",i);
>   }
>   return 0;
> }
> 
> 
> loop.submit:
> 
> ########################
> # Submit description file for loop program
> ########################
> Executable     = loopit
> Universe       = vanilla
> Output         = loop.out
> Log            = loop.log
> TARGET.FileSystemDomain = *

This 'TARGET' line doesn't do anything. TARGET.<whatever> only makes
sense on the right-hand side of an expression in a submit file, such as
requirements or rank. I think what you're really
trying to say is

requirements = TARGET.Filesystemdomain == *

which is less illegal than your first expression, but is also wrong. You
can't string-match like that in ClassAds without using one of the string
matching functions. (You can use regexes in ClassAds now, but the
documentation isn't done and I don't know how to do it yet either, so
I can't explain it.)

But you don't actually want to do that either, because you really don't
want to say "use any filesystem domain" unless you're using file transfer
(you don't want to wind up on a machine that doesn't have your files!).
But if you're using file transfer, you don't need to say anything about 
filesystem domains at all, so I'd just forget about it. 

[In advanced usage, you can use 
should_transfer_files = if_needed
which means, "if you run on a machine that matches my filesystem domain, don't
use file transfer, and if you run on a machine that doesn't match my filesystem
domain, use file transfer." It's tricky to get right, and until you've been
at it a while it's best to avoid it and just always transfer files.]
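For reference, a sketch of the IF_NEEDED variant as submit-file lines, assuming the should_transfer_files and when_to_transfer_output commands from the 6.8 manual. It also shows why it is tricky on this particular cluster: if the execute nodes advertise the same FileSystemDomain as the submit machine, the job presumably runs in place on the shared filesystem, still as 'nobody', and hits the original "Permission denied" error again.

```text
# Transfer files only when the execute machine's FileSystemDomain
# differs from the submit machine's; otherwise run in place.
should_transfer_files   = IF_NEEDED
# Specify alongside YES or IF_NEEDED (older versions require it)
when_to_transfer_output = ON_EXIT
```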

-Erik