
[Condor-users] Not fully able to start jobs - permissions?



I'm having some weird problems where jobs aren't starting fully. The StarterLog file makes it look like the starter is trying to start the job but then chokes. Specifically, it seems to be trying to open the job's standard output and error files for writing. The directory it expects them in doesn't exist, but the directory one level up is writable (i.e., that directory COULD be created if Condor wanted to do it). If I manually create the directory, Condor roars ahead, creates the files, and runs the job.

Now, what's interesting to me (and maybe should be a clue as to how to solve this problem) is that the same directory IS being created automatically on the master server. Is it possible that Condor assumes that this directory is network-accessible rather than per-machine? (I'm kinda grasping at straws here.)
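
For anyone who wants to reproduce the check described above on an execute machine, here is a minimal sketch in Python. The function name `check_job_dir` is made up for illustration and is not part of Condor. It only reports whether the directory exists and whether the current user could create it in its parent, so it should be run as the same user the starter runs the job as.

```python
import os

def check_job_dir(path):
    """Report whether `path` exists and whether its parent would allow creating it.

    Hypothetical helper, not a Condor tool. os.access() tests the permissions
    of the user running this script, which may differ from the job's user.
    """
    parent = os.path.dirname(os.path.normpath(path))
    return {
        "exists": os.path.isdir(path),
        "parent_exists": os.path.isdir(parent),
        # Creating a subdirectory needs write and search permission on the parent.
        "parent_writable": os.access(parent, os.W_OK | os.X_OK),
    }

if __name__ == "__main__":
    # Path taken from the IWD line in the StarterLog excerpt.
    print(check_job_dir("/var/adm/condor/spool/cluster33.proc0.subproc0"))
```

Running it on both the execute machine and the master server should show whether the directory exists on one but not the other.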

Cheers!


5/11 11:35:41 ******************************************************
5/11 11:35:41 ** condor_starter (CONDOR_STARTER) STARTING UP
5/11 11:35:41 ** /mnt/pike/gorn/Applications/condor-6.6.9-linux_x86_64/sbin/condor_starter
5/11 11:35:41 ** $CondorVersion: 6.6.9 Mar 10 2005 $
5/11 11:35:41 ** $CondorPlatform: I386-LINUX_RH9 $
5/11 11:35:41 ** PID = 25629
5/11 11:35:41 ******************************************************
5/11 11:35:41 Using config file: /mnt/condor/accounts/condor/condor_config
5/11 11:35:41 Using local config files: /mnt/condor/accounts/condor/hosts/loaner1/condor_config.local
5/11 11:35:41 DaemonCore: Command Socket at <216.94.116.106:33946>
5/11 11:35:41 Done setting resource limits
5/11 11:35:41 Starter communicating with condor_shadow <216.94.116.89:49266>
5/11 11:35:41 Submitting machine is "tamari.coredp.com"
5/11 11:35:41 Starting a VANILLA universe job with ID: 33.0
5/11 11:35:41 IWD: /var/adm/condor/spool/cluster33.proc0.subproc0
5/11 11:35:41 Failed to open standard output file '/var/adm/condor/spool/cluster33.proc0.subproc0/condor.42811141-0.0.out': No such file or directory (errno 2)
5/11 11:35:41 Output file: /var/adm/condor/spool/cluster33.proc0.subproc0/condor.42811141-0.0.out
5/11 11:35:41 Failed to open standard error file '/var/adm/condor/spool/cluster33.proc0.subproc0/condor.42811141-0.0.error': No such file or directory (errno 2)
5/11 11:35:41 Error file: /var/adm/condor/spool/cluster33.proc0.subproc0/condor.42811141-0.0.error
5/11 11:35:41 Failed to open some/all of the std files...
5/11 11:35:41 Aborting OsProc::StartJob.
5/11 11:35:41 Failed to start job, exiting
5/11 11:35:41 ShutdownFast all jobs.
5/11 11:35:41 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
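
The failing paths can also be pulled out of a StarterLog automatically. This is a rough sketch that assumes the "Failed to open standard ... file" lines always look like the ones in the excerpt above; the regular expression and the function name `failed_std_files` are illustrative only and do not come from Condor.

```python
import re

# Pattern modelled on the two "Failed to open standard ... file" lines above;
# other Condor versions may word these messages differently.
FAILED_OPEN = re.compile(
    r"Failed to open standard (?P<stream>\w+) file '(?P<path>[^']+)': "
    r"(?P<reason>.+) \(errno (?P<errno>\d+)\)"
)

def failed_std_files(log_text):
    """Return (stream, path, errno) for each failed std-file open in the log text."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_OPEN.search(line)
        if match:
            results.append(
                (match.group("stream"), match.group("path"), int(match.group("errno")))
            )
    return results
```

Feeding it the contents of the StarterLog gives one tuple per failed open, which makes it easy to see which directory is missing on a given machine.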