
[HTCondor-users] Personal Condor halts job



Hi,
I just started learning how to install condor.

My base system is SL6.4 (Scientific Linux 6.4):
# uname -a
Linux reuse-stack05 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 11:13:47 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux

I installed Condor with yum:
$ condor_version 
$CondorVersion: 8.0.4 Oct 19 2013 BuildID: 189770 $
$CondorPlatform: x86_64_RedHat6 $


I modified my local /etc/condor/condor_config.local so that the machine points to itself:
CONDOR_HOST = 19x.12x.16x.5x
...
ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), 19x.12x.16x.5x
ALLOW_READ = *


In /etc/condor/condor_config I enabled:
USE_CKPT_SERVER = True   

I disabled my firewall:
# service iptables stop
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Unloading modules:                               [  OK  ]


Next, I switched to the non-root user balewski and created this test job:

$ cat first.job
cmd = /bin/cat
args = /proc/self/status
output = first.job.$(cluster).$(process).out
error = first.job.$(cluster).$(process).err
log = first.job.log
queue 2

Then I submitted it:
$ condor_submit first.job 
Submitting job(s)..
2 job(s) submitted to cluster 8.
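The `$(cluster)` and `$(process)` macros in the submit file are expanded once per queued job, so `queue 2` in cluster 8 should yield `first.job.8.0.out` and `first.job.8.1.out`, the same paths that appear in the hold messages in the job log. The Python sketch below imitates that substitution only to make the mapping explicit; `expand_names` is a hypothetical helper, not part of HTCondor, and it understands just these two macros.

```python
import re


def expand_names(template: str, cluster: int, count: int) -> list[str]:
    """Expand $(cluster)/$(process) for each job created by 'queue <count>'.

    Hypothetical helper: a simplified imitation of condor_submit's macro
    substitution. Macro names are matched case-insensitively; any other
    $(...) macro is left untouched.
    """
    names = []
    for process in range(count):  # 'queue N' creates processes 0..N-1
        values = {"cluster": str(cluster), "process": str(process)}
        names.append(
            re.sub(
                r"\$\((\w+)\)",
                lambda m: values.get(m.group(1).lower(), m.group(0)),
                template,
            )
        )
    return names


print(expand_names("first.job.$(cluster).$(process).out", cluster=8, count=2))
# ['first.job.8.0.out', 'first.job.8.1.out']
```

The `log = first.job.log` line has no macros, so both jobs share one log file, which is why both hold events appear in it.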


After a few seconds, both jobs were put on hold:

[balewski@reuse-stack05 condor3]$ condor_q

----------
-- Submitter: reuse-stack05 : <198.125.163.55:55169> : reuse-stack05
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   8.0   balewski       11/10 16:08   0+00:00:00 H  0   0.0  cat /proc/self/sta
   8.1   balewski       11/10 16:08   0+00:00:00 H  0   0.0  cat /proc/self/sta

2 jobs; 0 completed, 0 removed, 0 idle, 0 running, 2 held, 0 suspended

The job log gives the following reason:

------
$ tail -f first.job.log 
	0  -  Run Bytes Received By Job
...
012 (008.000.000) 11/10 16:08:01 Job was held.
	Error from slot1@reuse-stack05: Failed to open '/home/balewski/condor3/first.job.8.0.out' as standard output: Permission denied (errno 13)
	Code 7 Subcode 13
...
012 (008.001.000) 11/10 16:08:01 Job was held.
	Error from slot2@reuse-stack05: Failed to open '/home/balewski/condor3/first.job.8.1.out' as standard output: Permission denied (errno 13)
	Code 7 Subcode 13
...
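The subcode 13 appears to be the errno value, which on Linux is `EACCES` ("Permission denied"). One guess, which I have not verified, is that the job runs under a Unix account other than `balewski` (for example `nobody`) that cannot enter or write to the output directory. The Python sketch below pulls the failing paths out of the user log and reports which directories on each path lack the "other" permission bits such an account would need. `parse_hold_events`, `other_user_problems` and `report` are hypothetical helpers, and the patterns assume the exact log format shown above.

```python
import errno
import os
import re
import stat
from pathlib import Path

# Patterns for the user-log lines shown above (assumed format, HTCondor 8.0).
HELD = re.compile(r"^012 \((\d+)\.(\d+)\.\d+\)")  # event 012 = job held
OPEN_FAIL = re.compile(r"Failed to open '([^']+)'")
CODES = re.compile(r"Code (\d+) Subcode (\d+)")


def parse_hold_events(log_text: str) -> list[dict]:
    """Collect job id, failing path, code and subcode from each held event."""
    events, current = [], None
    for line in log_text.splitlines():
        line = line.strip()
        if m := HELD.match(line):
            current = {"job": f"{int(m[1])}.{int(m[2])}",
                       "path": None, "code": None, "subcode": None}
            events.append(current)
        elif current is not None:
            if m := OPEN_FAIL.search(line):
                current["path"] = m[1]
            elif m := CODES.search(line):
                current["code"], current["subcode"] = int(m[1]), int(m[2])
            elif line == "...":
                current = None  # "..." ends an event in the user log
    return events


def other_user_problems(path: str) -> list[str]:
    """List 'other' permission bits missing on the directories above `path`.

    Assumes the job runs as an account that is neither the owner nor in the
    group of these directories, so only the 'other' bits apply.
    """
    problems = []
    dirs = list(Path(path).parents)[::-1]  # '/', '/home', ..., containing dir
    if not dirs:
        return problems
    for d in dirs:
        try:
            mode = os.stat(d).st_mode
        except OSError:
            continue  # missing or not stat-able by us: skip
        if not mode & stat.S_IXOTH:
            problems.append(f"{d}: o+x missing (others cannot enter)")
    try:
        if not os.stat(dirs[-1]).st_mode & stat.S_IWOTH:
            problems.append(f"{dirs[-1]}: o+w missing (others cannot create files)")
    except OSError:
        pass
    return problems


def report(log_path: str) -> None:
    """Print each hold event with its errno name and permission findings."""
    for ev in parse_hold_events(Path(log_path).read_text()):
        sub = ev["subcode"]
        name = errno.errorcode.get(sub, "?") if sub is not None else "?"
        print(f"job {ev['job']}: code {ev['code']} subcode {sub} ({name})")
        if ev["path"]:
            for problem in other_user_problems(ev["path"]):
                print("   ", problem)
```

Running `report("first.job.log")` from /home/balewski/condor3 would list, for each held job, the errno name and any directory on the output path that lacks those bits. An empty list would argue against this particular guess rather than confirm anything else.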

Can you please help me identify and fix the cause of this problem?
Thanks
Jan