
[Condor-users] Problem to submit example jobs



Hi all

I started with the example jobs, just to check whether my Condor installation was fine, and it is not, or not entirely:
If I run condor_submit as user "condor", everything is fine, but when I try to submit jobs under my own username, it does not work:

guiot@chagall:~/tmp/TestCondor$ condor_submit env.cmd
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 40.

WARNING: File /ibpc/chagall/guiot/tmp/TestCondor/env.out is not writable by condor.

WARNING: File /ibpc/chagall/guiot/tmp/TestCondor/env.err is not writable by condor.
guiot@chagall:~/tmp/TestCondor$

I checked the permissions. They seem fine, since both guiot and condor (and everyone else) can write to this directory (this listing was taken before the submit):
guiot@chagall:~/tmp/TestCondor$ ll

-rw-r--r--  1 guiot  users      816 Oct 12 11:53 Makefile
drwxr-xr-x  2 guiot  users     4096 Oct 12 11:54 PVM
-rw-r--r--  1 guiot  users    13190 Oct 12 11:53 README
drwxr-xr-x  2 guiot  users     4096 Oct 12 11:54 dagman
-rw-r--r--  1 guiot  users     3210 Oct 12 11:53 env.C
-rw-r--r--  1 guiot  users      296 Oct 12 12:00 env.cmd
-rwxr-xr-x  1 guiot  users 12422015 Oct 12 12:10 env.remote
-rwxr-xr-x  1 guiot  users      205 Oct 12 11:54 submit
-rw-r--r--  1 condor users    16384 Oct 12 14:34 tmp
guiot@chagall:~/tmp/TestCondor$ ll ../
drwxrwxrwx  4 guiot   users      4096 Oct 12 14:55 TestCondor
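
A listing like the one above can be checked mechanically. The following is a minimal Python sketch, not part of Condor, that applies only the classic owner/group/other mode bits from the first column of `ls -l`. The function names are hypothetical, and the check ignores ACLs, NFS export options, root privileges and the permissions of parent directories, any of which may matter here. Treating condor as a member of group "users" is an assumption.

```python
import os
import stat

def writable_by(mode_string, is_owner, in_group):
    """Apply the owner/group/other write bit of an `ls -l` mode string.

    mode_string is the first column of `ls -l`, e.g. '-rw-r--r--'.
    Only one class applies: owner first, then group, then other.
    """
    if len(mode_string) < 10:
        raise ValueError("expected a 10-character mode string")
    if is_owner:
        return mode_string[2] == "w"
    if in_group:
        return mode_string[5] == "w"
    return mode_string[8] == "w"

def path_writable_by(path, uid, gids):
    """Same check against a real file, for a given uid and set of gids."""
    st = os.stat(path)
    return writable_by(stat.filemode(st.st_mode),
                       st.st_uid == uid,
                       st.st_gid in gids)

# Mode strings as they appear in the listing; group membership is assumed.
print(writable_by("drwxrwxrwx", is_owner=False, in_group=True))  # the TestCondor directory
print(writable_by("-rw-r--r--", is_owner=False, in_group=True))  # a file owned by guiot
```

Under these assumptions, the directory is writable by a non-owner, while a file owned by guiot with mode -rw-r--r-- is not, whichever group the other user is in.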

The weird thing is that it _does_ create the .out and .err files (this listing was taken just after the submit):
guiot@chagall:~/tmp/TestCondor$ ll

-rw-r--r--  1 guiot  users      816 Oct 12 11:53 Makefile
drwxr-xr-x  2 guiot  users     4096 Oct 12 11:54 PVM
-rw-r--r--  1 guiot  users    13190 Oct 12 11:53 README
drwxr-xr-x  2 guiot  users     4096 Oct 12 11:54 dagman
-rw-r--r--  1 guiot  users     3210 Oct 12 11:53 env.C
-rw-r--r--  1 guiot  users      296 Oct 12 12:00 env.cmd
-rw-r--r--  1 guiot  users        0 Oct 12 14:58 env.err
-rw-r--r--  1 guiot  users       83 Oct 12 14:58 env.log
-rw-r--r--  1 guiot  users        0 Oct 12 14:58 env.out
-rwxr-xr-x  1 guiot  users 12422015 Oct 12 12:10 env.remote
-rwxr-xr-x  1 guiot  users      205 Oct 12 11:54 submit
-rw-r--r--  1 condor users    16384 Oct 12 14:34 tmp
guiot@chagall:~/tmp/TestCondor$       


Here is my ScheddLog, starting from the moment I ran condor_submit. The gap marks the point where I ran condor_rm.

10/12 14:51:16 (pid:4829) DaemonCore: Command received via UDP from host <193.49.27.24:49460>
10/12 14:51:16 (pid:4829) DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
10/12 14:51:16 (pid:4829) Sent ad to central manager for guiot@xxxxxxxxxxxxxx
10/12 14:51:16 (pid:4829) Sent ad to 1 collectors for guiot@xxxxxxxxxxxxxx
10/12 14:51:16 (pid:4829) Called reschedule_negotiator()
10/12 14:51:16 (pid:4829) Activity on stashed negotiator socket
10/12 14:51:16 (pid:4829) Negotiating for owner: guiot@xxxxxxxxxxxxxx
10/12 14:51:16 (pid:4829) Checking consistency running and runnable jobs
10/12 14:51:16 (pid:4829) Tables are consistent
10/12 14:51:16 (pid:4829) Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
10/12 14:51:20 (pid:4829) Starting add_shadow_birthdate(40.0)
10/12 14:51:20 (pid:4829) Started shadow for job 40.0 on "<193.49.27.11:33430>", (shadow pid = 11193)
10/12 14:51:20 (pid:4829) Shadow pid 11193 for job 40.0 exited with status 4
10/12 14:51:20 (pid:4829) ERROR: Shadow exited with job exception code!
10/12 14:51:21 (pid:4829) Sent ad to central manager for guiot@xxxxxxxxxxxxxx
10/12 14:51:21 (pid:4829) Sent ad to 1 collectors for guiot@xxxxxxxxxxxxxx
10/12 14:51:23 (pid:4829) Starting add_shadow_birthdate(40.0)
10/12 14:51:24 (pid:4829) Started shadow for job 40.0 on "<193.49.27.11:33430>", (shadow pid = 11194)
10/12 14:51:24 (pid:4829) Shadow pid 11194 for job 40.0 exited with status 4
10/12 14:51:24 (pid:4829) ERROR: Shadow exited with job exception code!
10/12 14:51:26 (pid:4829) Sent ad to central manager for guiot@xxxxxxxxxxxxxx
10/12 14:51:26 (pid:4829) Sent ad to 1 collectors for guiot@xxxxxxxxxxxxxx
10/12 14:51:26 (pid:4829) Starting add_shadow_birthdate(40.0)
10/12 14:51:26 (pid:4829) Started shadow for job 40.0 on "<193.49.27.11:33430>", (shadow pid = 11196)
10/12 14:51:26 (pid:4829) Shadow pid 11196 for job 40.0 exited with status 4
10/12 14:51:26 (pid:4829) ERROR: Shadow exited with job exception code!
10/12 14:51:28 (pid:4829) Starting add_shadow_birthdate(40.0)
10/12 14:51:28 (pid:4829) Started shadow for job 40.0 on "<193.49.27.11:33430>", (shadow pid = 11197)
10/12 14:51:28 (pid:4829) Shadow pid 11197 for job 40.0 exited with status 4
10/12 14:51:28 (pid:4829) ERROR: Shadow exited with job exception code!
10/12 14:51:30 (pid:4829) Starting add_shadow_birthdate(40.0)
10/12 14:51:30 (pid:4829) Started shadow for job 40.0 on "<193.49.27.11:33430>", (shadow pid = 11200)
10/12 14:51:30 (pid:4829) Shadow pid 11200 for job 40.0 exited with status 4
10/12 14:51:30 (pid:4829) ERROR: Shadow exited with job exception code!
10/12 14:51:30 (pid:4829) Match for cluster 40 has had 5 shadow exceptions, relinquishing.
10/12 14:51:30 (pid:4829) Sent RELEASE_CLAIM to startd on <193.49.27.11:33430>
10/12 14:51:30 (pid:4829) Match record (<193.49.27.11:33430>, 40, 0) deleted
10/12 14:51:30 (pid:4829) DaemonCore: Command received via TCP from host <193.49.27.11:33607>
10/12 14:51:30 (pid:4829) DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)
10/12 14:51:30 (pid:4829) Got VACATE_SERVICE from <193.49.27.11:33607>
10/12 14:51:31 (pid:4829) Sent ad to central manager for guiot@xxxxxxxxxxxxxx
10/12 14:51:31 (pid:4829) Sent ad to 1 collectors for guiot@xxxxxxxxxxxxxx


10/12 14:51:50 (pid:4829) DaemonCore: Command received via TCP from host <193.49.27.24:51440>
10/12 14:51:50 (pid:4829) DaemonCore: received command 478 (ACT_ON_JOBS), calling handler (actOnJobs)
10/12 14:51:50 (pid:4829) UserLog::initialize: open("/ibpc/chagall/guiot/tmp/TestCondor/env.log") failed - errno 13 (Permission denied)
10/12 14:51:50 (pid:4829) WARNING: Invalid user log file specified: /ibpc/chagall/guiot/tmp/TestCondor/env.log
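
To summarise a log like the one above, the repeated shadow exits can be counted per job and exit status. This is a rough Python sketch, not a Condor tool: the regular expression is derived only from the message format visible in this excerpt, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern taken from the "Shadow pid ... exited with status ..." lines above.
SHADOW_EXIT = re.compile(
    r"Shadow pid (\d+) for job (\d+\.\d+) exited with status (\d+)"
)

def shadow_exits(lines):
    """Count shadow exits, keyed by (job id, exit status)."""
    counts = Counter()
    for line in lines:
        match = SHADOW_EXIT.search(line)
        if match:
            job_id = match.group(2)
            status = int(match.group(3))
            counts[(job_id, status)] += 1
    return counts

sample = [
    "10/12 14:51:20 (pid:4829) Shadow pid 11193 for job 40.0 exited with status 4",
    "10/12 14:51:20 (pid:4829) ERROR: Shadow exited with job exception code!",
    "10/12 14:51:24 (pid:4829) Shadow pid 11194 for job 40.0 exited with status 4",
]
for (job_id, status), n in shadow_exits(sample).items():
    print(f"job {job_id}: {n} shadow exit(s) with status {status}")
```

The same function accepts an open file object, so it can be pointed at the ScheddLog itself, wherever the local configuration places it.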
    
Any help would be greatly appreciated.
Nicolas GUIOT

-----------------------------------------------
CNRS - UPR 9080 : Laboratoire de Biochimie Theorique
Institut de Biologie Physico-Chimique
13 rue Pierre et Marie Curie
75005 PARIS - FRANCE

Tel : +33 158 41 51 70
Fax : +33 158 41 50 26
------------------------------------------------