
[condor-users] condor_submit -r not working in 6.6.0



Here is something else that no longer works since upgrading to 6.6.0.

I used to be able to use...

    condor_submit -r condor <job_file>

to submit jobs.

Long story short, the only machine that is outside of our firewall and
able to flock jobs to other Condor pools is "condor".  Users need to
be able to submit jobs from machines inside of our firewall.

I am using "FS_REMOTE" for authentication.  This all worked fine in
6.4.7, but seems horribly broken in 6.6.0.

First symptom: condor_submit prints the following:

    $ condor_submit -r condor condor_job 
    Submitting job(s).
    Logging submit event(s).
    1 job(s) submitted to cluster 1539.
    Spooling data files for 1 jobs...

Why is it spooling data files?  It never did this before, and there is
no need to spool files.  The only file that is being spooled is the
log file specified with the "Log =" parameter in the job file.

In the SchedLog I see the following:

    2/6 15:49:52 Started shadow for job 1539.0 on "<192.168.0.57:33435>", (shadow pid = 14043)
    2/6 15:49:52 ERROR: Shadow exited with job exception code!
    2/6 15:49:53 Started shadow for job 1539.0 on "<192.168.0.57:33435>", (shadow pid = 14044)
    2/6 15:49:53 ERROR: Shadow exited with job exception code!
    2/6 15:49:54 Started shadow for job 1539.0 on "<192.168.0.57:33435>", (shadow pid = 14045)
    2/6 15:49:54 ERROR: Shadow exited with job exception code!
    2/6 15:49:54 Started shadow for job 1539.0 on "<192.168.0.57:33435>", (shadow pid = 14046)
    2/6 15:49:54 ERROR: Shadow exited with job exception code!
    2/6 15:49:55 Started shadow for job 1539.0 on "<192.168.0.57:33435>", (shadow pid = 14047)
    2/6 15:49:55 ERROR: Shadow exited with job exception code!
    2/6 15:49:55 Match for cluster 1539 has had 5 shadow exceptions, relinquishing.
    2/6 15:49:55 Sent RELEASE_CLAIM to startd on <192.168.0.57:33435>
    2/6 15:49:55 Match record (<192.168.0.57:33435>, 1539, 0) deleted
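The SchedLog pattern above (five shadow starts, each followed by an exception, and then the claim is relinquished) can be tallied mechanically. Below is a minimal Python sketch. It is not part of Condor, the function name count_shadow_exceptions is hypothetical, and it assumes that each "Shadow exited with job exception code" line belongs to the job named in the most recent "Started shadow" line, as it does in this excerpt.

```python
import re
from collections import Counter

# Matches SchedLog lines such as:
#   2/6 15:49:52 Started shadow for job 1539.0 on "<...>", (shadow pid = 14043)
STARTED = re.compile(r"Started shadow for job (\d+\.\d+)")
EXCEPTION = "Shadow exited with job exception code"


def count_shadow_exceptions(lines):
    """Return a Counter mapping job id -> number of shadow exceptions.

    Hypothetical helper: assumes each exception line refers to the job
    named in the preceding "Started shadow" line.
    """
    counts = Counter()
    current_job = None
    for line in lines:
        m = STARTED.search(line)
        if m:
            current_job = m.group(1)
        elif EXCEPTION in line and current_job is not None:
            counts[current_job] += 1
    return counts
```

Passing an open SchedLog file object works as well as a list of strings, since the function only iterates over lines. For the excerpt above it should report 5 exceptions for job 1539.0, matching the "has had 5 shadow exceptions" message.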

And in the ShadowLog I see this:

    2/6 15:49:52 (?.?) (14043):******* Standard Shadow starting up *******
    2/6 15:49:52 (?.?) (14043):** $CondorVersion: 6.6.0 Nov 13 2003 $
    2/6 15:49:52 (?.?) (14043):** $CondorPlatform: INTEL-LINUX-GLIBC22 $
    2/6 15:49:52 (?.?) (14043):*******************************************
    2/6 15:49:52 (?.?) (14043):uid=0, euid=1152, gid=1, egid=101
    2/6 15:49:52 (?.?) (14043):RemoveNewShadowDroppings(): Old shadow removed new shadow ckpt directory: /home/condor/LINUX/hosts/condor/spool/cluster1539.proc0.subproc0
    2/6 15:49:52 (?.?) (14043):RemoveNewShadowDroppings(): Old shadow removed new shadow ckpt directory: /home/condor/LINUX/hosts/condor/spool/cluster1539.proc0.subproc0.tmp
    2/6 15:49:52 (?.?) (14043):Hostname = "<192.168.0.57:33435>", Job = 1539.0
    2/6 15:49:52 (1539.0) (14043):Requesting Primary Starter
    2/6 15:49:52 (1539.0) (14043):Shadow: Request to run a job was ACCEPTED
    2/6 15:49:52 (1539.0) (14043):Shadow: RSC_SOCK connected, fd = 17
    2/6 15:49:52 (1539.0) (14043):Shadow: CLIENT_LOG connected, fd = 18
    2/6 15:49:52 (1539.0) (14043):UserLog::initialize: open("/home/condor/LINUX/hosts/condor/spool/cluster1539.proc0.subproc0/opendir.log") failed - errno 2 (No such file or directory)
    2/6 15:49:52 (1539.0) (14043):ERROR "Failed to initialize user log!
    " at line 104 in file log_events.C
    2/6 15:49:52 (1539.0) (14043):Shadow: DoCleanup: unlinking TmpCkpt '/home/condor/LINUX/hosts/condor/spool/cluster1539.proc0.subproc0.tmp'
    2/6 15:49:52 (1539.0) (14043):Trying to unlink /home/condor/LINUX/hosts/condor/spool/cluster1539.proc0.subproc0.tmp

This is repeated 5 times, except that the two lines referring to
"RemoveNewShadowDroppings()" do not occur again.

Any ideas as to what is going on?
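The key ShadowLog line shows the shadow trying to open the user log (opendir.log, presumably the file named by "Log =") inside the spool cluster directory /home/condor/LINUX/hosts/condor/spool/cluster1539.proc0.subproc0/, and failing with errno 2. For anyone who wants to check their own ShadowLog for the same failure, here is a minimal Python sketch. The helper name find_userlog_failure is hypothetical, and the regex assumes the exact message format quoted above. It extracts the path and errno and reports whether the path lies under a given spool directory.

```python
import errno
import os
import re

# Matches ShadowLog lines such as:
#   UserLog::initialize: open("/path/to/file.log") failed - errno 2 (...)
USERLOG_FAIL = re.compile(
    r'UserLog::initialize: open\("([^"]+)"\) failed - errno (\d+)'
)


def find_userlog_failure(line, spool_dir):
    """Return details of a failed user-log open, or None if the line
    is not a UserLog::initialize failure (hypothetical helper)."""
    m = USERLOG_FAIL.search(line)
    if m is None:
        return None
    path, err = m.group(1), int(m.group(2))
    spool_dir = os.path.normpath(spool_dir)  # drop any trailing slash
    return {
        "path": path,
        "errno": err,
        # Symbolic name for the errno value, e.g. 2 -> "ENOENT" on Linux
        "errno_name": errno.errorcode.get(err, "UNKNOWN"),
        # True if the log path was placed inside the spool directory
        "under_spool": os.path.commonpath([path, spool_dir]) == spool_dir,
    }
```

Feeding it the UserLog::initialize line above with the spool directory /home/condor/LINUX/hosts/condor/spool should flag an ENOENT failure on a path under the spool directory, which is consistent with the log file having been redirected into a spool subdirectory that does not exist.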

-- 
Daniel K. Forrest	Laboratory for Molecular and
forrest@xxxxxxxxxxxxx	Computational Genomics