
Re: [Condor-users] remote job submission with condor, (newbie) question



On Jan 10, 2006, at 1:13 PM, Frank van Lingen wrote:

I am new to using condor and trying to submit a job to a condor site:

This is what I did:
-installed the vdt client (1.3.10)
-generated a proxy using voms-proxy-init
-did a "condor_submit  test.jdl" where the contents of test.jdl is:

universe = globus
Executable = /bin/date
globusscheduler = t2cms02.sdsc.edu/jobmanager-fork
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Output  = /home/fvlingen/condor_test/output.out
Error = /home/fvlingen/condor_test/error.err
Log = /home/fvlingen/condor_test/user.log
queue

If I do not run the command "condor_schedd", I get an error: local
scheduler not found. If I run the "condor_schedd" command before
condor_submit, it works, but it seems to submit to the scheduler on my
laptop (error log output below), which is not what I specified in my
jdl file:

000 (002.000.000) 01/10 10:34:34 Job submitted from host: <127.0.0.1:32771>
...
018 (002.000.000) 01/10 10:36:51 Globus job submission failed!
    Reason: 43 the job manager failed to stage the executable
...
012 (002.000.000) 01/10 10:36:58 Job was held.
Globus error 43: the job manager failed to stage the executable
        Code 2 Subcode 43


Using the -r option does not seem to help either:

condor_submit -r t2cms02.sdsc.edu/jobmanager-fork test.jdl
ERROR: Can't find address of schedd t2cms02.sdsc.edu/jobmanager-fork

I am probably doing something trivially wrong. I looked at the
condor_submit man pages and examples of submitting a job, but they
seem to be based around local submission.

Where can I find information on remote job submission?

When you use Condor, you submit your jobs to a condor_schedd daemon, usually one running on your local machine. The schedd then forwards the job to the appropriate destination (in this case, t2cms02.sdsc.edu/jobmanager-fork). Your job is being held because the transfer of the job's executable from your machine to t2cms02.sdsc.edu is failing. This is usually a networking issue (incomplete hostname, firewall, etc.).

Try running globus-gass-server and look at the URL it prints. Is the hostname incomplete? Is the port one that's being blocked by a firewall?
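
As a rough aid for that check, here is a minimal Python sketch that takes the URL printed by globus-gass-server and flags the two hostname problems mentioned above. It assumes the URL has the usual scheme://host:port form; the function name inspect_gass_url and the sample URLs are hypothetical, chosen only for illustration. It does not test the firewall, which can only be verified from the remote side.

```python
from urllib.parse import urlparse


def inspect_gass_url(url):
    """Return (host, port, problems) for a URL of the form scheme://host:port.

    Hypothetical helper: it only looks at the text of the URL and does not
    contact the network or any Globus service.
    """
    parsed = urlparse(url.strip())
    host = parsed.hostname or ""
    port = parsed.port
    problems = []
    if "." not in host:
        # A short name such as "laptop" cannot be resolved by the remote site.
        problems.append("hostname is not fully qualified")
    if host == "localhost" or host.startswith("127."):
        # A loopback address points the remote job manager back at itself.
        problems.append("hostname is a loopback address")
    if port is None:
        problems.append("no port in URL")
    return host, port, problems


if __name__ == "__main__":
    # Sample URLs are made up for illustration.
    for sample in ("https://laptop:32771", "https://127.0.0.1:32771"):
        print(sample, inspect_gass_url(sample)[2])
```

If the list of problems is empty, the hostname at least looks usable from outside, and the remaining suspect is whether the printed port is reachable through any firewall between the two machines.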

+--------------------------------+-----------------------------------+
|           Jaime Frey           | I used to be a heavy gambler.     |
|       jfrey@xxxxxxxxxxx        | But now I just make mental bets.  |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind.        |
+--------------------------------+-----------------------------------+