
Re: [Condor-users] DAG submission using Condor SOAP webservice



After a quick read of your message below, it sounds as if you are submitting DAGMan as a vanilla universe job, so it ends up running under the starter on an execute node. Typically DAGMan runs as a scheduler universe job directly under the schedd, with no startd involved.

Essentially, your SOAP submission should submit a scheduler universe job to a schedd, with DAGMan (condor_dagman) as the executable, and you should move all the files it needs (the DAGMan input file, all the node submit files, and all the files referenced in those submit files) via the SOAP file staging calls.
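
To make the "move all files" step concrete, here is a minimal Python sketch of how a client might work out which files to stage before making the SOAP calls. The function names and the FILE_COMMANDS list are made up for illustration and are not part of Condor. It only looks at JOB lines in the DAG file and at a few common submit commands, and it ignores DIR options, macros and SUBDAG/SPLICE entries, so treat it as a starting point rather than a complete parser.

```python
from pathlib import Path

# Submit-file commands whose values name files the job needs.
# This is an assumed, partial list chosen for illustration.
FILE_COMMANDS = {"executable", "input", "transfer_input_files"}


def node_submit_files(dag_text):
    """Return the submit-file names found on JOB lines of a DAG input file."""
    files = []
    for line in dag_text.splitlines():
        parts = line.split()
        # DAG syntax: JOB <node name> <submit file> [options...]
        if len(parts) >= 3 and parts[0].upper() == "JOB":
            files.append(parts[2])
    return files


def referenced_files(submit_text):
    """Return file names referenced by the commands in FILE_COMMANDS."""
    files = []
    for line in submit_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip().lower() in FILE_COMMANDS:
            files.extend(v.strip() for v in value.split(",") if v.strip())
    return files


def files_to_stage(dag_path):
    """List the DAG file, its node submit files, and the files those reference."""
    dag_path = Path(dag_path)
    staged = [dag_path.name]
    for submit in node_submit_files(dag_path.read_text()):
        if submit not in staged:
            staged.append(submit)
        submit_path = dag_path.parent / submit
        if submit_path.exists():
            for name in referenced_files(submit_path.read_text()):
                if name not in staged:
                    staged.append(name)
    return staged
```

Calling files_to_stage("mydag.dag") (a hypothetical file name) would give you the list of names to push through the file staging calls before committing the job.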

Some additional pointers:

http://spinningmatt.wordpress.com/2009/11/02/submitting-a-workflow-to-condor-via-soap-using-java/

Also perhaps helpful is this URL, which talks about how to submit dagman into a grid universe (you'll want to use SOAP to the scheduler universe, but several of the concepts/ideas are the same):
   https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=DagManUnderCondorc

Hope this helps,
Todd

On 6/13/2012 5:55 AM, Belai Beshah wrote:
Hi All,

We have a simple out-of-the-box Condor 7.6.7 on Windows 7 with the SOAP
interface configured and SOAP READ/WRITE access given to everybody on
the network. Submitting simple jobs from other machines using SOAP is
working great. However, we are facing a problem when we try to submit a
job that in turn submits a DAG. What we have done is make the
submitter machines advertise themselves as having DAG capabilities, and
make the jobs that submit DAGs require that capability, so that these
jobs run only on the submitter machines. Here is the part of
“condor_config.local” for this setup:

# Added to the submit machine to ONLY accept jobs that require DAG and
# advertise this machine as having DAG

HAS_CONDOR_DAG = True

STARTD_ATTRS = HAS_CONDOR_DAG, $(STARTD_ATTRS)

START = (JOB_GROUP =?= "REQ_CONDOR_DAG") && $(START)

The matching of Jobs to submit machines is happening correctly but when
the job starts to run it errors out with the message:

ERROR: No credential stored for condor-reuse-slot1@PT-MASTER

Correct this by running:

condor_store_cred add

ERROR: condor_submit failed; aborting

According to the docs, the starter will assign a new randomly generated
password to the “condor-reuse-slot1” account, so storing a credential
by hand will not be a solution. We are using the “run_as_owner” flag
for the simple jobs. Is there a way to tell the DAG jobs to run as owner
without going to the extra step of generating the DAG using
“condor_submit_dag -no_submit” and somehow editing the resulting DAG?
That would be very difficult, since the machines submitting using SOAP
don’t have access to the submitter machines' file system.

Thanks

Belai






--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
Condor Project Technical Lead          1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685