
Re: [Condor-users] DAG submission using Condor SOAP webservice

Thanks, Todd, for the suggestion, but after looking carefully at the two pointers, our problem seems to be different. We submit a job, say Job A, using SOAP. When Job A runs, it calculates the dependencies required for the input dataset and then tries to submit a DAG, say Job B. The outside Java client that uses the SOAP API does not know what the DAG dependencies are, so it cannot generate the DAG for Job B directly. Instead, we have to submit the vanilla universe Job A to work out the dependencies, and Job A then submits the DAG job, not through the SOAP API but by calling "condor_submit_dag" directly while it runs.

The problem we are facing is that a vanilla universe job running successfully on a submit machine is not able to submit a DAG. We can see that the DAG files are created correctly; only the "condor_submit_dag" call fails. If we log in to the submit machine, we can manually submit the DAG file created for Job B successfully.
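
To make the DAG-generation step concrete, here is a rough Python sketch of how a job like Job A might turn computed dependencies into a DAGMan input file. The function name, node names, and submit file names are hypothetical and chosen only for illustration; only the JOB and PARENT ... CHILD line syntax is standard DAGMan.

```python
def build_dag(dependencies, submit_files):
    """Return DAGMan input-file text.

    dependencies: dict mapping a node name to the list of its parent nodes
    submit_files: dict mapping a node name to its node submit file
    (both structures are assumptions made for this sketch)
    """
    lines = []
    # One JOB line per node: JOB <node name> <submit file>
    for node in sorted(submit_files):
        lines.append(f"JOB {node} {submit_files[node]}")
    # One PARENT ... CHILD line per node that has parents
    for child in sorted(dependencies):
        parents = sorted(dependencies[child])
        if parents:
            lines.append(f"PARENT {' '.join(parents)} CHILD {child}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical workflow: two split jobs feed one merge job.
    print(build_dag(
        dependencies={"merge": ["split_1", "split_2"]},
        submit_files={
            "split_1": "split_1.sub",
            "split_2": "split_2.sub",
            "merge": "merge.sub",
        },
    ))
```

Writing this text to a file is the part that already works for us; the failure only appears when that file is handed to "condor_submit_dag" from inside the running job.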

We also tried forcing Condor to always use a specific user on the submit machines by adding "SLOT1_user = <UID domain user>" for a domain user whose credential is stored, but still no luck.

-----Original Message-----
From: Todd Tannenbaum [mailto:tannenba@xxxxxxxxxxx] 
Sent: Wednesday, June 13, 2012 4:43 PM
To: Condor-Users Mail List
Cc: Belai Beshah
Subject: Re: [Condor-users] DAG submission using Condor SOAP webservice

After a quick read of the below, it sounds as if you are submitting DAGMan as a vanilla universe job and therefore having it run under the starter on an execute node. Typically DAGMan is run as a scheduler universe job under the schedd, no startd involved.

Essentially your soap submission should submit a scheduler universe job to a schedd, the executable should be dagman, and you should move all files (dagman input file, all the node submit files, and all the files referenced in those submit files) via the SOAP file staging calls.
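
As a rough illustration of working out what needs to be staged, the Python sketch below scans a DAGMan input file for JOB lines and lists the DAG file together with every node submit file it names. The function name and file names are hypothetical, and this is only a starting point under those assumptions: files referenced inside the node submit files (executables, input files) would still have to be collected separately, and the SOAP staging calls themselves are not shown.

```python
def files_to_stage(dag_path, dag_text):
    """Return the DAG file plus each node submit file named on a JOB line."""
    staged = [dag_path]
    for raw_line in dag_text.splitlines():
        tokens = raw_line.split()
        # JOB lines look like: JOB <node name> <submit file> [options...]
        # Comment lines (starting with '#') and other keywords are skipped.
        if len(tokens) >= 3 and tokens[0].upper() == "JOB":
            submit_file = tokens[2]
            if submit_file not in staged:
                staged.append(submit_file)
    return staged


if __name__ == "__main__":
    # Hypothetical two-node DAG used only to exercise the function.
    sample_dag = (
        "# sample workflow\n"
        "JOB A a.sub\n"
        "JOB B b.sub\n"
        "PARENT A CHILD B\n"
    )
    for name in files_to_stage("workflow.dag", sample_dag):
        print(name)
```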

Some additional pointers:

Also perhaps helpful is this URL, which talks about how to submit dagman into a grid universe (you'll want to use SOAP to the scheduler universe, but several of the concepts/ideas are the same):

Hope this helps,

On 6/13/2012 5:55 AM, Belai Beshah wrote:
> Hi All,
> We  have a simple out of the box Condor 7.6.7 on Windows 7 with the 
> SOAP Interface configured and SOAP READ/WRITE access given to 
> everybody on the network.  Submitting simple jobs from other machines 
> using SOAP is working great. However, we are facing problem when we 
> try to submit a job that tries to submit a DAG. What we have done is 
> to make the submitter machines advertise as having DAG capabilities 
> and the jobs that try to submit DAG jobs to require that so that these 
> jobs run only on the submitter machines.  Here is the part of " condor_config.local"
> for this setup:
> # Added to the submit machine to ONLY accept jobs that require DAG and 
> advertise this machine as having DAG
> The matching of Jobs to submit machines is happening correctly but 
> when the job starts to run it errors out with the message:
> ERROR: No credential stored for condor-reuse-slot1@PT-MASTER
> Correct this by running:
> condor_store_cred add
> ERROR: condor_submit failed; aborting
> According to the docs the  starter will assign a new randomly 
> generated password to the "condor-reuse-slot1"  account, so storing a credential
> by hand will not be a solution.   We are using the "run_as_owner" flag
> for the simple jobs. Is there a way to tell the DAG jobs to run as 
> owner without going to the extra step of generating the dag using 
> "condor_submit_dag -no_submit"  and somehow editing the resulting DAG 
> (which is very difficult since the machines submitting using SOAP 
> don't have access to the submitter machines file system).
> Thanks
> Belai

Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
Condor Project Technical Lead          1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685