
Re: [Condor-users] RE : RE : birdbath dag submission

Alright,


Here is the dagman.log file written after a DAG submission from my Java code:
----------------------------------------------------------------------------------------------
001 (439.000.000) 09/25 09:27:34 Job executing on host: <127.0.0.1:50817>
...
006 (439.000.000) 09/25 09:27:42 Image size of job updated: 7272
...
005 (439.000.000) 09/25 09:28:18 Job terminated.
        (1) Normal termination (return value 1)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        70265  -  Run Bytes Sent By Job
        4207584  -  Run Bytes Received By Job
        70265  -  Total Bytes Sent By Job
        4207584  -  Total Bytes Received By Job
...
----------------------------------------------------------------------------------------------

Here is the dagman.log file written after a DAG submission from the command condor_submit_dag:
----------------------------------------------------------------------------------------------
000 (441.000.000) 09/25 09:30:40 Job submitted from host: <127.0.0.1:8181>
...
001 (441.000.000) 09/25 09:30:40 Job executing on host: <127.0.0.1:8181>
...
005 (441.000.000) 09/25 09:31:26 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        0  -  Total Bytes Received By Job
...
----------------------------------------------------------------------------------------------

It looks like it's not executing on port 8181! But I don't understand why, since I access the Schedd through http://localhost:8181.
Does that make sense?
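Instead of reading the host addresses, it may help to compare the two excerpts event by event. Below is a minimal sketch of a hypothetical helper (`UserLogSummary` is an illustrative name, not part of Condor or birdbath). It assumes the event-header layout shown above: a three-digit event code, the job id in parentheses, a date, a time and a message. Run on the two excerpts, it would report return value 1 for the Java submission and 0 for the condor_submit_dag one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: condenses a Condor user log (like the dagman.log
// excerpts above) to one entry per event plus the termination return value.
class UserLogSummary {
    // Event header: 3-digit code, (cluster.proc.subproc), MM/DD, HH:MM:SS, message.
    private static final Pattern HEADER = Pattern.compile(
        "^(\\d{3}) \\(\\d+\\.\\d+\\.\\d+\\) \\d{2}/\\d{2} \\d{2}:\\d{2}:\\d{2} (.*)$");
    // Detail line written under a "Job terminated." event.
    private static final Pattern RETURN_VALUE =
        Pattern.compile("Normal termination \\(return value (-?\\d+)\\)");

    /** Returns "code message" for every event header in the log text. */
    static List<String> events(String logText) {
        List<String> out = new ArrayList<>();
        for (String line : logText.split("\\R")) {
            Matcher m = HEADER.matcher(line.trim());
            if (m.matches()) {
                out.add(m.group(1) + " " + m.group(2));
            }
        }
        return out;
    }

    /** Returns the first normal-termination return value, or null if there is none. */
    static Integer returnValue(String logText) {
        Matcher m = RETURN_VALUE.matcher(logText);
        return m.find() ? Integer.valueOf(m.group(1)) : null;
    }

    public static void main(String[] args) {
        String fromJava = String.join("\n",
            "001 (439.000.000) 09/25 09:27:34 Job executing on host: <127.0.0.1:50817>",
            "...",
            "005 (439.000.000) 09/25 09:28:18 Job terminated.",
            "        (1) Normal termination (return value 1)");
        // prints [001 Job executing on host: <127.0.0.1:50817>, 005 Job terminated.]
        System.out.println(events(fromJava));
        System.out.println(returnValue(fromJava)); // prints 1
    }
}
```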

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx on behalf of Matthew Farrellee
Sent: Tue 9/25/2007 6:15 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] RE :  RE :  birdbath dag submission
 
(inline)

Mariette, Jerome wrote:
> So I added all the files involved, even the two executables, and I still get the same error.
> To keep it simple, I reduced the DAG to a single job that just copies a file with cp, and I still get the same error.
> 
> Here is the DAG: cpWorkflow.dag
>   JOB COPY /home/jerome/job.cp.condor
>   VARS COPY executable="/bin/cp"
>   VARS COPY inputfile="/home/jerome/file.src"
>   VARS COPY outputfile="/home/jerome/file.copie"
> 
> Then here is the job.cp.condor file:
>   Universe = vanilla
>   executable = $(executable)
>   transfer_executable = False
>   should_transfer_files = NO
>   Notification = Error
>   arguments = $(inputfile) $(outputfile)
>   output = job.cp.out
>   error = job.cp.err
>   log = job.cp.log
>   queue

This all looks good.
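For reference, here is how the pieces connect. Each VARS line defines a macro for node COPY, and the `$(executable)`, `$(inputfile)` and `$(outputfile)` references in job.cp.condor pick up those values, so the node should run `/bin/cp /home/jerome/file.src /home/jerome/file.copie`. The sketch below models only that substitution step under simplifying assumptions: one `name="value"` pair per VARS line as in this DAG, no escaping, and no other submit-file syntax. `VarsExpansion` is a hypothetical name, and this is not how condor_submit is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model (not condor_submit's real implementation) of how the
// VARS values in cpWorkflow.dag fill the $(name) macros in job.cp.condor.
class VarsExpansion {
    // Matches: VARS <node> <name>="<value>"  (keyword case-insensitive)
    private static final Pattern VARS_LINE =
        Pattern.compile("^\\s*(?i:VARS)\\s+(\\S+)\\s+(\\w+)=\"([^\"]*)\"\\s*$");
    private static final Pattern MACRO = Pattern.compile("\\$\\((\\w+)\\)");

    /** Collects the VARS name/value pairs declared for one DAG node. */
    static Map<String, String> varsFor(String node, String dagText) {
        Map<String, String> vars = new HashMap<>();
        for (String line : dagText.split("\\R")) {
            Matcher m = VARS_LINE.matcher(line);
            if (m.matches() && m.group(1).equals(node)) {
                vars.put(m.group(2), m.group(3));
            }
        }
        return vars;
    }

    /** Replaces each $(name) with its value; unknown macros are left as-is. */
    static String expand(String submitLine, Map<String, String> vars) {
        Matcher m = MACRO.matcher(submitLine);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String dag = String.join("\n",
            "JOB COPY /home/jerome/job.cp.condor",
            "VARS COPY executable=\"/bin/cp\"",
            "VARS COPY inputfile=\"/home/jerome/file.src\"",
            "VARS COPY outputfile=\"/home/jerome/file.copie\"");
        Map<String, String> vars = varsFor("COPY", dag);
        System.out.println(expand("executable = $(executable)", vars)); // executable = /bin/cp
        System.out.println(expand("arguments = $(inputfile) $(outputfile)", vars));
    }
}
```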


> And my Java code:
>   
>   Schedd schedd = new Schedd (new URL("http://localhost:8181"));
>   Transaction xact = schedd.createTransaction();
>   xact.begin(30);
>   int cluster = xact.createCluster();
>   int job = xact.createJob(cluster);
>   File[] files = { new File("/home/jerome/cpWorkflow.dag"),
>                     new File("/home/jerome/job.cp.condor"),
>                     new File("/bin/cp"),
>                     new File("/home/jerome/file.src")};
>   
>      xact.submit(cluster, job, "jerome", UniverseType.SCHEDULER,
>                  "/opt/condor-6.8.5/bin/condor_dagman",
>                  "-f -l . -Debug 3 " +
>                  "-Lockfile myLockFile -Dag myDag -Rescue myRescuDag " +
>                  "-Condorlog myLog",
>                  null, null, files);
>   
>    xact.commit();

This looks good too.


> The example is so simple that it should work. What am I missing?
> I know you said to avoid the /path/to ..., but I'm not sure what you mean. For example, is it better to create just a file list like this:
>   File[] files = { new File("/home/jerome/*"),
>                     new File("/bin/cp")};

Ignore that. I thought maybe your job.cp.condor was referencing the 
input/output with full paths, which wouldn't work since Condor puts 
everything in a single directory when you transfer it.
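Because the transferred files land in a single directory, two entries in the `files` array that share a base name would collide there, and the submit file has to refer to each file by its base name. A small pre-submit check can catch a missing file or a name clash before calling `xact.submit()`. This is a hypothetical helper (`TransferListCheck` is not part of birdbath). It only inspects the local `File` objects and makes no claim about what Condor actually does on a collision.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-submit check: transferred files end up in one directory,
// so two files with the same base name would collide there.
class TransferListCheck {
    /** Returns one message per missing file or duplicate base name. */
    static List<String> problems(File[] files) {
        List<String> problems = new ArrayList<>();
        Map<String, File> byName = new HashMap<>();
        for (File f : files) {
            if (!f.isFile()) {
                problems.add("missing or not a regular file: " + f.getPath());
            }
            File previous = byName.putIfAbsent(f.getName(), f);
            if (previous != null) {
                problems.add("same base name '" + f.getName() + "': "
                    + previous.getPath() + " and " + f.getPath());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        File[] files = { new File("/home/jerome/cpWorkflow.dag"),
                         new File("/home/jerome/job.cp.condor"),
                         new File("/bin/cp"),
                         new File("/home/jerome/file.src") };
        // Prints nothing if every file exists and all base names are distinct.
        problems(files).forEach(System.out::println);
    }
}
```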

I have a feeling condor_dagman wants to put some special attributes in 
the Job ad, maybe something related to a log file. You should try two 
things: 1) look at the .sub file that condor_submit_dag is creating and 
look for any values you might not expect in a regular submit file, i.e. 
dagman_something; 2) look at the job you successfully submitted with 
condor_submit_dag (use condor_q -long) and look for dagman specific 
attributes. Once you've done one or both of those you'll probably find 
something that you need to add to the job ad you are submitting, I 
believe it's the "extraAttrs" argument to xact.submit() (2nd to last arg?).

Sorry about this, but Condor uses numerous attributes that are not 
normally exposed to users.

Best,


matt


> thx,
> Jerome
> 
> PS: I tried condor_submit_dag and it works perfectly. The only difference is that when submitting from the command line, the condor_dagman process runs straight away, but with my Java code condor_dagman stays idle, so I have to submit a totally different job (using condor_submit) to get it running. And even when condor_dagman is running, the node job is never shown, whereas the COPY job does show up when I use the condor_submit_dag command!
> 
> -------- Original Message --------
> From: condor-users-bounces@xxxxxxxxxxx on behalf of Matthew Farrellee
> Date: Mon 24/09/2007 19:46
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] RE :  birdbath dag submission
>  
> condor_dagman is just a program that reads your DAG and runs the jobs 
> specified in it. It runs them by submitting them to Condor, and it uses 
> condor_submit to do that. That means you need to give condor_dagman 
> access to the submit files so it can hand them off to condor_submit.
> 
> You'll want to send execjob2 too, and you should try it all without 
> using "path/to/" -- put everything into a single directory, Condor likes 
> that. Also, make sure your dag runs if you submit it with 
> condor_submit_dag...
> 
> 
> matt
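The point above, that condor_dagman needs every node's submit file, can be checked against the DAG itself: each `JOB` line names the submit file for one node. A minimal sketch of a hypothetical helper (`DagSubmitFiles`) that collects those names so they can be added to the transfer list. It assumes the simple `JOB <name> <submit file>` form used in this thread and ignores any further options on the line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps each DAG node to the submit file named on its
// JOB line, i.e. the files condor_dagman will hand to condor_submit.
class DagSubmitFiles {
    static Map<String, String> submitFiles(String dagText) {
        Map<String, String> nodes = new LinkedHashMap<>();
        for (String line : dagText.split("\\R")) {
            String[] tok = line.trim().split("\\s+");
            // JOB <node name> <submit file> [further options ignored here]
            if (tok.length >= 3 && tok[0].equalsIgnoreCase("JOB")) {
                nodes.put(tok[1], tok[2]);
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        String dag = String.join("\n",
            "JOB JOB1 /path/to/job.job1.condor",
            "JOB JOB2 /path/to/job.job2.condor",
            "VARS JOB1 executable=\"/path/to/exejob1\"",
            "PARENT JOB1 Child JOB2");
        // prints {JOB1=/path/to/job.job1.condor, JOB2=/path/to/job.job2.condor}
        System.out.println(submitFiles(dag));
    }
}
```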
> 
> Mariette, Jerome wrote:
>> Well, my DAG file looks like this:
>>
>>   JOB JOB1 /path/to/job.job1.condor
>>   JOB JOB2 /path/to/job.job2.condor
>>
>>   VARS JOB1 executable="/path/to/exejob1"
>>   VARS JOB1 input="path/to/inputjob1"
>>   VARS JOB1 output="path/to/outputjob1"
>>   VARS JOB2 executable="/path/to/exejob2"
>>   VARS JOB2 input="path/to/outputjob1"
>>
>>   PARENT JOB1 Child JOB2
>>
>> So, in order to send the files, I added the following lines:
>>
>>   Schedd schedd = new Schedd (new URL("http://localhost:8181"));
>>   Transaction xact = schedd.createTransaction();
>>   xact.begin(30);
>>   int cluster = xact.createCluster();
>>   int job = xact.createJob(cluster);
>>   File[] files = { new File("/path/to/DAGfile"),
>>                    new File("/path/to/job.job1.condor"),
>>                    new File("/path/to/job.job2.condor"),
>>                    new File("/path/to/inputjob1")};
>>
>>   xact.submit(cluster, job, "jerome", UniverseType.SCHEDULER,
>>               "/opt/condor-6.8.5/bin/condor_dagman", /* Path to the dagman binary */
>>               "-f -l . -Debug 3 " +
>>               "-Lockfile myLockFile -Dag myDag -Rescue myRescuDag " +
>>               "-Condorlog myLog",
>>               null, null, files);
>>
>>   xact.commit();
>>
>> I still have the same error:
>>   failed while reading from pipe
>>   ... 
>>   ERROR: failed to initialize condor job log
>>
>>
>> Moreover, I was wondering why I have to submit another job, sometimes more than one, before Condor begins processing my jobs?
>> thx for your help,
>>
>> Jerome
>>
>> -------- Original Message --------
>> From: condor-users-bounces@xxxxxxxxxxx on behalf of Matthew Farrellee
>> Date: Mon 24/09/2007 17:02
>> To: Condor-Users Mail List
>> Subject: Re: [Condor-users] birdbath dag submission
>>  
>> This looks pretty good. Are there any files you might need to submit 
>> along with the dag? You probably need to send along any condor_submit 
>> file that is used for a node in the dag. That way condor_dagman knows 
>> what to submit for each step in the dag.
>>
>>
>> matt
>>
>> Mariette, Jerome wrote:
>>> Hi everybody,
>>> I'm pretty new to the Condor world and I'm having some trouble submitting a DAG.
>>> Here is my problem.
>>> I'm using the birdbath wrapper to do it, and I'm submitting the DAG file like this:
>>>
>>>   Schedd schedd = new Schedd (new URL("http://localhost:8181"));
>>>   Transaction xact = schedd.createTransaction();
>>>   xact.begin(30);
>>>   int cluster = xact.createCluster();
>>>   int job = xact.createJob(cluster);
>>>   xact.submit(cluster, job, "jerome", UniverseType.SCHEDULER,
>>>               "/opt/condor-6.8.5/bin/condor_dagman", /* Path to the dagman binary */
>>>               "-f -l . -Debug 3 " +
>>>               "-Lockfile myLockFile -Dag myDag -Rescue myRescuDag " +
>>>               "-Condorlog myLog",
>>>               null, null, null);
>>>   xact.commit();
>>>
>>> What am I doing wrong? (The DAG file is OK: I tried it from the command line
>>> and it works.)
>>> thx
>>>
>>> Jerome
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> Condor-users mailing list
>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>
>>> The archives can be found at: 
>>> https://lists.cs.wisc.edu/archive/condor-users/