
Re: [Condor-users] DAGMan ? RE : birdbath dag submission



Thanks for your help.
To fix that, I created a symbolic link from the directory it was looking in to the directory where the condor_config is. The log file now shows that the job is sent with the right port, so I don't think that is the problem anymore. However, the submit attempt still fails:

9/25 11:13:14 failed while reading from pipe.
9/25 11:13:14 Read so far:
9/25 11:13:14 ERROR: submit attempt failed
9/25 11:13:14 submit command was: condor_submit -a dag_node_name' '=' 'COPY -a +DAGManJobId' '=' '-1 -a DAGManJobId' '=' '-1 -a submit_event_notes' '=' 'DAG' 'Node:' 'COPY -a executable' '=' '/bin/cp -a inputfile' '=' '/home/jerome/file.src -a outputfile' '=' '/home/jerome/file.copie -a +DAGParentNodeNames' '=' '"" /home/jerome/job.cp.condor
9/25 11:13:14 Job submit try 3/6 failed, will try again in >= 4 seconds.
9/25 11:13:14 ERROR: failed to initialize condor job log -- ignore unless error repeats

The -1 is not normal: it is a regular job number when running through condor_submit_dag.
I checked the number I pass in my Java code and it is the right one. Maybe I also have to pass this number as a parameter to condor_dagman?

Let me know,
thanks

Jerome





-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx on behalf of Matthew Farrellee
Sent: Wed 9/26/2007 8:35 AM
To: Condor-Users Mail List
Subject: [Condor-users]   DAGMan ? RE :  birdbath dag submission
 
Someone with more DAGMan experience should take a look at this. DAGMan
makes some assumptions about its operating environment that I'm no
longer familiar with.

You can pass environment variables to DAGMan by using the Env
(or Environment? Use condor_q -long to find the right name) attribute on
your Job.
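
As a rough illustration of the suggestion above, here is a minimal Java sketch that builds the value for such an environment attribute. It assumes the old-style syntax of semicolon-separated NAME=value pairs; the class name EnvAttr and its format method are hypothetical helpers, not part of birdbath. The attribute name (Env or Environment) and the way the string is attached to the job ad are not settled in this thread, so check both against condor_q -long output from a working job.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of birdbath or Condor.
final class EnvAttr {
    /** Joins NAME=value pairs with ';' (assumed old-style syntax on Unix). */
    static String format(Map<String, String> vars) {
        StringJoiner joined = new StringJoiner(";");
        for (Map.Entry<String, String> e : vars.entrySet()) {
            // A ';' inside a value would be read as a separator, so reject it.
            if (e.getKey().contains("=") || e.getValue().contains(";")) {
                throw new IllegalArgumentException("cannot encode: " + e.getKey());
            }
            joined.add(e.getKey() + "=" + e.getValue());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("CONDOR_CONFIG", "/opt/condor-6.8.5/etc/condor_config");
        // Prints: CONDOR_CONFIG=/opt/condor-6.8.5/etc/condor_config
        System.out.println(format(vars));
    }
}
```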


matt

Mariette, Jerome wrote:
> I think I found the problem: when I deleted
> /etc/condor/condor_config, the log file shows: Job executing on
> host: <127.0.0.1:8181>
> 
> which looks much better to me. But the job still cannot be processed
> because it cannot find the condor_config file. The CONDOR_CONFIG
> environment variable is set correctly on the machine, so I don't
> understand why it can't be found when I'm using my Java code.
> 
> Is there a way to tell DAGMan to use the environment variables,
> or just to set one?
> 
> Let me know, Jerome
> 
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx on behalf of Mariette, Jerome
> Sent: Tue 9/25/2007 2:24 PM
> To: Condor-Users Mail List; Condor-Users Mail List
> Subject: Re: [Condor-users] RE :  RE :  birdbath dag submission
> 
> 
> Also, I found another big difference in the dagman.out files:
> 
> 9/25 14:17:05 Using config source: /etc/condor/condor_config
> 9/25 11:16:24 Using config source: /opt/condor-6.8.5/etc/condor_config
> 
> How can I specify the config file I want to use?
> 
> 
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx on behalf of Mariette, Jerome
> Sent: Tue 9/25/2007 10:09 AM
> To: Condor-Users Mail List
> Subject: RE: [Condor-users] RE :  RE :  birdbath dag submission
> 
> 
> Alright,
> 
> 
> Here is the dagman.log file written after a DAG submission from my
> Java code:
> ----------------------------------------------------------------------------------------------
> 001 (439.000.000) 09/25 09:27:34 Job executing on host: <127.0.0.1:50817>
> ...
> 006 (439.000.000) 09/25 09:27:42 Image size of job updated: 7272
> ...
> 005 (439.000.000) 09/25 09:28:18 Job terminated.
>     (1) Normal termination (return value 1)
>         Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
>         Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
>         Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
>         Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
>     70265  -  Run Bytes Sent By Job
>     4207584  -  Run Bytes Received By Job
>     70265  -  Total Bytes Sent By Job
>     4207584  -  Total Bytes Received By Job
> ...
> ----------------------------------------------------------------------------------------------
> 
> 
> Here is the dagman.log file written after a DAG submission from the
> command condor_submit_dag:
> ----------------------------------------------------------------------------------------------
> 000 (441.000.000) 09/25 09:30:40 Job submitted from host: <127.0.0.1:8181>
> ...
> 001 (441.000.000) 09/25 09:30:40 Job executing on host: <127.0.0.1:8181>
> ...
> 005 (441.000.000) 09/25 09:31:26 Job terminated.
>     (1) Normal termination (return value 0)
>         Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
>         Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
>         Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
>         Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
>     0  -  Run Bytes Sent By Job
>     0  -  Run Bytes Received By Job
>     0  -  Total Bytes Sent By Job
>     0  -  Total Bytes Received By Job
> ...
> ----------------------------------------------------------------------------------------------
> 
> 
> It looks like it is not executing on port 8181! But I don't
> understand why, since I access the Schedd through http://localhost:8181
> Does that make sense?
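
To compare the two logs quoted above without reading them by eye, a small Java sketch along these lines could pull the port out of each event line. It assumes the event header layout seen in the excerpts (three-digit event number, job id in parentheses, date, time, message) and the <ip:port> address form; the class UserLogHosts and its portOf method are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting lines like those quoted in this thread.
final class UserLogHosts {
    // Event header: event number, (cluster.proc.subproc), date, time, message.
    private static final Pattern EVENT = Pattern.compile(
        "^(\\d{3}) \\((\\d+)\\.(\\d+)\\.(\\d+)\\) (\\S+) (\\S+) (.*)$");
    // Address of the form <127.0.0.1:8181>.
    private static final Pattern HOST = Pattern.compile("<([^:>]+):(\\d+)>");

    /** Returns the port in the event's address, or -1 if the line has none. */
    static int portOf(String line) {
        Matcher event = EVENT.matcher(line.trim());
        if (!event.matches()) {
            return -1;
        }
        Matcher host = HOST.matcher(event.group(7));
        return host.find() ? Integer.parseInt(host.group(2)) : -1;
    }

    public static void main(String[] args) {
        // Prints 50817, then 8181.
        System.out.println(portOf(
            "001 (439.000.000) 09/25 09:27:34 Job executing on host: <127.0.0.1:50817>"));
        System.out.println(portOf(
            "001 (441.000.000) 09/25 09:30:40 Job executing on host: <127.0.0.1:8181>"));
    }
}
```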
> 
> 
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx on behalf of Matthew Farrellee
> Sent: Tue 9/25/2007 6:15 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] RE :  RE :  birdbath dag submission
> 
> (inline)
> 
> Mariette, Jerome wrote:
>> So I added all the files involved, even the two executables, and I
>> still get the same error. To keep it simple, I reduced what I'm
>> doing to a cp of a file, which is the only job in my DAG file. I
>> still have the same error.
>> 
>> Here is the DAG (cpWorkflow.dag):
>> 
>> JOB COPY /home/jerome/job.cp.condor
>> VARS COPY executable="/bin/cp"
>> VARS COPY inputfile="/home/jerome/file.src"
>> VARS COPY outputfile="/home/jerome/file.copie"
>> 
>> Then here is the job.cp.condor file:
>> 
>> Universe = vanilla
>> executable = $(executable)
>> transfer_executable = False
>> should_transfer_files = NO
>> Notification = Error
>> arguments = $(inputfile) $(outputfile)
>> output = job.cp.out
>> error = job.cp.err
>> log = job.cp.log
>> queue
> 
> This all looks good.
> 
> 
>> And my Java code:
>> 
>> Schedd schedd = new Schedd(new URL("http://localhost:8181"));
>> Transaction xact = schedd.createTransaction();
>> xact.begin(30);
>> int cluster = xact.createCluster();
>> int job = xact.createJob(cluster);
>> File[] files = {
>>     new File("/home/jerome/cpWorkflow.dag"),
>>     new File("/home/jerome/job.cp.condor"),
>>     new File("/bin/cp"),
>>     new File("/home/jerome/file.src")};
>> 
>> xact.submit(cluster, job, "jerome", UniverseType.SCHEDULER,
>>     "/opt/condor-6.8.5/bin/condor_dagman",
>>     "-f -l . -Debug 3 " +
>>     "-Lockfile myLockFile -Dag myDag -Rescue myRescuDag -Condorlog myLog",
>>     null, null, files);
>> 
>> xact.commit();
> 
> This looks good too.
> 
> 
>> The example is so simple it should work. What am I missing? I know
>> you said to avoid the /path/to ... but I am not sure what you mean.
>> Is it better, for example, to create the file list like this:
>> File[] files = { new File("/home/jerome/*"), new File("/bin/cp")};
> 
> Ignore that. I thought maybe your job.cp.condor was referencing the 
> input/output with full paths, which wouldn't work since Condor puts 
> everything in a single directory when you transfer it.
> 
> I have a feeling condor_dagman wants to put some special attributes
> in the Job ad, maybe something related to a log file. You should try
> two things: 1) look at the .sub file that condor_submit_dag is
> creating and look for any values you might not expect in a regular
> submit file, i.e. dagman_something; 2) look at the job you
> successfully submitted with condor_submit_dag (use condor_q -long)
> and look for dagman specific attributes. Once you've done one or both
> of those you'll probably find something that you need to add to the
> job ad you are submitting, I believe it's the "extraAttrs" argument
> to xact.submit() (2nd to last arg?).
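
One way to carry out the comparison suggested above is to save the condor_q -long output of the working job and of the Java-submitted job, then list the attribute names that only the working one has. The Java sketch below assumes each attribute appears on its own line as Name = value; the class AdDiff and its methods are hypothetical, and the sample ads in main are made up for illustration, not taken from a real job.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for comparing two saved "condor_q -long" dumps.
final class AdDiff {
    /** Parses "Name = value" lines; lines without '=' are skipped. */
    static Map<String, String> parse(String adText) {
        Map<String, String> ad = new LinkedHashMap<>();
        for (String line : adText.split("\\R")) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            ad.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return ad;
    }

    /** Names present in referenceAd but absent (ignoring case) from candidateAd. */
    static Set<String> missingFrom(String referenceAd, String candidateAd) {
        Set<String> have = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        have.addAll(parse(candidateAd).keySet());
        Set<String> missing = new TreeSet<>();
        for (String name : parse(referenceAd).keySet()) {
            if (!have.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample ads, for illustration only.
        String working = "Cmd = \"/opt/condor-6.8.5/bin/condor_dagman\"\nEnv = \"X=1\"\nIwd = \"/home/jerome\"";
        String fromJava = "Cmd = \"/opt/condor-6.8.5/bin/condor_dagman\"\nIwd = \"/tmp\"";
        System.out.println(missingFrom(working, fromJava)); // prints [Env]
    }
}
```

Attributes that differ only in value are not reported here; extending the loop to compare values would be the natural next step.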
> 
> Sorry about this, but Condor uses numerous attributes that are not 
> normally exposed to users.
> 
> Best,
> 
> 
> matt
> 
> 
>> Thanks, Jerome
>> 
>> PS: I tried condor_submit_dag and it works perfectly. The only
>> difference is that the condor_dagman process runs straight away
>> after submission when submitting from the command line, but when
>> using my Java code condor_dagman stays idle, so I have to submit a
>> totally different job (using condor_submit) to get it running. And
>> even when condor_dagman is running, the sub-job is never listed
>> (whereas the COPY job is listed when using the condor_submit_dag
>> command).
>> 
>> 
>> -----Original Message-----
>> From: condor-users-bounces@xxxxxxxxxxx on behalf of Matthew Farrellee
>> Sent: Mon 9/24/2007 7:46 PM
>> To: Condor-Users Mail List
>> Subject: Re: [Condor-users] RE :  birdbath dag submission
>> 
>> condor_dagman is just a program that reads your DAG and runs the
>> jobs specified in it. It runs them by submitting them to Condor,
>> and it uses condor_submit to do that. That means you need to give
>> condor_dagman access to the submit files so it can hand them off to
>> condor_submit.
>> 
>> You'll want to send execjob2 too, and you should try it all without
>>  using "path/to/" -- put everything into a single directory, Condor
>> likes that. Also, make sure your dag runs if you submit it with 
>> condor_submit_dag...
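
Since condor_dagman needs every submit file named in the DAG, one way to avoid forgetting one is to read the JOB lines and collect their paths before building the files array. The following Java sketch assumes the JOB <node> <submit-file> layout used in the DAG files quoted in this thread; the class DagSubmitFiles and its method are hypothetical and not part of birdbath.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: lists the submit files a DAG refers to.
final class DagSubmitFiles {
    /** Returns submit-file paths from JOB lines, in order of first appearance. */
    static List<String> submitFiles(String dagText) {
        Set<String> files = new LinkedHashSet<>();
        for (String line : dagText.split("\\R")) {
            String[] tokens = line.trim().split("\\s+");
            // Expected layout: JOB <node-name> <submit-file> [options]
            if (tokens.length >= 3 && tokens[0].equalsIgnoreCase("JOB")) {
                files.add(tokens[2]);
            }
        }
        return new ArrayList<>(files);
    }

    public static void main(String[] args) {
        String dag = String.join("\n",
            "JOB COPY /home/jerome/job.cp.condor",
            "VARS COPY executable=\"/bin/cp\"",
            "VARS COPY inputfile=\"/home/jerome/file.src\"",
            "VARS COPY outputfile=\"/home/jerome/file.copie\"");
        // Prints: [/home/jerome/job.cp.condor]
        System.out.println(submitFiles(dag));
    }
}
```

Executables and input files named in VARS lines are not collected by this sketch and would still have to be added by hand.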
>> 
>> 
>> matt
>> 
>> Mariette, Jerome wrote:
>>> Well, my DAG file looks like this:
>>> 
>>> JOB JOB1 /path/to/job.job1.condor
>>> JOB JOB2 /path/to/job.job2.condor
>>> 
>>> VARS JOB1 executable="/path/to/exejob1"
>>> VARS JOB1 input="path/to/inputjob1"
>>> VARS JOB1 output="path/to/outputjob1"
>>> VARS JOB2 executable="/path/to/exejob2"
>>> VARS JOB2 input="path/to/outputjob1"
>>> 
>>> PARENT JOB1 CHILD JOB2
>>> 
>>> So in order to send the files, I added the following lines:
>>> 
>>> Schedd schedd = new Schedd(new URL("http://localhost:8181"));
>>> Transaction xact = schedd.createTransaction();
>>> xact.begin(30);
>>> int cluster = xact.createCluster();
>>> int job = xact.createJob(cluster);
>>> File[] files = {
>>>     new File("/path/to/DAGfile"),
>>>     new File("/path/to/job.job1.condor"),
>>>     new File("/path/to/job.job2.condor"),
>>>     new File("/path/to/inputjob1")};
>>> 
>>> xact.submit(cluster, job, "jerome", UniverseType.SCHEDULER,
>>>     "/opt/condor-6.8.5/bin/condor_dagman", /* path to the dagman binary */
>>>     "-f -l . -Debug 3 " +
>>>     "-Lockfile myLockFile -Dag myDag -Rescue myRescuDag -Condorlog myLog",
>>>     null, null, files);
>>> 
>>> xact.commit();
>>> 
>>> I still have the same error: failed while reading from pipe ...
>>> ERROR: failed to initialize condor job log
>>> 
>>> 
>>> Moreover, I was wondering why I have to submit another job, and
>>> sometimes more than one, to make Condor start processing my jobs?
>>> Thanks for your help,
>>> 
>>> Jerome
>>> 
>>> 
>>> -----Original Message-----
>>> From: condor-users-bounces@xxxxxxxxxxx on behalf of Matthew Farrellee
>>> Sent: Mon 9/24/2007 5:02 PM
>>> To: Condor-Users Mail List
>>> Subject: Re: [Condor-users] birdbath dag submission
>>> 
>>> This looks pretty good. Are there any files you might need to
>>> submit along with the dag? You probably need to send along any
>>> condor_submit file that is used for a node in the dag. That way
>>> condor_dagman knows what to submit for each step in the dag.
>>> 
>>> 
>>> matt
>>> 
>>> Mariette, Jerome wrote:
>>>> Hi everybody, I'm pretty new to the Condor world and am having
>>>> some trouble submitting a DAG. Here is my problem: I'm using the
>>>> birdbath wrapper and I'm submitting the DAG file like this:
>>>> 
>>>> Schedd schedd = new Schedd(new URL("http://localhost:8181"));
>>>> Transaction xact = schedd.createTransaction();
>>>> xact.begin(30);
>>>> int cluster = xact.createCluster();
>>>> int job = xact.createJob(cluster);
>>>> xact.submit(cluster, job, "jerome", UniverseType.SCHEDULER,
>>>>     "/opt/condor-6.8.5/bin/condor_dagman", /* path to the dagman binary */
>>>>     "-f -l . -Debug 3 " +
>>>>     "-Lockfile myLockFile -Dag myDag -Rescue myRescuDag -Condorlog myLog",
>>>>     null, null, null);
>>>> xact.commit();
>>>> 
>>>> What am I doing wrong? (The DAG file is OK: it works when I
>>>> submit it from the command line.) Thanks
>>>> 
>>>> Jerome
>>>> 
>>>> 
>>>> ------------------------------------------------------------------------
>>>> 
>>>> 
>>>> _______________________________________________ Condor-users
>>>> mailing list To unsubscribe, send a message to
>>>> condor-users-request@xxxxxxxxxxx with a subject: Unsubscribe 
>>>> You can also unsubscribe by visiting 
>>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>> 
>>>> The archives can be found at: 
>>>> https://lists.cs.wisc.edu/archive/condor-users/