
[Condor-users] DAGMan ? RE : birdbath dag submission



Someone with more DAGMan experience should take a look at this. DAGMan
makes some assumptions about its operating environment that I'm no
longer familiar with.

You can pass environment variables to DAGMan by using the Env attribute on your Job (it may be named Environment; use condor_q -long to find the right one).
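
A minimal sketch of building such a value in Java, assuming the old-style Unix syntax in which NAME=value pairs are separated by semicolons. The class and method names are hypothetical, and how the resulting string is attached to the job ad through birdbath is not shown here; check condor_q -long on a working job for the exact attribute name and format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class EnvAttr {
    /**
     * Joins NAME=value pairs with ';'.
     * Assumption: the job ad expects the old-style, semicolon-delimited
     * environment string used on Unix.
     */
    static String build(Map<String, String> vars) {
        StringJoiner joined = new StringJoiner(";");
        for (Map.Entry<String, String> e : vars.entrySet()) {
            String name = e.getKey();
            String value = e.getValue();
            // This simple format has no escaping, so reject delimiters.
            if (name.contains("=") || name.contains(";") || value.contains(";")) {
                throw new IllegalArgumentException("cannot encode: " + name);
            }
            joined.add(name + "=" + value);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>();
        // Path taken from the dagman.out line quoted later in this thread.
        vars.put("CONDOR_CONFIG", "/opt/condor-6.8.5/etc/condor_config");
        System.out.println(build(vars));
    }
}
```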


matt

Mariette, Jerome wrote:
I think I found the problem: once I deleted
/etc/condor/condor_config, the log file reports: Job executing on
host: <127.0.0.1:8181>

which sounds much better to me ... but the job still cannot be processed
because it cannot find the condor_config file! The CONDOR_CONFIG
environment variable is set correctly on the machine, so I don't understand
why it can't be found when I'm using my Java code!

Is there a way to tell DAGMan to use the environment
variables, or just to set one?

let me know, Jerome










-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx on behalf of Mariette, Jerome
Sent: Tue 9/25/2007 2:24 PM
To: Condor-Users Mail List; Condor-Users Mail List
Subject: Re: [Condor-users] RE :  RE :  birdbath dag submission


Also, I found another big difference in the dagman.out files:

9/25 14:17:05 Using config source: /etc/condor/condor_config
9/25 11:16:24 Using config source: /opt/condor-6.8.5/etc/condor_config

How can I specify the config file I want to use?














-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx on behalf of Mariette, Jerome
Sent: Tue 9/25/2007 10:09 AM
To: Condor-Users Mail List
Subject: RE: [Condor-users] RE :  RE :  birdbath dag submission


All right,


Here is the dagman.log file written after a DAG submission from my Java code:
----------------------------------------------------------------------------------------------
001 (439.000.000) 09/25 09:27:34 Job executing on host: <127.0.0.1:50817>
...
006 (439.000.000) 09/25 09:27:42 Image size of job updated: 7272
...
005 (439.000.000) 09/25 09:28:18 Job terminated.
    (1) Normal termination (return value 1)
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
    70265  -  Run Bytes Sent By Job
    4207584  -  Run Bytes Received By Job
    70265  -  Total Bytes Sent By Job
    4207584  -  Total Bytes Received By Job
...
----------------------------------------------------------------------------------------------


Here is the dagman.log file written after a DAG submission from the command condor_submit_dag:
----------------------------------------------------------------------------------------------
000 (441.000.000) 09/25 09:30:40 Job submitted from host: <127.0.0.1:8181>
...
001 (441.000.000) 09/25 09:30:40 Job executing on host: <127.0.0.1:8181>
...
005 (441.000.000) 09/25 09:31:26 Job terminated.
    (1) Normal termination (return value 0)
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
    0  -  Run Bytes Sent By Job
    0  -  Run Bytes Received By Job
    0  -  Total Bytes Sent By Job
    0  -  Total Bytes Received By Job
...
----------------------------------------------------------------------------------------------


It looks like it's not executing on port 8181! But I don't
understand why, since I access the Schedd through http://localhost:8181. Does that make sense?
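
Besides the host and port, the two logs differ in the DAGMan return value (1 from the Java submission, 0 from condor_submit_dag). A small sketch for pulling those values out of a user log, assuming the "Normal termination (return value N)" wording shown in the logs above; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogReturnValues {
    // Matches the termination line exactly as it appears in the logs above.
    private static final Pattern RETURN_VALUE =
        Pattern.compile("Normal termination \\(return value (-?\\d+)\\)");

    /** Returns every return value found in the log text, in order. */
    static List<Integer> returnValues(String logText) {
        List<Integer> values = new ArrayList<>();
        Matcher m = RETURN_VALUE.matcher(logText);
        while (m.find()) {
            values.add(Integer.parseInt(m.group(1)));
        }
        return values;
    }

    public static void main(String[] args) {
        String log =
            "005 (439.000.000) 09/25 09:28:18 Job terminated.\n"
            + "    (1) Normal termination (return value 1)\n";
        System.out.println(returnValues(log));
    }
}
```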






















-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx on behalf of Matthew Farrellee
Sent: Tue 9/25/2007 6:15 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] RE :  RE :  birdbath dag submission

(inline)

Mariette, Jerome wrote:
So I added all the files involved, even the 2 executables, and I still
get the same error. To keep it simple, I reduced what I'm doing
to a cp of a file, which is the only job in my DAG file. I still
have the same error.

Here is the DAG, cpWorkflow.dag:

    JOB COPY /home/jerome/job.cp.condor
    VARS COPY executable="/bin/cp"
    VARS COPY inputfile="/home/jerome/file.src"
    VARS COPY outputfile="/home/jerome/file.copie"

Then here is the job.cp.condor file:

    Universe = vanilla
    executable = $(executable)
    transfer_executable = False
    should_transfer_files = NO
    Notification = Error
    arguments = $(inputfile) $(outputfile)
    output = job.cp.out
    error = job.cp.err
    log = job.cp.log
    queue

This all looks good.


And my Java code:

    Schedd schedd = new Schedd(new URL("http://localhost:8181"));
    Transaction xact = schedd.createTransaction();
    xact.begin(30);
    int cluster = xact.createCluster();
    int job = xact.createJob(cluster);
    File[] files = {
        new File("/home/jerome/cpWorkflow.dag"),
        new File("/home/jerome/job.cp.condor"),
        new File("/bin/cp"),
        new File("/home/jerome/file.src")};

    xact.submit(cluster, job, "jerome", UniverseType.SCHEDULER,
                "/opt/condor-6.8.5/bin/condor_dagman",
                "-f -l . -Debug 3 "
                + "-Lockfile myLockFile -Dag myDag -Rescue myRescuDag -Condorlog myLog",
                null, null, files);

    xact.commit();

This looks good too.


The example is so simple it should work; what am I missing? I know
you said to avoid the /path/to ... but I'm not sure what you mean.
Is it better, for example, to create just a file list like this:

    File[] files = { new File("/home/jerome/*"), new File("/bin/cp")};

Ignore that. I thought maybe your job.cp.condor was referencing the input/output with full paths, which wouldn't work since Condor puts everything in a single directory when you transfer it.

I have a feeling condor_dagman wants to put some special attributes
in the Job ad, maybe something related to a log file. You should try
two things: 1) look at the .sub file that condor_submit_dag is
creating and look for any values you might not expect in a regular
submit file, i.e. dagman_something; 2) look at the job you
successfully submitted with condor_submit_dag (use condor_q -long)
and look for dagman specific attributes. Once you've done one or both
of those you'll probably find something that you need to add to the
job ad you are submitting, I believe it's the "extraAttrs" argument
to xact.submit() (2nd to last arg?).
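
The comparison in step 2 can be sketched in Java as follows, assuming you have saved the condor_q -long output of both jobs as text and that each attribute appears on its own line as "Name = value". The class name, method names, and the sample attribute HypotheticalDagAttr are made up for illustration; the real attribute names have to come from your own output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AdDiff {
    /** Parses "Name = value" lines into a map; other lines are ignored. */
    static Map<String, String> parse(String text) {
        Map<String, String> ad = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int eq = line.indexOf(" = ");
            if (eq > 0) {
                ad.put(line.substring(0, eq).trim(), line.substring(eq + 3).trim());
            }
        }
        return ad;
    }

    /** Attribute names present in the working ad but absent from the failing one. */
    static Set<String> missing(Map<String, String> working, Map<String, String> failing) {
        Set<String> names = new TreeSet<>(working.keySet());
        names.removeAll(failing.keySet());
        return names;
    }

    public static void main(String[] args) {
        // Stand-ins for the two saved condor_q -long outputs.
        String working = "Cmd = \"/opt/condor-6.8.5/bin/condor_dagman\"\n"
                       + "HypotheticalDagAttr = 1\n";
        String failing = "Cmd = \"/opt/condor-6.8.5/bin/condor_dagman\"\n";
        System.out.println(missing(parse(working), parse(failing)));
    }
}
```

The comparison here is case-sensitive on attribute names, which is a simplification; normalize the case first if the two outputs spell a name differently.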

Sorry about this, but Condor uses numerous attributes that are not normally exposed to users.

Best,


matt


thx, Jerome

PS: I tried condor_submit_dag, and it works perfectly ...
the only difference is that the condor_dagman process runs straight away
after the submission when using the condor_submit command, but when
using my Java code, condor_dagman stays idle, so I have to
submit a totally different process (using condor_submit) to make it
run. But when condor_dagman is running, the sub job is never
printed!! (whereas the COPY job is printed when using the
condor_submit_dag command!!)







-------- Original Message --------
From: condor-users-bounces@xxxxxxxxxxx on behalf of Matthew Farrellee
Date: Mon. 24/09/2007 19:46
To: Condor-Users Mail List
Subject: Re: [Condor-users] RE :  birdbath dag submission

condor_dagman is just a program that reads your DAG and runs the
jobs specified in it. It runs them by submitting them to Condor,
and it uses condor_submit to do that. That means you need to give
condor_dagman access to the submit files so it can hand them off to
condor_submit.

You'll want to send execjob2 too, and you should try it all without
 using "path/to/" -- put everything into a single directory, Condor
likes that. Also, make sure your dag runs if you submit it with condor_submit_dag...
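
One way to avoid forgetting a submit file is to read the JOB lines of the DAG and collect the files they name. A rough sketch, assuming each JOB line has the form "JOB <name> <submit-file>" with whitespace-separated tokens and no spaces in paths; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class DagSubmitFiles {
    /** Returns the third token of every JOB line in the DAG text. */
    static List<String> submitFiles(String dagText) {
        List<String> files = new ArrayList<>();
        for (String line : dagText.split("\\R")) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length >= 3 && tokens[0].equalsIgnoreCase("JOB")) {
                files.add(tokens[2]);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        String dag =
            "JOB JOB1 /path/to/job.job1.condor\n"
            + "JOB JOB2 /path/to/job.job2.condor\n"
            + "PARENT JOB1 CHILD JOB2\n";
        // Each returned path is a file DAGMan will hand to condor_submit.
        System.out.println(submitFiles(dag));
    }
}
```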


matt

Mariette, Jerome wrote:
Well, my DAG file looks like this:

    JOB JOB1 /path/to/job.job1.condor
    JOB JOB2 /path/to/job.job2.condor

    VARS JOB1 executable="/path/to/exejob1"
    VARS JOB1 input="path/to/inputjob1"
    VARS JOB1 output="path/to/outputjob1"
    VARS JOB2 executable="/path/to/exejob2"
    VARS JOB2 input="path/to/outputjob1"

    PARENT JOB1 Child JOB2

So in order to send the files, I added the following lines:

    Schedd schedd = new Schedd(new URL("http://localhost:8181"));
    Transaction xact = schedd.createTransaction();
    xact.begin(30);
    int cluster = xact.createCluster();
    int job = xact.createJob(cluster);
    File[] files = {
        new File("/path/to/DAGfile"),
        new File("/path/to/job.job1.condor"),
        new File("/path/to/job.job2.condor"),
        new File("/path/to/inputjob1")};

    xact.submit(cluster, job, "jerome", UniverseType.SCHEDULER,
                "/opt/condor-6.8.5/bin/condor_dagman", /* Path to the dagman binary */
                "-f -l . -Debug 3 "
                + "-Lockfile myLockFile -Dag myDag -Rescue myRescuDag -Condorlog myLog",
                null, null, files);

    xact.commit();

I still have the same error:

    failed while reading from pipe ... ERROR: failed to initialize condor job log


Moreover, I was wondering why I have to submit a job, and
sometimes more than one, to make Condor begin to process my jobs?

thx for your help,

Jerome








-------- Original Message --------
From: condor-users-bounces@xxxxxxxxxxx on behalf of Matthew Farrellee
Date: Mon. 24/09/2007 17:02
To: Condor-Users Mail List
Subject: Re: [Condor-users] birdbath dag submission

This looks pretty good. Are there any files you might need to
submit along with the dag? You probably need to send along any
condor_submit file that is used for a node in the dag. That way
condor_dagman knows what to submit for each step in the dag.


matt

Mariette, Jerome wrote:
Hi everybody, I'm pretty new to the Condor world and am having some
trouble submitting a DAG. Here is my problem: I'm using the
birdbath wrapper to do it, and I'm submitting the DAG file like this:

    Schedd schedd = new Schedd(new URL("http://localhost:8181"));
    Transaction xact = schedd.createTransaction();
    xact.begin(30);
    int cluster = xact.createCluster();
    int job = xact.createJob(cluster);
    xact.submit(cluster, job, "jerome", UniverseType.SCHEDULER,
                "/opt/condor-6.8.5/bin/condor_dagman", /* Path to the dagman binary */
                "-f -l . -Debug 3 "
                + "-Lockfile myLockFile -Dag myDag -Rescue myRescuDag -Condorlog myLog",
                null, null, null);
    xact.commit();

What am I doing wrong? (The DAG file is OK: submitting it from the
command line works.) thx

Jerome


------------------------------------------------------------------------


_______________________________________________ Condor-users
mailing list To unsubscribe, send a message to
condor-users-request@xxxxxxxxxxx with a subject: Unsubscribe You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at: https://lists.cs.wisc.edu/archive/condor-users/