
[Condor-users] DAGman



Hi,

I am trying to get DAGMan to work for the first time, but I seem to be doing something wrong. My A.sub file is:

EXECUTABLE = /bin/env
OUTPUT = bla.out
ERROR = bla.err.$(CLUSTER)
LOG = bla.log
Universe = Vanilla
Queue 1

My test.dag is:

JOB A A.sub
SCRIPT PRE A test.sh
SCRIPT POST A test.sh
 
Finally, test.sh is simply:

#!/bin/bash
/bin/date

and works as expected when run from the command line:

[bgoncalves@dracula dag]$ ./test.sh
Wed Aug  3 17:25:54 EDT 2005
[bgoncalves@dracula dag]$ 
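
Since DAGMan judges a PRE or POST script by its exit status, the status is worth checking as well as the printed date. Below is a minimal sketch of such a check, run outside DAGMan. The scratch directory and the test.out/test.err file names are assumptions made for illustration and are not part of the setup above; the script body is the same test.sh.

```shell
#!/bin/bash
# Hypothetical check: recreate test.sh in a scratch directory
# (the directory and output file names are illustrative only).
workdir=$(mktemp -d)
cat > "$workdir/test.sh" <<'EOF'
#!/bin/bash
/bin/date
EOF
chmod +x "$workdir/test.sh"

# Run the script by explicit path, with no terminal attached to its output,
# and keep stdout and stderr so that any error message is preserved.
"$workdir/test.sh" > "$workdir/test.out" 2> "$workdir/test.err"
status=$?

# A status of 0 means success; anything else would be reported as a failure.
echo "exit status: $status"
cat "$workdir/test.err"
```

A nonzero status here, or anything in test.err, would point at the script itself rather than at the DAG file.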

When I submit the DAG, I get:

[bgoncalves@dracula dag]$ condor_submit_dag test.dag

-----------------------------------------------------------------------
File for submitting this DAG to Condor           : test.dag.condor.sub
Log of DAGMan debugging messages                 : test.dag.dagman.out
Log of Condor library debug messages             : test.dag.lib.out
Log of the life of condor_dagman itself          : test.dag.dagman.log

Condor Log file for all Condor jobs of this DAG: test.dag.dummy_log
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 28.
-----------------------------------------------------------------------
[bgoncalves@dracula dag]$


And test.dag.dagman.out looks like this:

8/3 17:24:15 ******************************************************
8/3 17:24:15 ** condor_scheduniv_exec.28.0 (CONDOR_DAGMAN) STARTING UP
8/3 17:24:15 ** /home/condor/hosts/nfsserver_eth1/spool/cluster28.ickpt.subproc0
8/3 17:24:15 ** $CondorVersion: 6.6.9 Mar 10 2005 $
8/3 17:24:15 ** $CondorPlatform: I386-LINUX_RH9 $
8/3 17:24:15 ** PID = 19223
8/3 17:24:15 ******************************************************
8/3 17:24:15 Using config file: /home/condor/condor_config
8/3 17:24:15 Using local config files: /home/condor/hosts/dracula/condor_config.local
8/3 17:24:15 DaemonCore: Command Socket at <192.168.146.254:53617>
8/3 17:24:15 argv[0] == "condor_scheduniv_exec.28.0"
8/3 17:24:15 argv[1] == "-Debug"
8/3 17:24:15 argv[2] == "3"
8/3 17:24:15 argv[3] == "-Lockfile"
8/3 17:24:15 argv[4] == "test.dag.lock"
8/3 17:24:15 argv[5] == "-Dag"
8/3 17:24:15 argv[6] == "test.dag"
8/3 17:24:15 argv[7] == "-Rescue"
8/3 17:24:15 argv[8] == "test.dag.rescue"
8/3 17:24:15 argv[9] == "-Condorlog"
8/3 17:24:15 argv[10] == "test.dag.dummy_log"
8/3 17:24:15 DAG Lockfile will be written to test.dag.lock
8/3 17:24:15 DAG Input file is test.dag
8/3 17:24:15 Rescue DAG will be written to test.dag.rescue
8/3 17:24:15 All DAG node user log files:
8/3 17:24:15   /home/bgoncalves/progs/dag/bla.log
8/3 17:24:15 Parsing test.dag ...
8/3 17:24:15 jobName: A
8/3 17:24:15 jobName: A
8/3 17:24:15 Dag contains 1 total jobs
8/3 17:24:15 Deleting any older versions of log files...
8/3 17:24:15 ReadMultipleUserLogs: deleting older version of /home/bgoncalves/progs/dag/bla.log
8/3 17:24:15 Bootstrapping...
8/3 17:24:15 Number of pre-completed jobs: 0
8/3 17:24:15 Running PRE script of Job A...
8/3 17:24:15 Registering condor_event_timer...
8/3 17:24:15 PRE Script of Job A failed with status 2
8/3 17:24:16 ERROR: failed to initialize condor job log -- ignore unless error repeats
8/3 17:24:16 Of 1 nodes total:
8/3 17:24:16  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
8/3 17:24:16   ===     ===      ===     ===     ===        ===      ===
8/3 17:24:16     0       0        0       0       0          0        1
8/3 17:24:16 ERROR: the following job(s) failed:
8/3 17:24:16 ---------------------- Job ----------------------
8/3 17:24:16       Node Name: A
8/3 17:24:16          NodeID: 0
8/3 17:24:16     Node Status: STATUS_ERROR
8/3 17:24:16           Error: PRE Script failed with status 2
8/3 17:24:16 Job Submit File: A.sub
8/3 17:24:16      PRE Script: test.sh
8/3 17:24:16     POST Script: test.sh
8/3 17:24:16   Condor Job ID: [not yet submitted]
8/3 17:24:16       Q_PARENTS: <END>
8/3 17:24:16       Q_WAITING: <END>
8/3 17:24:16      Q_CHILDREN: <END>
8/3 17:24:16 ---------------------------------------    <END>
8/3 17:24:16 Aborting DAG...
8/3 17:24:16 Writing Rescue DAG to test.dag.rescue...
8/3 17:24:16 **** condor_scheduniv_exec.28.0 (condor_DAGMAN) EXITING WITH STATUS 1

What am I doing wrong? :(
Thanks!

Bruno
--
*******************************************
Bruno Miguel Tavares Goncalves, MS
PhD Student
Emory University
Department of Physics
Office No. N117-C
400 Dowman Drive
Atlanta, Georgia 30322
Homepage: www.bgoncalves.com
Email: bgoncalves@xxxxxxxxx
Phone: (404) 712-2441
Fax:   (404) 727-0873
*******************************************