
Re: [Condor-users] condor_dagman -Condorlog in 7.2.0



Hello Kent,

CDF starts a DAG by running condor_submit on this ClassAd
(condor_submit job.dagman.ClassAd) from a Python script:

condor_submit = popen2.Popen4("%s %s" % (submit_fullpath,self.dagman_fname),1)
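For anyone reading this on a newer Python: the popen2 module no longer exists in Python 3. A rough equivalent with subprocess might look like the sketch below. The helper names build_submit_command and run_submit are made up for illustration and are not part of the CDF submitter module; merging stderr into stdout is meant to mimic what Popen4 did.

```python
import subprocess


def build_submit_command(submit_fullpath, dagman_fname):
    """Return the argv list for condor_submit (hypothetical helper)."""
    # Passing a list avoids going through a shell, unlike the "%s %s" string.
    return [submit_fullpath, dagman_fname]


def run_submit(submit_fullpath, dagman_fname):
    """Run condor_submit and return (exit_code, combined_output)."""
    proc = subprocess.run(
        build_submit_command(submit_fullpath, dagman_fname),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as Popen4 did
        text=True,
    )
    return proc.returncode, proc.stdout
```

Something like run_submit(submit_fullpath, self.dagman_fname) would then stand in for the Popen4 call, assuming the caller only needs the exit code and the combined output.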

In Condor v7.0.5 (and below), the CDF job.dagman.ClassAd file contained this line:

arguments = -f -l . -Debug 2 -Lockfile job.dagman.lock -Dag job.dag -Rescue job.dag.rescue -Condorlog job.log -maxidle 25 -MaxPost 1

In Condor v7.1.4, the CDF job.dagman.ClassAd file contains this line:

arguments = -f -l . -Debug 2 -Lockfile job.dagman.lock -allowversionmismatch -Dag job.dag -maxidle 25 -MaxPost 1
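To make the difference between the two argument lines explicit, here is a small sketch that compares the switches they contain. It only looks at tokens that start with "-" followed by a letter, which is an assumption that happens to hold for these two lines; it is not a general parser for condor_dagman arguments.

```python
OLD_ARGS = ("-f -l . -Debug 2 -Lockfile job.dagman.lock -Dag job.dag "
            "-Rescue job.dag.rescue -Condorlog job.log -maxidle 25 -MaxPost 1")
NEW_ARGS = ("-f -l . -Debug 2 -Lockfile job.dagman.lock -allowversionmismatch "
            "-Dag job.dag -maxidle 25 -MaxPost 1")


def flags(arguments):
    """Return the set of switch names (tokens like -Dag) in an argument string."""
    return {tok for tok in arguments.split()
            if tok.startswith("-") and tok[1:2].isalpha()}


removed = sorted(flags(OLD_ARGS) - flags(NEW_ARGS))
added = sorted(flags(NEW_ARGS) - flags(OLD_ARGS))

print("removed:", removed)  # ['-Condorlog', '-Rescue']
print("added:", added)      # ['-allowversionmismatch']
```

So between the two files, -Rescue and -Condorlog are dropped and -allowversionmismatch is added; the other switches are unchanged.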


The bottom of the e-mail contains more detail.

I hope that this explains a little bit better how the CDF Condor middleware works.

Regards,

Doug Benjamin



Here are the contents of a "typical" dagman ClassAd (tested on Condor 7.1.4):

executable      = stage/condor_dagman
log             = /export/CafCondor/cafIn/dagman.log
getenv          = True
remove_kill_sig = SIGUSR1
x509userproxy   = /export/CafCondor/tickets/x509cc_benjamin
arguments = -f -l . -Debug 2 -Lockfile job.dagman.lock -allowversionmismatch -Dag job.dag -maxidle 25 -MaxPost 1
+CAFScheddName = "schedd_1"
universe        = scheduler
output          = job.dagman.lib.out
error           = job.dagman.lib.out
environment = _CONDOR_DAGMAN_LOG=job.dagman.dagman.out;_CONDOR_MAX_DAGMAN_LOG=0;_CONDOR_DAGMAN_SUBMIT_DELAY=10;_CONDOR_SCHEDD_NAME=schedd_1;
copy_to_spool   = false
Notification    = Never
+Owner          = undefined
queue

Here are the first few lines of one type of dag file (job.dag) that we use:

Job Section_1 section_1.ClassAd
Script POST Section_1 return_OK.sh
Job Section_2 section_2.ClassAd
Script POST Section_2 return_OK.sh
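As an illustration of the pattern in those lines, a generator for such a file could be sketched as follows. This assumes that every later section repeats the same Job / Script POST pair with the same return_OK.sh script, which the excerpt above suggests but does not show; dag_lines is a hypothetical name, not the actual CDF submitter code.

```python
def dag_lines(n_sections, post_script="return_OK.sh"):
    """Build the Job / Script POST line pairs for sections 1..n_sections."""
    lines = []
    for i in range(1, n_sections + 1):
        lines.append(f"Job Section_{i} section_{i}.ClassAd")
        lines.append(f"Script POST Section_{i} {post_script}")
    return lines


print("\n".join(dag_lines(2)))
```

With n_sections=2 this reproduces the four lines shown above.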

Here are the contents of section_1.ClassAd:

Priority =   0
Executable              = stage/CafExe
Arguments = -s 1 -job_start_section 1 -job_end_section 500 -user benjamin -cdfsoft NONE -krb5cc krb5cc_benjamin -infile __job_in__.tgz -inurl http://fcdftest016.fnal.gov:8000/condorcafstage -insha1 d13dcf7563f8ef358b424faea4f5dc5529b50dc6 -outfile rcp benjamin@xxxxxxxxxxxxxxxx:/data/dukpc30/a/benjamin/cafout/caftest/cdfnamcafgridtest/cdfnamcafgridtest_test_job_1.tgz -outfile2 fcp cdfdata@xxxxxxxxxxxxxxxxxxxx:/export/data3/cdfdata/icaf/icaf_out_benjamin_1.tgz -iomonfile .logIOfile.log -icaf fcdfdata113.fnal.gov /export/data1/benjamin/scratch -mintime 60 -maxtime 21600 -submit_time 1227690006 -log CafExe_1.log -jobout job_1.out -joberr job_1.err -iomap iomap.txt -sam_cpp_api -callback 131.225.211.227 8125 -cafname CDFNAMCAFGRIDTEST -- ./test-job.sh 1 benjamin@xxxxxxxxxxxxxxxx:/data/dukpc30/a/benjamin/cafout/caftest/cdfnamcafgridtest/
Universe                = vanilla
requirements = (Memory>=200) && (Disk>=4000000) && (OpSys == "LINUX") && ((Arch == "INTEL")||(Arch == "x86")||(Arch == "x86_64")||(Arch == "X86")||(Arch == "X86_64"))
Environment             = LD_LIBRARY_PATH=/lib:/user/lib
x509userproxy   = /export/CafCondor/tickets/x509cc_benjamin
Environment             = LD_LIBRARY_PATH=/lib:/usr/lib
Notification            = Never
+Owner                  = undefined
copy_to_spool           = false
job_lease_duration      = 10000
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_Input_files = /export/CafCondor/tickets/krb5cc_benjamin,stage/iomap.txt,job.ClusterId
encrypt_input_files      = /export/CafCondor/tickets/krb5cc_benjamin
on_exit_remove         = true
Log                     = job.log
Output                  = section_1.out
Error                   = section_1.err
Periodic_Remove = (((JobStatus == 2) && ((CurrentTime - JobCurrentStartDate) > 24000)) =?= True)
+CAFPool                = "CDFNAMCAFGRIDTEST"
+CAFGroup =               "short"
+CAFAcctGroup =           "common"
+CAFSection             = 1
+CAFDH                  = "none"
+CAFAllowPreempt        = TRUE
+CAFMaxTime =      22800
Queue

*****************************



R. Kent Wenger wrote:
On Fri, 26 Dec 2008, Doug Benjamin wrote:

  Why that switch has gone away.  Since CDF uses the dagman file named
job.dag, a log file job.log is automatically created.  I discovered this
when testing 7.1.4 several weeks ago.

If your DAG file is job.dag, you should get a job.dag.dagman.out --
unless you are modifying the submit file that condor_submit_dag
generates (or creating your own submit file).  Job.dag.dagman.out is not
related to the -Condorlog flag at all. It's specified by the "log ="
attribute in the submit file.

  Marian Zvada and I reported this in a CDF development meeting a while
back, there are other switches that will also have to change. (The CDF
middleware (the submitter module) on fcdftest016 has the proper
configuration).

Hmm, are you guys creating the submit files for DAGMan directly, rather
than by running condor_submit_dag? If you're running condor_submit_dag,
the changes should be pretty much invisible, because condor_submit_dag
will generate the right submit file.

Kent Wenger
Condor Team