
Re: [HTCondor-users] On-the-fly DAGs?



 

Hi Mark,

 

Thank you very much for your reply!

 

Michael's suggestion to use condor_wait is sufficient for me, at least for now. Its strength is that it does not require planning the whole workflow in advance, and it is very simple to implement. I have been using it since yesterday and it has worked as expected. It can even mimic running condor_wait on two log files at once (if A and B are running and I want to start C once both are done) by running the waits sequentially. I have also changed my code to put all the logs of a job cluster into the same log file, per his suggestion.

 

Ideally, I would want to do something like this:

 

# on_the_fly_dag.dag

SCRIPT PRE NextStep CreateNextStepSub.py

JOB            NextStep NextStep.sub

PARENT @1000 CHILD NextStep # @ indicates that it is a currently running cluster number

 

then

 

condor_submit_dag on_the_fly_dag.dag # if on_the_fly_dag.dag was not submitted yet => ID=1001

condor_submit_dag -update 1001 on_the_fly_dag.dag

# the command checks that the new DAG does not contradict the DAG that job 1001 originally received, and updates it

# in the simplest and easiest case, it would suffice to allow only the addition of new vertices to the graph

 

This way I can keep adding to the DAG on the go, have some computing done before the whole workflow is finished, and, once all the steps are coded, on_the_fly_dag.dag can easily be converted into the final DAG by replacing, e.g., @1000 with some job names.

 

Thank you,

Siarhei.

 

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Mark Coatsworth
Sent: Wednesday, May 09, 2018 5:29 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] On-the-fly DAGs?

 

Hi Siarhei,

There are several different ways to do what you're asking for.

 

If Michael's suggestion using condor_wait does what you need, that's great! I think you would need to run it manually though, so it's a bit error-prone.

 

Another option would be to use POST scripts. If you put your original job into a single-node DAG, you could write a POST script which checks a certain condition. If the condition passes, your script would write out a new DAG file and then run condor_submit_dag on it. If the condition fails, your script exits and the on-the-fly DAG is done. 
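A minimal sketch of such a POST script, under stated assumptions: the condition is represented here by an invented marker file (more_work_todo), and all node, script, and submit-file names (Step2, step2.sub, continue_workflow.sh, next_step.dag) are hypothetical placeholders, not anything the thread prescribes.

```shell
#!/bin/sh
# continue_workflow.sh -- hypothetical POST script for a single-node DAG,
# attached in the .dag file with a line such as:
#   SCRIPT POST Step1 continue_workflow.sh
# If the condition holds, write the next single-node DAG and submit it;
# otherwise just exit, and the on-the-fly chain ends here.

continue_workflow() {
    # The "certain condition" -- assumed here to be a marker file left
    # behind by the job; use whatever test fits your own workflow.
    if [ -f more_work_todo ]; then
        cat > next_step.dag <<'EOF'
JOB    Step2 step2.sub
SCRIPT POST Step2 continue_workflow.sh
EOF
        # Guarded so the sketch is a no-op on machines without HTCondor
        if command -v condor_submit_dag >/dev/null 2>&1; then
            condor_submit_dag next_step.dag
        fi
    fi
}

touch more_work_todo   # for demonstration only: pretend more work remains
continue_workflow
```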

 

A third option (depending on your needs) would be to use a SUBDAG EXTERNAL node. When you define this, you have to provide a .dag file for it to run, although that file doesn't need to exist until the moment DAGMan reaches that node. So your earlier jobs, PRE scripts, and POST scripts can look at their output and write that .dag file. There are more details in the manual:

http://research.cs.wisc.edu/htcondor/manual/current/2_10DAGMan_Applications.html#SECTION0031091200000000000000
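That layout might look like the following sketch; every node and file name here (Stage1, LaterSteps, later_steps.dag) is an invented placeholder:

```
# outer.dag -- sketch of the SUBDAG EXTERNAL approach
JOB    Stage1 stage1.sub

# Stage1's jobs (or its POST script) are expected to write later_steps.dag;
# it only has to exist by the time DAGMan reaches the LaterSteps node.
SUBDAG EXTERNAL LaterSteps later_steps.dag

PARENT Stage1 CHILD LaterSteps
```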

 

Mark

 

 

On Wed, May 9, 2018 at 10:36 AM, Michael Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx> wrote:

My upcoming HTCondor Week presentation goes over a few useful tricks with the newer submit description features which reduce the need for script-generated submit descriptions. Keep an eye out for it in the proceedings, or if you're attending the conference I'll see you there! You might also find the HTCondor Python bindings to be useful for defining and submitting jobs.

Just to be sure I'm clear, we're not talking about the Error or Output parameters, but the Log parameter in the submit description - generally you only ever want one log per cluster, since it's logging the management of the entire cluster or group of clusters from a single submission. It doesn't contain much information that's particularly useful in the context of a single job within a 1000-job cluster.

As for using a cluster number instead of a log file, you could do a condor_wait wrapper like so:

#!/bin/bash
# Look up the cluster's UserLog and wait on it; condor_wait exits only
# once every job logged there has finished.
condor_wait "$(condor_q "$1" -af UserLog | head -1)"

You'd give this script the job ID as the argument, and it would wait until all the jobs in the specified cluster are done, assuming the cluster defines a UserLog.

        -Michael Pelletier.

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Vaurynovich, Siarhei
Sent: Wednesday, May 9, 2018 11:17 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>

Subject: [External] Re: [HTCondor-users] On-the-fly DAGs?


Thank you for your reply, Michael!

That sounds like what I want. I would just prefer not to give a log file as input but instead only a cluster number, and let Condor figure out which log file to watch.

Currently, my submit files are generated programmatically and each job in a cluster gets its own log file. It seems I need to reconsider it.

Thank you,
Siarhei.


-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Michael Pelletier
Sent: Wednesday, May 09, 2018 10:20 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] On-the-fly DAGs?

Sounds like a good use for condor_wait.

When you give condor_wait a job's log file (log = htcondor-$(Cluster).log) it watches the file and will only exit when all the jobs in that log have completed.

So what you'll want to do is write a little script which runs condor_wait on the pending job cluster and then submits your next job after condor_wait exits.
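A sketch of such a script, assuming the defaults and file names below (htcondor-1000.log, next_step.sub, watch_and_submit.sh) are placeholders for whatever your workflow actually uses:

```shell
#!/bin/sh
# watch_and_submit.sh -- block on a cluster's log with condor_wait, then
# submit the next step only if condor_wait exits successfully.
# Usage: watch_and_submit.sh <cluster-log-file> <next-step.sub>

LOG="${1:-htcondor-1000.log}"
NEXT_SUB="${2:-next_step.sub}"

# Guarded so the sketch is a clean no-op where HTCondor is not installed
if command -v condor_wait >/dev/null 2>&1; then
    # condor_wait exits 0 only once all jobs in the log have completed
    condor_wait "$LOG" && condor_submit "$NEXT_SUB"
fi
```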

You could submit it as a "local" universe job so that the condor_wait that's sitting around doing nothing wouldn't be using a CPU slot.
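A sketch of such a local-universe submit description, where the executable (a hypothetical wrapper that runs condor_wait and then condor_submit) and all file names are placeholders:

```
# watch_and_submit.sub -- run the wait-and-submit wrapper in the local
# universe on the submit host, so it does not consume an execute slot.
universe    = local
executable  = watch_and_submit.sh
arguments   = htcondor-1000.log next_step.sub
log         = watch_and_submit.log
output      = watch_and_submit.out
error       = watch_and_submit.err
queue
```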

        -Michael Pelletier.

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Vaurynovich, Siarhei
Sent: Tuesday, May 8, 2018 9:35 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [External] [HTCondor-users] On-the-fly DAGs?


Hello,

Could you please let me know if it is possible to create on-the-fly DAGs in HTCondor?

Here is an example: I work on some code, and when it is ready I submit a number of jobs as job cluster 1000. After that I work on the next processing step and finish the needed code before the jobs in cluster 1000 have completed. I want to be able to say: start this next set of jobs when, and only if, all the jobs in cluster 1000 complete successfully, i.e. I want to create an "on-the-fly" DAG. The goal is to have some computing done on early steps of the workflow even before the whole workflow code is ready, and to keep adding to the workflow on the fly.

Thank you,
Siarhei.

............................................................................



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
............................................................................







 

--

Mark Coatsworth

Systems Programmer

Center for High Throughput Computing

Department of Computer Sciences

University of Wisconsin-Madison

+1 608 206 4703