Hi folks,
I have been exploring grid application description languages from various
sources, and I found DAGMan to be the most promising. However, I have a
few use cases that I would appreciate some help expressing with DAGMan
and ClassAd:
1. In a DAG input file, it seems that the submit description filename
given to a job acts as its unique name when expressing dependencies.
That was a mouthful, so here's an example:
# Filename: B.dag
JOB A A.condor DONE
JOB B B.condor
PARENT A CHILD B
So, my understanding is that job B will only run once all jobs
described by A.condor are completed. For example, let's say the
following files were enqueued in this order:
1. A.condor
2. B.dag
3. A.condor
Then, would B.dag run only once #1 has completed, or once all
submissions matching A.condor have completed, or is there something
I don't understand?
2. Is there a way to express either a submit description file or a DAG
input file so that an executable is run on each node in a cluster
only once? If not, must I enqueue a submit description file for
each node with something like:
requirements = other.hostname == 'foo'
And so forth for each host. (Note that "hostname" probably isn't
part of ClassAd; I mean any attribute that uniquely identifies each
node in a cluster.)
3. Would it be possible to remove a resource provider (a machine) from
a cluster but only once the current jobs have completed as well as
all the other dependent jobs as defined by the pending DAG input
files? For example:
# Filename: A.dag
JOB A A.condor
JOB B B.condor
PARENT A CHILD B
So, if a node is in the middle of running job A, I would like to be
notified somehow when job B has completed. However, I don't necessarily
want to hard-code that I'm waiting for job B to complete; I would rather
express it abstractly: tell me when the current jobs and their dependents
have completed.
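To make the third case concrete, here is a rough Python sketch of the set
of jobs I have in mind: the jobs currently running on a node plus everything
downstream of them in the DAG. It only reads the PARENT ... CHILD lines of
a DAG input file; the function names (parse_dag, jobs_to_wait_for) are my
own invention for illustration and are not part of DAGMan or Condor.

```python
from collections import defaultdict


def parse_dag(text):
    """Return {parent: set(children)} built from PARENT ... CHILD ... lines."""
    children = defaultdict(set)
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, comments, JOB lines, and anything else.
        if not tokens or tokens[0].upper() != "PARENT":
            continue
        upper = [t.upper() for t in tokens]
        if "CHILD" not in upper:
            continue
        split = upper.index("CHILD")
        parents = tokens[1:split]
        kids = tokens[split + 1:]
        for parent in parents:
            children[parent].update(kids)
    return children


def jobs_to_wait_for(children, running):
    """Running jobs plus every job transitively dependent on them."""
    seen = set(running)
    pending = list(running)
    while pending:
        job = pending.pop()
        for child in children.get(job, ()):
            if child not in seen:
                seen.add(child)
                pending.append(child)
    return seen


A_DAG = """\
# Filename: A.dag
JOB A A.condor
JOB B B.condor
PARENT A CHILD B
"""

# If the node is currently running job A, I would want to wait for A and B.
print(sorted(jobs_to_wait_for(parse_dag(A_DAG), {"A"})))  # ['A', 'B']
```

What I am really asking is whether Condor can track this set for me, so
that I don't have to compute it outside the system like this.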
Thanks,
Marc