
[HTCondor-users] POST on each Proc



I'm in a situation where I need a DAG to run a POST script (or
something equivalent) after each ProcId finishes, when the queue count
in the submit file is greater than 1. That script would determine
whether the remaining jobs in that cluster, or jobs performing a
similar search in another DAG, should be aborted. DAGMan only runs
POST after an entire job cluster completes, but I'm curious whether
there is another way to run something after each ProcId finishes, so
we can kill the remaining jobs as soon as we get a result elsewhere.

Our clusters generally contain about 50,000 jobs of roughly 15 hours
each, and the rest can (and should) be removed as soon as one process
(ProcId) finishes with a positive result. Since we only have 10,000
cores to work with, we could get a positive result minutes into
processing but still have to wait the days it takes for every job to
complete before POST tells us.
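To put a rough number on "days": a back-of-the-envelope sketch, assuming all 10,000 cores are dedicated to this one cluster, each job uses one core, and jobs run in full waves with no preemption or idle time (the helper name `makespan_hours` is purely illustrative).

```python
import math


def makespan_hours(n_jobs: int, cores: int, hours_per_job: float) -> float:
    """Rough wall-clock time to drain a cluster.

    Assumes every core is dedicated to the cluster, each job needs one
    core, and jobs run in complete waves (no preemption, no idle gaps).
    """
    waves = math.ceil(n_jobs / cores)  # number of back-to-back waves
    return waves * hours_per_job


hours = makespan_hours(50_000, 10_000, 15)
print(f"{hours:.0f} hours (~{hours / 24:.1f} days)")  # 75 hours (~3.1 days)
```

Under those assumptions a positive result in the first wave still leaves more than three days of work queued behind it.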

I've looked into ABORT-DAG-ON, but we have little control over the
application we run, so I'd need to write a wrapper that interprets its
results and exits with the abort value to stop the remaining jobs in
the cluster, and then remove the similar jobs in other DAGs from the
FINAL node after the abort. Is there a way to keep using the native
binary we run today, without a wrapper and without having to define
each ProcId manually as a separate node in the DAG?
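For reference, the wrapper described above might look roughly like this minimal sketch. It assumes a positive result can be recognized by a marker string in the binary's stdout; `POSITIVE_MARKER`, the exit code 42, and `run_and_classify` are all hypothetical and would need to match the real application's output and the AbortExitValue given to ABORT-DAG-ON in the DAG file.

```python
import subprocess
import sys

# Hypothetical values: adapt to the real application's output and to the
# AbortExitValue in the DAG file, e.g. a line such as:
#   ABORT-DAG-ON SearchNode 42
POSITIVE_MARKER = "RESULT: FOUND"
ABORT_EXIT_CODE = 42


def run_and_classify(cmd: list[str], marker: str = POSITIVE_MARKER) -> int:
    """Run the unmodified binary and map its outcome to a wrapper exit code.

    Returns ABORT_EXIT_CODE if the marker appears in stdout, 0 if the
    binary succeeded without it, and 1 for any other failure (so an
    ordinary failure can never collide with the abort value).
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # Pass the binary's output through so it still ends up in the job's
    # output and error files.
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    if marker in proc.stdout:
        return ABORT_EXIT_CODE
    return 0 if proc.returncode == 0 else 1


# Act as a wrapper only when given a command to run:
#   wrapper.py /path/to/native_binary [args...]
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_and_classify(sys.argv[1:]))
```

The submit file would then name the wrapper as `executable` and pass the native binary and its arguments through `arguments`, leaving the binary itself untouched.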

Thanks,
ChrisP