Re: [Condor-users] Marking child as DONE



Hi Nick,

Rather than setting 'executable = /bin/true', you could add
'hold = True' to the submit file. The child jobs will then be
submitted in the held state and will not run unless you explicitly
call condor_release on them.

Similarly, you could set 'noop_job = True' for the child jobs.
They will then not be executed and will simply be marked as
completed with a return value of 0.
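
Either setting can be added without editing the submit file by hand. Here is a minimal shell sketch, assuming the children share one submit file and that its queue statement starts a line. The function name add_submit_command and the file name mysubfile.sub are made up for illustration:

```shell
# Write FILE.new: a copy of FILE with COMMAND inserted just before the
# first line whose first word starts with "queue" (case-insensitive).
# The original file is left untouched.
add_submit_command() {
    awk -v cmd="$2" '
        !done && tolower($1) ~ /^queue/ { print cmd; done = 1 }
        { print }
    ' "$1" > "$1.new"
}

# Hypothetical usage:
#   add_submit_command mysubfile.sub 'noop_job = True'
#   add_submit_command mysubfile.sub 'hold = True'
```

The modified copy would then need to replace the original submit file, or the JOB lines in the DAG would need to point at it.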

Scott

> Dear all,
> 
> After a DAG has run partway through, I've decided that the bottom-most
> post-processing jobs (several thousand of them) should not or cannot be run.
> When my rescue DAG comes, as it inevitably does, I would like not to  
> execute these.  So far, no problem; a one-line bash/sed invocation  
> takes care of that:
> 
> sed 's/.*mysubfile.*/& DONE/' "$f" > "${f}.sires_done"
> 
> The problem is that not all of the parents have completed  
> successfully.  I'd like to resubmit the parents, but not these  
> children.  When I naively mark them as DONE, as above, I get the  
> following error while dagman parses the DAG.
> 
> 3/13 20:25:13 ERROR: AddParent( ea0bca7d3503cccca43dff66a99c1516 ) failed for node a5bf08f49f3323fdd5f838f6d89918f7: STATUS_DONE child may not be given a new STATUS_READY parent
> 
> Removing the JOB lines produces an error that the parent-child  
> relationships refer to a non-existent job.  (I don't have the exact  
> message handy.)
> 
> I see a few solutions, none of which I like:
> * resubmit without modification and let the children fail (wastes  
> resources)
> * change the submit files to point to /bin/true and run in the local  
> universe (a lot of scheduling overhead, I'd think, but maybe this is  
> negligible)
> * identify all nodes of a class and remove all references to each of  
> them (more code than I want to write at the moment)
> 
> Can I get some gut reactions to these options or perhaps new, cleverer  
> options?
> 
> Thanks,
> Nick
> 
> ===================================
> Nickolas Fotopoulos
> nvf@xxxxxxxxxxxxxxxxxxxx
> 
> Office: (414) 229-6438
> Fax: (414) 229-5589
> University of Wisconsin - Milwaukee
> Physics Bldg, Rm 471
> ===================================
> 