
[HTCondor-users] Add jobs to a dag from a running job



Hi all,
I am wondering what the best solution would be for the following.

Just an example:
While running a deep learning job with 60 epochs, I wish to run an evaluation every 5 epochs.
The evaluation is asynchronous and can run in parallel with the training job.

One solution is to create a DAG in which the training job exits every 5 epochs, an evaluation job is started, and the next training job continues with the following epochs.
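
To make this first option concrete, here is a rough sketch in Python that generates such a DAG file. It is only an illustration: the submit file names train.sub and eval.sub, the node names, and the VARS macros first_epoch, last_epoch and epoch are placeholders I made up, and it assumes the total number of epochs is a multiple of the interval. Each training node is the parent of both its evaluation node and the next training node, so the evaluation can run in parallel with the next chunk of training.

```python
def build_dag(total_epochs=60, interval=5,
              train_sub="train.sub", eval_sub="eval.sub"):
    """Return DAG file text: one training node per chunk of epochs,
    each followed by an evaluation node (all names are placeholders)."""
    lines = []
    chunks = total_epochs // interval  # assumes an exact multiple

    # Declare one training node and one evaluation node per chunk.
    for i in range(1, chunks + 1):
        first = (i - 1) * interval + 1
        last = i * interval
        lines.append(f"JOB train_{i} {train_sub}")
        lines.append(f'VARS train_{i} first_epoch="{first}" last_epoch="{last}"')
        lines.append(f"JOB eval_{i} {eval_sub}")
        lines.append(f'VARS eval_{i} epoch="{last}"')

    # After a training chunk finishes, start its evaluation and,
    # if there is one, the next training chunk at the same time.
    for i in range(1, chunks + 1):
        children = [f"eval_{i}"]
        if i < chunks:
            children.append(f"train_{i + 1}")
        lines.append(f"PARENT train_{i} CHILD {' '.join(children)}")

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_dag(), end="")
```

The output could be written to a file and submitted with condor_submit_dag. The training job would still have to save a checkpoint on exit and resume from it in the next node.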

Another way might be to use a DAG with a service node: the training job uses condor_chirp to report its progress, and the script running as the service node submits evaluation jobs according to that progress.
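
For this second option, the training side could look roughly like the Python sketch below. It is only an illustration under some assumptions: the attribute name TrainEpoch is made up, condor_chirp is assumed to be on the job's PATH, and as far as I know the submit file needs +WantIOProxy = true for chirp to work. The service-node script that watches the attribute and submits the evaluation jobs is not shown.

```python
import shutil
import subprocess

EVAL_INTERVAL = 5
PROGRESS_ATTR = "TrainEpoch"  # hypothetical job ClassAd attribute name


def should_report(epoch, interval=EVAL_INTERVAL):
    """True when an evaluation is due after this (1-based) epoch."""
    return epoch > 0 and epoch % interval == 0


def chirp_command(epoch, attr=PROGRESS_ATTR):
    """Build the condor_chirp call that stores the epoch in the job ad."""
    return ["condor_chirp", "set_job_attr", attr, str(epoch)]


def report_progress(epoch):
    """Update the job ad if an evaluation is due; return True if updated."""
    if not should_report(epoch):
        return False
    if shutil.which("condor_chirp") is None:
        # Not running under HTCondor, or condor_chirp is not on PATH.
        return False
    subprocess.run(chirp_command(epoch), check=True)
    return True
```

The training loop would call report_progress(epoch) at the end of every epoch, after saving the checkpoint that the evaluation job needs.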


Or maybe there is a better way?

Thanks
David