Re: [Condor-users] job's ouput analyzing
- Date: Mon, 06 Feb 2012 22:34:30 -0500
- From: Matthew Farrellee <matt@xxxxxxxxxx>
- Subject: Re: [Condor-users] job's ouput analyzing
On 01/25/2012 04:26 AM, Ivanova Marina wrote:
On 01/12/2012 08:00 AM, Ivanova Marina wrote:
I have a job that should be started with different arguments on all the nodes of my Condor pool. This job can produce different output, which is saved in a file. What I need is to analyze the output that was produced and, if it satisfies certain conditions, send a notification to the administrator of the pool; otherwise nothing happens and the pool continues its work.
Are there any standard ways to handle this problem?
Sounds like you want to use DAGMan. But if you're looking to get some
sort of health information about nodes, then you should look at Daemon
Thanks to your advice I used DAGMan. The same job, but with different arguments, was defined to run on each node of the DAG. The POST script of each job contained a condition that, if true, called "condor_rm -all". The problem is that after this command the jobs that were running at that moment go to the X state, and the machines that were executing them stay Claimed and Idle, so they can't be used to run other jobs in the queue.
Are there any other ways to stop the DAG nodes from working?
If you want a DAG to remove itself, you should remove the DAGMan job
instead of using -all. DAGMan will clean up after itself.
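
As a rough illustration of that suggestion, here is a minimal Python sketch of a POST script that removes only the DAGMan job when the output matches. Everything in it apart from condor_rm itself is an assumption made for illustration: the helper names, the three-argument calling convention, and the simple "marker string in the output file" test are all hypothetical, and how the DAGMan job's cluster ID reaches the script is left open.

```python
#!/usr/bin/env python3
"""Hypothetical POST script: remove only the DAGMan job, not every job."""
import subprocess
import sys


def output_meets_condition(text, marker):
    # Placeholder test (assumption): the real condition depends on
    # what the job writes to its output file.
    return marker in text


def build_remove_command(dagman_cluster_id):
    # condor_rm with a cluster ID removes just that cluster, here the
    # DAGMan job, instead of every job as "condor_rm -all" would.
    return ["condor_rm", str(int(dagman_cluster_id))]


def post_check(output_path, marker, dagman_cluster_id):
    # Read the node's output and remove the DAGMan job only on a match.
    with open(output_path) as handle:
        text = handle.read()
    if output_meets_condition(text, marker):
        subprocess.run(build_remove_command(dagman_cluster_id), check=True)
        return True
    return False


# Assumed arguments: <output file> <marker> <DAGMan cluster ID>.
# Runs only when all three are supplied.
if __name__ == "__main__" and len(sys.argv) == 4:
    post_check(*sys.argv[1:4])
```

Building the command in its own function keeps the removal target explicit: the script can only ever remove the one cluster it was given, so other jobs in the queue are left alone.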