
Re: [HTCondor-users] Put jobs on hold if output or error files grow large?



This would be a nice feature to have.
Here is how we attack a similar problem: we monitor disk write utilization, and if we notice a spike in the traffic we start to snoop around. We define a spike as a function of the standard deviation over the previous 24 hours.
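
A minimal sketch of that kind of check, assuming write throughput is sampled at regular intervals over the previous 24 hours. The function name, the sample values, and the 3-sigma threshold are illustrative assumptions, not taken from any real monitoring setup:

```python
from statistics import mean, stdev


def is_spike(history, current, n_sigma=3.0):
    """Return True if `current` lies more than `n_sigma` standard
    deviations above the mean of `history`.

    `history` holds write-throughput samples (e.g. MB/s) from the
    previous 24 hours; `n_sigma` is an assumed, tunable threshold.
    """
    if len(history) < 2:
        # stdev() needs at least two samples; with less data, do not flag.
        return False
    baseline = mean(history)
    spread = stdev(history)
    return current > baseline + n_sigma * spread


# Hypothetical samples from a quiet day, then two new readings.
samples = [10, 12, 11, 9, 10, 11, 12, 10]
print(is_spike(samples, 50))  # far above the baseline
print(is_spike(samples, 12))  # within normal variation
```

Note that with a perfectly flat history the standard deviation is zero, so any increase at all is flagged; a real deployment would probably want a minimum absolute threshold as well.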

On Wed, Jul 23, 2014 at 4:03 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:

On Jul 23, 2014, at 2:24 PM, Carsten Aulbert <Carsten.Aulbert@xxxxxxxxxx> wrote:

> Hi Brian
>
> On 07/23/2014 09:21 PM, Brian Bockelman wrote:
>> Would MAX_TRANSFER_OUTPUT_MB be what you are looking for?
>>
>> That places the job on hold if the final output files (all of them in aggregate) are above a certain size.
>>
>> (See http://research.cs.wisc.edu/htcondor/manual/v8.1/3_3Configuration.html)
>
> I briefly looked at that, but I'm not sure whether HTCondor counts this if
> one uses a shared file system or if these files are placed on the execute
> nodes' local storage. Or will these be counted as well?

This only counts the files moved through HTCondor file transfer. So, files on a shared file system likely won't be counted.

Brian



--
--- Get your facts first, then you can distort them as you please.--