
Re: [HTCondor-users] hanging file transfers



> On Aug 21, 2019, at 3:20 PM, Dimitri Maziuk via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
> 
> On 8/21/19 2:33 PM, John M Knoeller wrote:
>>> The manual has JobStatus 6 for "transferring output" but none transferring input, is that correct?
>> 
>> yes.  but there is a Boolean attribute that is TRUE while a job is transferring input, and a second one
>> that indicates if it is waiting in the queue or actually transferring, so the expression
>> 
>> TransferringInput && ! TransferQueued
>> 
>> Will evaluate to true if the job is actually transferring input data.
> 
> OK but with job status I can do
> 
> (CurrentTime - EnteredCurrentStatus) > $TIME_CAP
> 
> Just knowing it's transferring input is not enough to decide if the
> transfer is hanging or not. I suppose I could do
> 
> (TransferringInput && ! TransferQueued) && ((RemoteWallClockTime -
> CumulativeSuspensionTime) > $TIME_CAP)
> 
> I don't like it because it does not explicitly track transfer time, it
> *assumes* that my total runtime spent waiting for transfer, but I
> suppose it's not an unreasonable assumption when we're transferring
> input files.


Barring unusual high-load situations, the only time-consuming step between JobStatus changing to 2 (RUNNING) and the job actually starting is input file transfer (including any time the transfer spends queued). So your expression referencing EnteredCurrentStatus is still reasonable.
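
To make the combined check concrete, here is a minimal Python sketch of the logic discussed in this thread. It is an illustration only, not HTCondor code: the job ad is assumed to be a plain dict keyed by the attribute names mentioned above, and the function name, the `time_cap` and `now` parameters, and the `ignore_queued` switch are all hypothetical.

```python
import time

RUNNING = 2  # JobStatus value for RUNNING, as noted above


def input_transfer_looks_hung(ad, time_cap, now=None, ignore_queued=False):
    """Return True if the job appears stuck transferring input.

    ad            -- dict of job attributes (assumed shape, for illustration)
    time_cap      -- allowed seconds since the job entered its current status
    now           -- current Unix time; defaults to time.time()
    ignore_queued -- if True, never flag a job whose transfer is still queued
    """
    now = time.time() if now is None else now

    # Only running jobs that are still in the input-transfer phase qualify.
    if ad.get("JobStatus") != RUNNING:
        return False
    if not ad.get("TransferringInput", False):
        return False
    if ignore_queued and ad.get("TransferQueued", False):
        return False

    # Elapsed time since the status change; this includes any time the
    # transfer spent waiting in the transfer queue.
    return (now - ad["EnteredCurrentStatus"]) > time_cap


if __name__ == "__main__":
    job = {
        "JobStatus": 2,
        "TransferringInput": True,
        "TransferQueued": False,
        "EnteredCurrentStatus": 1000,
    }
    print(input_transfer_looks_hung(job, time_cap=600, now=2000))
```

Because the elapsed time is measured from EnteredCurrentStatus, a long wait in the transfer queue counts toward the cap; setting `ignore_queued=True` only skips jobs that are still queued at the moment of the check.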

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project