Re: [HTCondor-users] file transfer GoAhead delays



Hi David,

Waiting for a "Go Ahead" helps avoid overloading disk I/O on the Access Point (the host machine running the Schedd) when too many concurrent jobs are doing file transfer. The Schedd has a built-in transfer queue, and every file transfer must get the go ahead from it before files can move between the shadow and the starter. When file transfer starts, the shadow contacts the transfer queue for permission before sending or receiving any files. Unfortunately, at the moment there is no distinction between transfers that do and don't require a transfer queue go ahead. So jobs with only URL-based transfers, which have nothing to do with the AP, still wait for a go ahead before starting.
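
To illustrate the effect, here is a rough toy model in Python. It is not HTCondor's actual implementation; the function name, the job tuples, and the max_concurrent limit are all made up for the sketch. The only point it shows is that a transfer which never touches the AP still queues behind the others.

```python
import heapq

def go_ahead_waits(jobs, max_concurrent):
    """Toy model of a transfer queue (hypothetical, for illustration only).

    jobs: list of (arrival_time, transfer_seconds, uses_ap), sorted by arrival.
    Returns how many seconds each job waited for its go ahead.
    """
    busy_until = []  # min-heap of finish times for transfers holding a slot
    waits = []
    for arrival, duration, _uses_ap in jobs:
        # _uses_ap is deliberately ignored: URL-only transfers queue like any other.
        # Release slots whose transfer finished before this job arrived.
        while busy_until and busy_until[0] <= arrival:
            heapq.heappop(busy_until)
        if len(busy_until) < max_concurrent:
            start = arrival
        else:
            # All slots are taken, so wait for the earliest one to finish.
            start = heapq.heappop(busy_until)
        heapq.heappush(busy_until, start + duration)
        waits.append(start - arrival)
    return waits

# Two AP transfers hold both slots; the URL-only job arriving at t=1 waits anyway.
example = [(0, 10, True), (0, 10, True), (1, 5, False)]
print(go_ahead_waits(example, max_concurrent=2))
```

With a third slot available in this model, the URL-only job would start immediately.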

-Cole Bollig

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of David Schultz <david.schultz@xxxxxxxxxxxxxxxx>
Sent: Monday, October 2, 2023 2:14 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] file transfer GoAhead delays
 
Hi,

I was taking a look at EP starter logs to debug something else, and saw a curious pattern.  For nearly every file transfer there is a delay of several seconds (5-15s is typical) waiting for a "GoAhead" from the AP to initiate the transfer, whether that's a download or upload, and whether the transfer involves the AP or only external servers.

Are there any known causes of this, or knobs I should tweak?  Our AP is fairly busy, so I could see that being part of the issue.

I'm somewhat motivated to fix this, as 10 seconds of idling multiplied by 500k jobs daily is several machines' worth of wasted compute.
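
As a back-of-the-envelope check, assuming a single 10 second delay per job (jobs that wait on both download and upload would lose more), the idle time works out to roughly this many job slots sitting idle around the clock. The function name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def idle_slots(delay_seconds, jobs_per_day):
    """Average number of slots kept idle all day by a fixed per-job delay."""
    return delay_seconds * jobs_per_day / SECONDS_PER_DAY

# 10 s * 500,000 jobs = 5,000,000 idle seconds per day
print(round(idle_slots(10, 500_000), 1))  # 57.9
```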

Best,
David