
Re: [HTCondor-users] HTCondor file-transfer vs networked storage



Hi Matt,

I guess I wouldn't emphasize performance and throughput but rather reliability and predictability.

That is, when you use HTCondor's file transfer mechanism (even if it's just a plugin that stages things from the shared filesystem),

1) You know that condor won't start the job until the files are on a local disk, decreasing the likelihood that a transient filesystem issue will fail the job when it's 99% complete.
2) When a job fails, you know that condor can tell you whether data staging was the underlying problem, and you can define a policy that's executed in such a situation.
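
As a rough sketch of the two points above, here is a small Python snippet that just prints a submit description with file transfer turned on. The executable name and the input path are hypothetical placeholders. The release policy is an assumption on my part: it relies on hold reason code 13 meaning "input transfer failed", so check that against the manual for your version before using it.

```python
# Sketch only: render an HTCondor submit description that stages input via
# file transfer instead of having the job read the shared filesystem directly.
# Nothing here talks to a real pool; it only builds and prints text.


def render_submit(commands: dict) -> str:
    """Format key/value pairs as 'key = value' lines, ending with 'queue'."""
    lines = [f"{key} = {value}" for key, value in commands.items()]
    lines.append("queue")
    return "\n".join(lines)


staged_job = {
    # Hypothetical executable and input path, for illustration only.
    "executable": "analyze.sh",
    "should_transfer_files": "YES",
    "transfer_input_files": "/shared/project/input.dat",
    "when_to_transfer_output": "ON_EXIT",
    # Assumed policy: if the job was held because input staging failed
    # (assumed to be hold reason code 13), release it after ten minutes.
    "periodic_release": (
        "(HoldReasonCode == 13) && (time() - EnteredCurrentStatus > 600)"
    ),
}

print(render_submit(staged_job))
```

The job only starts once input.dat has landed in the local sandbox, and a staging failure shows up as a held job with its own reason code rather than as a crash partway through the run. Note that this particular expression has no retry cap, so a real policy would probably want one.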

Some users may value (1) much more highly than performance.

The reverse is also true -- some users might need absolute performance and run so few jobs that reliability is not relevant.  It's all about tradeoffs and value systems in the end...

Brian

> On Aug 22, 2022, at 3:35 PM, Matthew T West via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
> 
> Good evening Nick,
> 
> I am readily aware of the value of HTCondor's file-transfer mechanism and associated sandboxing. But that wasn't my issue.
> 
> My question was:
> 
> When working on a single homogeneous compute cluster, are there any advantages to using HTCondor's file-transfer rather than working off shared network storage?
> 
> So not a grid or distributed campus pool or pulling from remote storage, but a single homogeneous compute cluster in one location that includes a networked file-system. I do apologize if everything after the question in my original email confused matters. Here might be a better way to put it:
> 
> Under what conditions does a shared file-server's performance degrade such that it would be better to work from local scratch, performance and throughput wise?
> 
> 
> Regards,
> Matt
> 
> On 22/08/2022 20:37, Nick LeRoy wrote:
>> 
>> On Sat, Aug 20, 2022 at 8:44 AM Matthew T West via HTCondor-users
>> <htcondor-users@xxxxxxxxxxx> wrote:
>>> Hi All,
>>> 
>>> When working on a single homogeneous compute cluster, are there any advantages to using HTCondor's file-transfer rather than working off shared network storage? I guess it would depend on the network and storage speeds.
>>> 
>>> It's just interesting that the "always work in local scratch" mindset I am used to is seen as a serious backward step performance wise:
>>> 
>>> Scratch therefore only useful if your network storage or interconnects are slow or saturated ... copying bulk data to local storage / getting all users to copy to local/scratch storage is a quick way to saturate your storage infrastructure.
>>> 
>>> I can find other instances of this HPC conventional wisdom and it intuitively makes sense. But I don't understand networked storage well, so I am asking the HTCondor hivemind for their thoughts.
>> Matt,
>> 
>> You need to remember that HTCondor can work in many different
>> environments, among these being WANs, campus-type structures, and
>> grids.  For these types of scenarios, file transfer is preferable, if
>> not required.
>> 
>> -Nick
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>> 
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
> 
> -- 
> Matthew T. West
> DevOps & HPC SysAdmin
> University of Exeter, Research IT
> www.exeter.ac.uk/research/researchcomputing/support/researchit
> 57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom
> 
> Please note, I may send emails out of 'normal' working hours, as this fits my own work-life balance. I do not expect a response outside of your own working hours.
> 