
Re: [HTCondor-users] HTCondor file-transfer vs networked storage



Good evening

While the thread has forked a bit, I just wanted to say thank you all for a helpful overview of the considerations one should think through when using file-transfer. This list, however much I dislike email lists in the abstract, continues to be the most informative and welcoming computing forum I am a member of.

Cheers,
Matt

On 22/08/2022 22:47, Bockelman, Brian wrote:

Hi Matt,

I guess I wouldn't emphasize performance and throughput but rather reliability and predictability.

That is, when you use HTCondor's file transfer mechanism (even if it's just a plugin that stages things from the shared filesystem),

1) You know that condor won't start the job until the files are on a local disk, decreasing the likelihood that a transient filesystem issue will fail the job when it's 99% complete.
2) When a job fails, you know that condor can tell you whether data staging was the underlying problem, and you can provide a policy that's executed in such a situation.
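
[Editor's illustration, not part of the original message] The two points above can be sketched as a toy Python wrapper. This is not how HTCondor implements file transfer; the names stage_inputs, run_job and StagingError are hypothetical. It assumes inputs are plain files that are copied into a temporary local directory before the command starts, so a staging failure is reported separately from a failure of the job itself.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


class StagingError(Exception):
    """Raised when an input cannot be copied to local scratch (hypothetical name)."""


def stage_inputs(inputs, scratch):
    """Copy every input file into the local scratch directory."""
    staged = []
    for src in map(Path, inputs):
        try:
            # copy2 returns the path of the copy inside the scratch directory
            staged.append(Path(shutil.copy2(src, scratch)))
        except OSError as exc:
            raise StagingError(f"could not stage {src}: {exc}") from exc
    return staged


def run_job(command, inputs):
    """Stage inputs first, then run; return 'ok', 'job_failed' or 'staging_failed'."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            stage_inputs(inputs, scratch)
        except StagingError:
            # Point (1): the job never started, so no compute time was lost.
            # Point (2): the caller can tell this apart from a job failure
            # and apply its own policy (retry, hold, notify).
            return "staging_failed"
        result = subprocess.run(command, cwd=scratch)
        return "ok" if result.returncode == 0 else "job_failed"
```

A caller could, for example, retry on "staging_failed" but not on "job_failed"; that choice is the kind of policy point (2) refers to.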

Some users may value (1) much more highly than performance.

The reverse is also true -- some users might need absolute performance and run so few jobs that reliability is not relevant.  It's all about tradeoffs and value systems in the end...

Brian

On Aug 22, 2022, at 3:35 PM, Matthew T West via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:

Good evening Nick,

I am readily aware of the value of HTCondor's file-transfer mechanism and associated sandboxing. But that wasn't my issue.

My question was:

When working on a single homogeneous compute cluster, are there any advantages to using HTCondor's file-transfer rather than working off shared network storage?

So not a grid or distributed campus pool or pulling from remote storage, but a single homogeneous compute cluster in one location that includes a networked file-system. I do apologize if everything after the question in my original email confused matters. Here might be a better way to put it:

Under what conditions does a shared file-server degrade such that it would be better to work from local scratch, performance and throughput wise?


Regards,
Matt

On 22/08/2022 20:37, Nick LeRoy wrote:

On Sat, Aug 20, 2022 at 8:44 AM Matthew T West via HTCondor-users
<htcondor-users@xxxxxxxxxxx> wrote:
Hi All,

When working on a single homogeneous compute cluster, are there any advantages to using HTCondor's file-transfer rather than working off shared network storage? I guess it would depend on the network and storage speeds.

It's just interesting that the "always work in local scratch" mindset I am used to is seen as a serious backward step performance wise:

Scratch therefore only useful if your network storage or interconnects are slow or saturated ... copying bulk data to local storage / getting all users to copy to local/scratch storage is a quick way to saturate your storage infrastructure.

I can find other instances of this HPC conventional wisdom and it intuitively makes sense. But I don't understand networked storage well, so I am asking the HTCondor hivemind for their thoughts.
Matt,

You need to remember that HTCondor can work in many different
environments, among these being WANs, campus-type structures, and
grids.  For these types of scenarios, file transfer is preferable, if
not required.

-Nick
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
--
Matthew T. West
DevOps & HPC SysAdmin
University of Exeter, Research IT
http://www.exeter.ac.uk/research/researchcomputing/support/researchit
57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom

Please note, I may send emails out of 'normal' working hours, as this fits my own work-life balance. I do not expect a response outside of your own working hours.
