
Re: [HTCondor-users] Backfill on an OpenStack system



Hi Matt,

Here is a short description of a possible setup similar to what we use.

We have a service machine that runs the HTCondor negotiator, collector and scheduler (schedd); these services can also run on a VM. We use the same machine to run COBalD, which needs privileges to run the condor_status and condor_drain commands. The worker VMs themselves run only an HTCondor master and startd, which connect to the pool on the service machine.
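To make the division of labour a little more concrete, here is a minimal, hypothetical Python sketch of the kind of query-then-drain step the service machine performs. It is not COBalD/TARDIS code: the function names are made up for illustration, and it assumes that `condor_status -af Machine State` prints one "<machine> <state>" line per slot and that a worker is a drain candidate only when every slot on it is Unclaimed.

```python
import shutil
import subprocess
from collections import defaultdict


def idle_machines(status_output: str) -> list:
    """Return worker machines whose slots are all Unclaimed (hypothetical helper).

    Expects the text printed by `condor_status -af Machine State`,
    i.e. one "<machine> <state>" line per slot.
    """
    states = defaultdict(set)
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore blank or unexpected lines
        machine, state = fields
        states[machine].add(state)
    # A machine is idle only if none of its slots is in a state other than Unclaimed.
    return sorted(m for m, seen in states.items() if seen == {"Unclaimed"})


def drain_command(machine: str) -> list:
    """Build (but do not run) a condor_drain command for one worker VM."""
    # -graceful gives running jobs their retirement time before the startd drains.
    return ["condor_drain", "-graceful", machine]


if __name__ == "__main__" and shutil.which("condor_status"):
    result = subprocess.run(
        ["condor_status", "-af", "Machine", "State"],
        capture_output=True, text=True, check=True,
    )
    for machine in idle_machines(result.stdout):
        # Print instead of executing, so the sketch is safe to try on a live pool.
        print(" ".join(drain_command(machine)))
```

With partitionable slots, a busy machine also lists its claimed dynamic slots under the same Machine name, so this check does not report it as idle.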

I have answered your questions inline. If you have further questions, we can also have a short chat.

Regards,

Matthias

On 10/3/21 00:11, M.T.West@xxxxxxxxxxxx wrote:
Hi Matthias,

As I am new to OpenStack, how all the services and daemons work together is a bit confusing to me.
- Does the COBalD daemon have to be running on every potential worker node?
   No, COBalD only needs to run on the service machine.
- Do the HTCondor daemons run on bare metal or in a special VM configured for running HTCondor workloads?
   On the workers, an HTCondor startd runs inside the VM. No special HTCondor configuration is necessary for COBalD/TARDIS. However, I would recommend configuring an auto-shutdown for idle VMs using the DEFAULT_MASTER_SHUTDOWN_SCRIPT and STARTD_NOCLAIM_SHUTDOWN macros; see https://htcondor.readthedocs.io/en/latest/cloud-computing/annex-customization-guide.html?highlight=DEFAULT_MASTER_SHUTDOWN_SCRIPT#image-requirements
- How are jobs wishing to run in containers handled?
   I'm not sure exactly what you mean. We use HTCondor's docker universe to run jobs in containers (https://htcondor.readthedocs.io/en/latest/users-manual/docker-universe-applications.html). Users can define which Docker image should be used. Since HTCondor runs inside the VM, Docker works just as it does on a bare-metal worker node.

While I don't yet fully understand the setup, the fact that this software is running so well in production speaks well of it.

Cheers,
Matt


-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of matthias.schnepf@xxxxxxx
Sent: 07 September 2021 04:28 PM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] Backfill on an OpenStack system

CAUTION: This email originated from outside of the organisation. Do not click links or open attachments unless you recognise the sender and know the content is safe.


Hi all,

dropping this here since it's likely to be viable for this use-case.

At KIT (WLCG Tier1 and university Tier3) we developed COBalD/TARDIS [0] to integrate resources from various providers into an HTCondor pool [1]. We support a medium-sized list of backends, but most importantly, we have been using OpenStack in production for a while now.

If you have any questions, just let me know - many of us also watch this list, but apparently we're not so fast in responding here...

Cheers,

Matthias

[0] https://cobald-tardis.readthedocs.io/en/latest/
[1] https://www.epj-conferences.org/articles/epjconf/abs/2020/21/epjconf_chep2020_07038/epjconf_chep2020_07038.html

On 05.09.21 11:38, jcaballero.hep@xxxxxxxxx wrote:
Hi Matt,

The cloud team at RAL does what you are looking for. Asking in the TB
Support list may be helpful as well.

Cheers,
Jose



On Sat, 4 Sept 2021 at 22:58, West, Matthew
(<M.T.West@xxxxxxxxxxxx>) wrote:
Hi Tim,

I will chat with the GridPP folks this week if I can grab someone's attention, as they just had their yearly project meeting.

One could just run the HTCondor startd on the bare machines and not fuss with trying to pack things into a VM, but I would also like to standardize the setup as much as possible so that it works for workstation pools too. There are a lot of systems all over campus that could be wrangled into use, and I feel that might be an easier sell than asking for brand-new hardware for HTC.

Cheers,
Matt
________________________________
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Steven C Timm <timm@xxxxxxxx>
Sent: Saturday, September 4, 2021 8:18 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Backfill on an OpenStack system

There are two ways it can be done. One is to install the optional EC2 emulator for OpenStack and use the AWS features of HTCondor to launch virtual machines.
The other way is the so-called "VAC" system, developed by GridPP in the UK, in which a daemon running on each cloud node self-launches VMs; the basic idea is that the VMs launch out of the "vacuum" and join an HTCondor pool. The latter can run on any pool and doesn't necessarily need OpenStack.

I am fairly new to running OpenStack myself, so I am not sure whether it has the equivalent of VMs that can be pre-empted. But if you have a startd, you could use HTCondor to condor_off the startd when the VM is needed back and then have the VM programmed to exit.
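The condor_off approach could look roughly like the hypothetical Python sketch below, which only builds the command for a worker VM that OpenStack needs back. The helper name and hostname are made up for illustration, and powering off the VM afterwards is assumed to be handled inside the VM itself.

```python
def reclaim_command(hostname: str, peaceful: bool = True) -> list:
    """Build (but do not run) a condor_off command for one worker VM's startd."""
    # -peaceful waits for running jobs to finish; -fast stops the startd right away.
    mode = "-peaceful" if peaceful else "-fast"
    # -daemon startd sends the command to the startd instead of the master.
    return ["condor_off", mode, "-name", hostname, "-daemon", "startd"]


if __name__ == "__main__":
    print(" ".join(reclaim_command("vm1.example.org")))  # hypothetical host
```

Choosing -peaceful avoids killing backfill jobs at the cost of the VM's owner waiting for them to finish; -fast returns the VM sooner but loses the running work.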

At one point HTCondor was going to add a feature to talk directly to the OpenStack "Nova" API, but I don't think it is functional yet.

Steve Timm

________________________________
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of West, Matthew <M.T.West@xxxxxxxxxxxx>
Sent: Saturday, September 4, 2021 1:20 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Backfill on an OpenStack system

Hi All,

Here at Exeter, IT is setting up an OpenStack system to support researchers who want DRAM-heavy, bespoke, workstation-like environments. Because I don't expect the system to be full of active users 24/7, I am wondering what the optimal way is to set up an HTCondor pool on it to run jobs as backfill. Would this be similar to how you would do it for any other spare resources: have a VM start up on a node and announce itself to the collector as an available worker if the machine's idle conditions are met?

It reminds me of the methods for expanding one's resources into corporate cloud servers, but I am not sure which tools are useful in this case.

Cheers,
Matt
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/