
[HTCondor-users] HTCondor on Amazon Linux



Hello HTCondor community!

First-time poster here: I work at the US Geological Survey with Mike Fienen, and we are working on ways to deploy HTCondor clusters in AWS. I believe there is an issue with the get_htcondor script, which is referenced as the recommended way to install HTCondor on Linux machines: https://htcondor.readthedocs.io/en/latest/getting-htcondor/install-linux-as-root.html. That said, we are still learning, so either a) I could be wrong, or b) there may be a better method for installing HTCondor for our use case.

Here's what I tried that led me to that conclusion. We run HTCondor in AWS, and most recently we've been using Amazon Linux 2 (AL2) for our HTCondor clusters. I noticed that the get_htcondor script started failing on AL2, apparently because it now tries to install HTCondor v23:

    + sh -c 'yum install -y https://research.cs.wisc.edu/htcondor/repo/23.0/htcondor-release-current.amzn2.noarch.rpm || yum reinstall -y https://research.cs.wisc.edu/htcondor/repo/23.0/htcondor-release-current.amzn2.noarch.rpm'
    Cannot open: https://research.cs.wisc.edu/htcondor/repo/23.0/htcondor-release-current.amzn2.noarch.rpm. Skipping.
    Error: Nothing to do
    Cannot open file: https://research.cs.wisc.edu/htcondor/repo/23.0/htcondor-release-current.amzn2.noarch.rpm. Skipping.
    Error: Nothing to do

Looking at where the .rpm files live, I assumed you dropped Amazon Linux 2 support for HTCondor v23, since there is no AL2-specific RPM file at https://research.cs.wisc.edu/htcondor/repo/23.0/ like the one for v10 at https://research.cs.wisc.edu/htcondor/repo/10/10.0/amzn2/
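As a rough illustration of the check I ended up doing by hand, here is a small Python sketch that builds the release-RPM URL in the same shape as the one in the log above and probes it with an HTTP HEAD request before attempting an install. The helper names are hypothetical, and the URL pattern is only copied from the failing 23.0/amzn2 log line; I'm assuming, not claiming, that other series/distro combinations follow the same layout.

```python
"""Check whether an HTCondor release RPM exists before installing it.

Hypothetical helper: the URL pattern is copied from the failing
get_htcondor log line (23.0, amzn2); other series/distro combinations
are assumptions, not a documented repository layout.
"""
import urllib.request

REPO_BASE = "https://research.cs.wisc.edu/htcondor/repo"


def release_rpm_url(series: str, distro_tag: str) -> str:
    """Build a release-RPM URL in the same shape as the one in the log."""
    return f"{REPO_BASE}/{series}/htcondor-release-current.{distro_tag}.noarch.rpm"


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if an HTTP HEAD request for `url` gets a 2xx response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # HTTPError (e.g. 404), URLError (DNS/connection failures) and
        # socket timeouts are all subclasses of OSError.
        return False
```

Calling url_exists(release_rpm_url("23.0", "amzn2")) would, given the "Cannot open" errors above, presumably have returned False at the time, which is an easier signal to act on than a failed yum run.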

So we figured we'd try migrating to AL2023 to stay current, but the get_htcondor script also fails on AL2023 because it requires installing EPEL (https://github.com/htcondor/htcondor/blob/master/src/condor_scripts/get_htcondor#L460), and, sadly, EPEL has been dropped from AL2023 and is not compatible with it. As a result, the script fails on AL2023 as well.
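For what it's worth, here is a minimal Python sketch of how an install wrapper could tell AL2 and AL2023 apart from /etc/os-release (both report ID="amzn", with VERSION_ID "2" or "2023") and skip the EPEL step on AL2023. The function names and the "EPEL only on AL2" policy are hypothetical illustrations, not get_htcondor's actual logic.

```python
"""Detect Amazon Linux 2 vs. Amazon Linux 2023 from /etc/os-release.

Sketch only: the EPEL policy in needs_epel() is a hypothetical
illustration, not what get_htcondor actually does.
"""
from __future__ import annotations

import shlex
from pathlib import Path


def parse_os_release(text: str) -> dict[str, str]:
    """Parse os-release KEY=value lines (values may be quoted)."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw_value = line.partition("=")
        # shlex strips the shell-style quoting used in os-release.
        parts = shlex.split(raw_value)
        fields[key] = parts[0] if parts else ""
    return fields


def amazon_linux_major(fields: dict[str, str]) -> str | None:
    """Return "2" or "2023" on Amazon Linux, otherwise None."""
    if fields.get("ID") != "amzn":
        return None
    return fields.get("VERSION_ID")


def needs_epel(fields: dict[str, str]) -> bool:
    """Hypothetical policy: only try to enable EPEL on Amazon Linux 2."""
    return amazon_linux_major(fields) == "2"


def read_os_release(path: str = "/etc/os-release") -> dict[str, str]:
    """Read and parse the local os-release file."""
    return parse_os_release(Path(path).read_text())
```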

I've made some progress on AL2023 by abandoning the get_htcondor script and instead using curl to download the RPM file from https://research.cs.wisc.edu/htcondor/repo/23.0/amzn2023/x86_64/release/condor-23.0.2-1.amzn2023.x86_64.rpm and then running "yum install condor". I think this should be fine, since we are trying to do as much of the Controller and Worker node configuration as possible with static files in /etc/condor/config.d/. However, I do believe we passed some environment variables to the get_htcondor script back when it was working.
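To make the "static config files" idea concrete, here is a rough Python sketch that renders a minimal per-role config fragment using HTCondor's CONDOR_HOST setting and the "use ROLE : ..." metaknobs. The file name 50-role.conf, the example hostname, and the helper names are hypothetical, and security/authentication settings (which the environment variables we passed to get_htcondor likely covered) are deliberately left out.

```python
"""Render a static HTCondor config fragment for each node role.

Sketch under assumptions: fragments live in /etc/condor/config.d/,
the central manager hostname is known ahead of time, and security
settings are handled elsewhere. File and host names are hypothetical.
"""
from pathlib import Path

# Map our node roles onto HTCondor's ROLE metaknobs.
ROLES = {
    "central-manager": "CentralManager",
    "submit": "Submit",
    "execute": "Execute",
}


def render_config(role: str, central_manager: str) -> str:
    """Return the text of a minimal config fragment for one role."""
    if role not in ROLES:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(ROLES)}")
    lines = [
        "# Generated file: edit the template, not this copy.",
        f"CONDOR_HOST = {central_manager}",
        f"use ROLE : {ROLES[role]}",
    ]
    return "\n".join(lines) + "\n"


def write_config(role: str, central_manager: str,
                 config_dir: str = "/etc/condor/config.d",
                 filename: str = "50-role.conf") -> Path:
    """Write the fragment; restart the condor service afterwards."""
    path = Path(config_dir) / filename
    path.write_text(render_config(role, central_manager))
    return path
```

For example, write_config("execute", "cm.example.internal") on a worker would produce a file containing "CONDOR_HOST = cm.example.internal" and "use ROLE : Execute".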


So, to recap, the purpose of reaching out is twofold. First, we wanted to let you know about what looks like a bug in the get_htcondor script affecting AL2 and AL2023 (with HTCondor v10 and v23). Please let me know if you'd like any other logs or have follow-up questions.


Second, I'd like to ask whether you have a recommendation for the most robust way to install HTCondor. We liked using the get_htcondor script because we didn't have to maintain it and it could perform some configuration steps at install time. However, it seems like better practice might be to download the RPM from a hardcoded URL and then force ourselves to do all configuration through the static config files in /etc/condor/config.d/. Let me know if you have any thoughts! I appreciate the help!
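For concreteness, the pinned-URL approach we have in mind looks roughly like this Python sketch, which only prints the commands unless dry_run is turned off. It assumes dnf is available (as on AL2023) and that installing this single RPM can resolve its dependencies from repositories already enabled on the host, which may not hold; the helper names are hypothetical.

```python
"""Install HTCondor from a pinned RPM URL (dry run by default).

Sketch under assumptions: dnf is available, and the single pinned RPM
can resolve its dependencies from repositories already enabled on the
host. Helper names are hypothetical.
"""
from __future__ import annotations

import subprocess

PINNED_RPM_URL = (
    "https://research.cs.wisc.edu/htcondor/repo/23.0/amzn2023/x86_64/"
    "release/condor-23.0.2-1.amzn2023.x86_64.rpm"
)


def install_plan(rpm_url: str = PINNED_RPM_URL) -> list[list[str]]:
    """Return the commands to run, in order, as argv lists."""
    return [
        ["dnf", "install", "-y", rpm_url],
        ["systemctl", "enable", "--now", "condor"],
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print("+", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Keeping the URL in one constant means a version bump is a one-line, reviewable change, while all role-specific settings stay in the static files under /etc/condor/config.d/.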


Brendan Wakefield (he/his)

USGS – Cloud Hosting Solutions

DevOps Team

bwakefield@xxxxxxxx