
Re: [HTCondor-users] Condor removed after Ubuntu updates



Hello Todd,

 

Thanks for the reply.

 

Further details that, on looking more deeply, might be relevant: the machines / VMs are all domain joined. At the beginning of my testing, having domain-joined machines caused the getcondor script to fail because it couldn't create the "condor" user on the domain, so I avoided that by making sure the domain-join step occurred after condor was installed. This worked fine, so I didn't think much more about it, but that could possibly be the issue that's happening here.

 

Rerunning the getcondor script on one of the broken VMs gives the error that condor is half-installed:

Error: HTCondor appears to have been installed previously on this system.

 

You may update the existing install, or remove and then re-install.

To update: echo "apt-get update && apt-get upgrade minihtcondor"

To remove: echo "apt-get -y remove --purge minihtcondor && apt-get -y autoremove --purge && rm -fr /etc/condor"

 

I found that I could fix these machines pretty simply, by just doing the following:

 

apt install htcondor

systemctl enable condor

systemctl start condor

 

After that, the machines were re-added to the pool:

 

root@gntcs-exectest:/home/nairland-ou# condor_status

[ skipping a lot of text from the 256-thread server ]

slot256@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle          0.000 4030  0+01:26:12

gntcs-exectest.genetics.wisc.edu          LINUX      X86_64 Unclaimed Benchmarking  0.000 3882  0+00:00:00

gntcs-pxetest.genetics.wisc.edu           LINUX      X86_64 Unclaimed Idle          0.000 3877  0+00:20:00

 

gntcs-exectest is the machine I'm doing this on right now, a VM to play with.

 

As for what Ansible is doing, here's the Ansible code for the updating:

 

tasks:

    - name: Update Apt repo

      apt: update_cache=yes

    - name: Upgrade packages

      apt: upgrade=dist

 

Basically, just running "apt-get update; apt-get dist-upgrade" on the machines.
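
One detail about that command that may matter: apt-get dist-upgrade is allowed to remove installed packages in order to resolve changed dependencies, whereas plain apt-get upgrade never removes anything. A minimal sketch for previewing what a dist-upgrade would remove, using apt-get's simulation flag (-s). The function name list_planned_removals is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads the output of a simulated apt run on stdin and
# prints the names of the packages apt plans to remove (lines starting "Remv").
list_planned_removals() {
    awk '$1 == "Remv" { print $2 }'
}

# Example (nothing is changed, -s only simulates):
#   apt-get -s dist-upgrade | list_planned_removals
```

If any condor package shows up in that list, the upgrade can be stopped before anything is touched.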

 

Hopefully that helps answer any questions. One thing I can always do is version-lock condor until I've had time to test an update on one of the VMs to make sure there aren't weird issues, but I wanted to see if this was something not expected, as it certainly surprised me.
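
For the version-lock idea, apt's usual mechanism is apt-mark hold, which prevents a package from being automatically installed, upgraded or removed. A sketch, assuming the packages to pin are the ones named in this thread (htcondor, minihtcondor, libclassad15); adjust the list to whatever is actually installed. The helper only prints the command unless APPLY=1 is set. The function name and the APPLY variable are illustrative, not part of any HTCondor tooling:

```shell
#!/bin/sh
# Hypothetical helper: hold (or, with "unhold", release) HTCondor packages.
# Package names are taken from this thread and may differ on other installs.
condor_hold() {
    action="${1:-hold}"      # "hold" or "unhold"
    pkgs="htcondor minihtcondor libclassad15"
    if [ "${APPLY:-0}" = "1" ]; then
        # Word splitting of $pkgs is intentional here. Needs root.
        apt-mark "$action" $pkgs
    else
        # Dry run: only show the command that would be executed.
        echo "apt-mark $action $pkgs"
    fi
}

# condor_hold            # prints: apt-mark hold htcondor minihtcondor libclassad15
# APPLY=1 condor_hold    # actually places the holds
```

Running "apt-mark showhold" afterwards lists what is currently held.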

 

Thanks again!

 

-Nils

 

From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Sent: Wednesday, January 19, 2022 1:09 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Nils Irland <nirland@xxxxxxxxxxxxxxxxx>
Subject: Re: [HTCondor-users] Condor removed after Ubuntu updates

 

On 1/14/2022 2:44 PM, Nils Irland via HTCondor-users wrote:

Hello all,

 

I'm a condor neophyte and have been playing around with some test setups to get the hang of it. I'm running condor on Ubuntu 20.04.3 LTS and my simple environment consisted of a submit / central manager node and 3 execute nodes, all installed via the getcondor script method.

 

Today I ran updates on those machines to install the latest patches and condor disappeared from all of them. Looking at dpkg.log in /var/log I can see the notes of it getting uninstalled, but I have no idea why. Is this something anyone else has run into? Obviously having this happen in production would be a bad thing, so I'd like to know what exactly happened, but I'm not sure where to start.


Hi Nils,

This is indeed strange and unexpected, thank you for reporting your experience!  From the dpkg.log below, the only item that looks perhaps non-benign is the upgrade of libclassad, which is an HTCondor suite component and has dependencies on the rest of HTCondor. 

I tried to recreate this, but failed to do so.  Specifically I tried:
   1. Using gethtcondor to install the latest minihtcondor (v9.5.0) into a Ubuntu 20.03 container, then while HTCondor was running I did "apt-get update; apt-get upgrade".  Ubuntu updated to v20.04, and HTCondor kept running along just fine.
   2. Next I tried the same as #1, but this time I installed minihtcondor v9.4 (specifically, apt-get install minihtcondor=9.4.0-1.1) in an attempt to match the version of HTCondor you were using.  No problems after updating this time either.

Later this week (~friday) folks here at HTCondor that deal with packaging on Ubuntu (specifically Tim T) will also try to re-create your issue on Ubuntu 20 - perhaps he will have better luck than I did.

It would certainly help to know how your machines were updating, i.e. exactly what Ansible is doing. 

I am no Ubuntu packaging expert, but perhaps there is some incompatibility between using apt-get (which is used by gethtcondor), and some dpkg method being used by Ansible to update?  Do apt and dpkg always play nicely together?

In the meantime, have all the binaries been removed?  I.e. what happens doing "ls /usr/bin/*condor*" and/or "ls /usr/sbin/*condor*" ? On a machine where you ran gethtcondor, are you able to get the HTCondor binaries back again by re-running gethtcondor ?
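
Besides listing the binaries, dpkg's own database can say whether the packages are fully installed, removed with config files left behind, or stuck half-installed. A sketch that filters dpkg-query output. The function name is hypothetical, and the package-name patterns (condor, classad) are an assumption based on the package names in this thread:

```shell
#!/bin/sh
# Hypothetical helper: reads "STATUS PACKAGE" lines on stdin and prints the
# HTCondor-related packages that are NOT cleanly installed (status != "ii").
# Common dpkg status abbreviations: ii = installed, rc = removed but config
# files remain, iH = half-installed, iU = unpacked but not configured.
condor_not_ok() {
    awk '$2 ~ /condor|classad/ && $1 != "ii" { print $1, $2 }'
}

# Example:
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' | condor_not_ok
```

Empty output means every matching package that dpkg knows about is in the "ii" state. It does not prove the packages are present, so it complements the ls check rather than replacing it.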

regards,
Todd



If anyone else has had this happen or knows a good place to look for artifacts of the uninstall, that would be great. I used Ansible to run the updates, so I don't have the full command output of apt as it was run, just a list of packages updated, which looks fairly benign:

 

ok: [compute01.genetics.wisc.edu] => {

    "result.stdout_lines": [

        "upgrade python3-pil:amd64 7.0.0-4ubuntu0.4",

        "upgrade firefox:amd64 95.0.1+build2-0ubuntu0.20.04.1",

        "upgrade ubuntu-advantage-tools:amd64 27.4.2~20.04.1",

        "upgrade linux-firmware:all 1.187.24",

        "upgrade libclassad15:amd64 9.4.0-1.1"

    ]

}

ok: [pxetest.genetics.wisc.edu] => {

    "result.stdout_lines": [

        "upgrade firefox:amd64 95.0.1+build2-0ubuntu0.20.04.1",

        "upgrade ubuntu-advantage-tools:amd64 27.4.2~20.04.1",

        "upgrade linux-firmware:all 1.187.24",

        "upgrade libclassad15:amd64 9.4.0-1.1"

    ]

}

ok: [submit.genetics.wisc.edu] => {

    "result.stdout_lines": [

        "upgrade libnss-systemd:amd64 245.4-4ubuntu3.14",

        "upgrade systemd-timesyncd:amd64 245.4-4ubuntu3.14",

        "upgrade systemd-sysv:amd64 245.4-4ubuntu3.14",

        "upgrade libpam-systemd:amd64 245.4-4ubuntu3.14",

        "upgrade systemd:amd64 245.4-4ubuntu3.14",

        "upgrade libsystemd0:amd64 245.4-4ubuntu3.14",

        "upgrade udev:amd64 245.4-4ubuntu3.14",

        "upgrade libudev1:amd64 245.4-4ubuntu3.14",

        "upgrade ubuntu-advantage-tools:amd64 27.4.2~20.04.1",

        "upgrade linux-firmware:all 1.187.24",

        "upgrade libclassad15:amd64 9.4.0-1.1"

    ]

}

ok: [exectest.genetics.wisc.edu] => {

    "result.stdout_lines": [

        "upgrade ubuntu-advantage-tools:amd64 27.4.2~20.04.1",

        "upgrade firefox:amd64 95.0.1+build2-0ubuntu0.20.04.1",

        "upgrade linux-firmware:all 1.187.24",

        "upgrade libclassad15:amd64 9.4.0-1.1"

    ]

}
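
On where to look for artifacts of the uninstall: dpkg records every package action in /var/log/dpkg.log, and apt keeps the command line and package lists of each run in /var/log/apt/history.log (with the terminal output in /var/log/apt/term.log). A sketch for pulling the removals out of dpkg.log, assuming its usual "DATE TIME ACTION PACKAGE ..." line layout. The function name and the package patterns are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: reads dpkg.log lines on stdin and prints the date,
# time, action and package of every remove/purge that touched an
# HTCondor-related package (name containing "condor" or "classad").
condor_removals() {
    awk '($3 == "remove" || $3 == "purge") && $4 ~ /condor|classad/ {
        print $1, $2, $3, $4
    }'
}

# Examples:
#   condor_removals < /var/log/dpkg.log
#   zcat /var/log/dpkg.log.*.gz | condor_removals    # rotated logs
```

The timestamps it prints can then be matched against the "Start-Date" and "Commandline" entries in /var/log/apt/history.log to see which apt invocation requested the removal.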

 

Thanks very much, and if anyone has any ideas or would like more information, please let me know!

 

-Nils

 

Nils Irland

IT Manager, Laboratory of Genetics, UW-Madison

608-263-9898

nirland@xxxxxxxxxxxxxxxxx

Working remotely Tuesday and Thursday

 







-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx>  University of Wisconsin-Madison
Center for High Throughput Computing    Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                   Madison, WI 53706-1685