Guidance on next steps after last week’s HPC maintenance


Date: Fri, 20 Oct 2023 20:13:55 +0000
From: chtc-users@xxxxxxxxxxx
Subject: Guidance on next steps after last week’s HPC maintenance

Hello,


This message is for users of our HPC cluster (accessed through hpclogin3.chtc.wisc.edu). 


We are providing additional guidance on the changes to the HPC system arising from last week’s maintenance. 


For context, the maintenance was required to address a security flaw in PMIx version 4.1.3, which we fixed by upgrading to version 4.2.6. Because PMIx is used by MPI and Slurm, we had to recompile many of the system modules with the new version of PMIx. This rebuilding process also triggered a rebuild of modules *not* dependent on MPI, changing their installation locations. This has two implications for users:


  1. Software built with *any* CHTC-provided modules will likely need to be recompiled.

  2. Software that uses MPI will need to be recompiled. 


At the end of this message is a summary of the changes, potential issues, and guidance on how to address them. These instructions can also be accessed here.


We will be focusing on HPC issues at our next office hours on Tuesday, Oct. 24, from 10:30 AM to 12:00 PM. You can also email us at chtc@xxxxxxxxxxx.


Regards,

The CHTC Facilitation Team


----------------------


(1) Module versions

Some modules had their versions upgraded. The modules and their new versions are listed here:

cmake/3.27.7-gcc-11.3.0

hdf5/1.12.2-gcc-11.3.0-intel-oneapi-mpi-2021.10.0

intel-oneapi-compilers/2023.2.1-gcc-11.3.0

intel-oneapi-mkl/2023.2.0-gcc-11.3.0

intel-oneapi-mpi/2021.10.0-gcc-11.3.0

intel-tbb/2021.9.0-gcc-11.3.0

mvapich2/2.3.7-1-gcc-11.3.0

patchelf/0.18.0-intel-2021.10.0

If you explicitly provide version numbers when loading your modules, you may encounter a “module(s) are unknown” error. You will need to update your commands to reference the new versions of the modules and recompile code dependent on these modules.
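
As a starting point for finding the commands that need updating, the sketch below lists every "module load" line that pins a version. It is only an illustration: the function name find_pinned_modules and the directory you pass to it are placeholders, not something CHTC provides.

```shell
#!/bin/bash
# List every "module load" line that pins a version (i.e. contains a "/"),
# so each one can be compared against the new versions listed above.
# find_pinned_modules is a hypothetical helper name; the directory argument
# is whatever folder holds your job scripts (defaults to the current one).
find_pinned_modules() {
    local dir="${1:-.}"
    # -r: search recursively, -n: show line numbers, -E: extended regex
    grep -rnE '^[[:space:]]*module[[:space:]]+load[[:space:]].*/' "$dir" 2>/dev/null
}

# Example (placeholder path): find_pinned_modules ~/my_project
```

Running "module avail" followed by a module name (for example, "module avail cmake") on the cluster should show which versions can currently be loaded.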


(2) Module locations and library paths

The previous software stack was located in /software/chtc/spack/. The new software stack is located in /software/chtc/newspack/. As a result, attempting to run code that was compiled prior to last week’s maintenance may generate “Permission denied” or “No such file or directory” errors, or similar errors about missing libraries or programs. You will need to recompile your code.

  • Some programs such as CMake may keep a record of paths to required dependencies at the time of the first compilation, and use that record for future compilations instead of searching for the dependencies again. For example, CMake may have generated the file CMakeCache.txt with paths to the old location and may reuse that file when compiling your code. You will need to remove such files or else you may continue to experience the above errors.

  • If you explicitly provided paths to library/module locations as part of the compile or runtime configuration, you will need to update those paths to account for the new location.
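
One way to track down leftover references to the old stack is to search your project for the old prefix. The sketch below is a minimal example under that assumption; the function name find_stale_paths is a placeholder, and the only path it relies on is the old location given above.

```shell
#!/bin/bash
# Old stack prefix, taken from the message above. The full prefix does not
# match /software/chtc/newspack/, so files already updated are not reported.
OLD_STACK='/software/chtc/spack/'

# Print every text file under the given directory that mentions the old prefix.
# find_stale_paths is a hypothetical helper name.
#   -r: recurse, -l: print file names only,
#   -I: skip binary files, -F: treat the pattern as a literal string
find_stale_paths() {
    local dir="${1:-.}"
    grep -rlIF -- "$OLD_STACK" "$dir" 2>/dev/null
}

# Example (placeholder path): find_stale_paths ~/my_project
```

Files such as CMakeCache.txt that appear in the output are candidates for removal before recompiling; scripts or configuration files that appear need their paths updated by hand. Because binary files are skipped here, an already compiled executable can be checked separately with "ldd", which typically reports missing shared libraries as "not found".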

A fresh installation of your software should avoid the above issues. If you haven’t already documented the installation process for your research software, now is a good time to do so.


Note: Paths to programs/modules in the stack include a hash from the installation process. This hash may have changed, so a simple substitution of /software/chtc/newspack for /software/chtc/spack will likely be insufficient.


(3) Spack configuration

If you used Spack to install your software and/or dependencies per the guides on our website, then you will need to update your configuration files and likely recreate your environments. First, copy the new configuration files to the location of your current configuration files using

cp -R /software/chtc/newspack/chtc-user-config/* ~/.spack

Then reinstall any packages that are dependent on MPI or the location of the system packages/modules. 

  1. Remove old environments and then uninstall their corresponding packages. (If you attempt to uninstall a system package, you will get the error “does not match any installed packages”.)

  2. Repeat your installation process in a new environment.
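
The two steps above might look roughly like the following sketch. The environment and package names are placeholders, and the run and rebuild_env helpers are hypothetical; your own installation process may involve more packages or options. The script defaults to a dry run that only prints the Spack commands, so nothing is removed until you set DRY_RUN=0.

```shell
#!/bin/bash
# Dry run by default: each command is printed instead of executed.
# Set DRY_RUN=0 in your shell to actually run the commands.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper. Arguments (all placeholders): old environment name,
# new environment name, and one package to reinstall.
rebuild_env() {
    local old_env="$1" new_env="$2" pkg="$3"
    run spack env remove "$old_env"       # step 1: remove the old environment
    run spack uninstall "$pkg"            # step 1: then uninstall its package
    run spack env create "$new_env"       # step 2: create a new environment
    run spack -e "$new_env" add "$pkg"    # step 2: add the package to it
    run spack -e "$new_env" install       # step 2: build everything in it
}

# Example (placeholder names): rebuild_env my-old-env my-new-env mypackage
```

Reviewing the printed commands first is a deliberate choice here, since removing environments and uninstalling packages cannot easily be undone.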

If you still encounter issues with Spack or software installed using Spack, remove Spack by deleting the directories ~/.spack, ~/spack, and ~/spack_programs. Then do a fresh installation following the instructions in our guides.


(The above instructions assume you followed the default “Individual Use” instructions. If your Spack installation is in a group directory or other non-standard location, replace ~/ with the proper path.)


Note: Removing the old environment will not remove the packages that had been installed when creating the environment. If you aren’t sure which packages you need to remove, then doing a fresh installation is probably the best option.

----------------------