
Re: [HTCondor-users] Windows dedicated run account profile corrupted



Thanks for the feedback, TJ.

We've checked that possibility; our observation is that a Windows restart will not automatically clean up the corrupted directory and registry entries.  In a cluster with ~150 machines and ~250 static slots, we've found some orphaned entries that are months old and have survived multiple maintenance reboots.

I wonder if you could share one clarification about the related user manual entry for Windows support:
"This may be useful if the job requires direct access to the user’s registry entries. It also may be useful when the job requires an application, and the application requires registry access. This feature is always enabled on the condor_startd, but it is limited to the dedicated run account."  

Does the above mean that, when we rely on the dedicated run account, the submit command "load_profile = True" is redundant?

Thanks,
Mark


From: John M Knoeller <johnkn@xxxxxxxxxxx>
Sent: Wednesday, August 2, 2023 5:15 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: O'NEAL Mark <mark.oneal@xxxxxxxxxxx>
Subject: RE: Windows dedicated run account profile corrupted
 

This is not a known HTCondor issue. 

 

I wonder if restarting Windows would clean up the user directories and registry entries that were left behind?

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of O'NEAL Mark via HTCondor-users
Sent: Tuesday, August 1, 2023 7:52 PM
To: htcondor-users@xxxxxxxxxxx
Cc: O'NEAL Mark <mark.oneal@xxxxxxxxxxx>
Subject: [HTCondor-users] Windows dedicated run account profile corrupted

 

Hello,

 

We operate an HTCondor cluster on Windows. Jobs are submitted with "load_profile = True" in the submit description file, and we rely on the dedicated run accounts provisioned by the condor_startd, which runs as the Windows SYSTEM user.  The compute nodes are a mix of Windows 8 and Windows 10 machines running HTCondor 8.8.10, and are configured with static slot definitions.

 

Our IT manager recently noted that the cleanup of the dedicated run account profile, which normally happens during job shutdown, stopped working at some point on a number of these nodes. The evidence:

  • The profile folder in C:\Users (e.g. C:\Users\condor-slot1) is not deleted and appears corrupted.  The next time the startd tries to create the dedicated run account, Windows falls back to creating C:\Users\condor-slot1.hostname instead.
  • The registry hive for the user condor-slot1 is not deleted.
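
To survey how many nodes are affected, the leftover folders described above could be listed with a small script. This is only a sketch, not anything HTCondor provides: the function name find_slot_profiles is hypothetical, and the folder naming pattern (condor-slot<N>, optionally followed by ".<hostname>") is assumed from the example paths above and may differ on other installations. It only inspects the file system and does not look at the registry hives.

```python
import re
from pathlib import Path

# Assumed naming, taken from the example paths in this message:
#   condor-slot1            -> primary profile folder
#   condor-slot1.<hostname> -> fallback folder created by Windows
SLOT_PROFILE = re.compile(r"^(condor-slot\d+)(?:\.(.+))?$", re.IGNORECASE)


def find_slot_profiles(users_root):
    """Return a sorted list of (folder_name, is_fallback) tuples.

    Only directories whose names match the assumed slot pattern are
    reported; everything else under users_root is ignored.
    """
    found = []
    entries = sorted(Path(users_root).iterdir(), key=lambda p: p.name.lower())
    for entry in entries:
        if not entry.is_dir():
            continue
        match = SLOT_PROFILE.match(entry.name)
        if match:
            # A non-empty suffix after the dot marks a fallback folder.
            found.append((entry.name, match.group(2) is not None))
    return found


if __name__ == "__main__":
    root = Path(r"C:\Users")
    if root.is_dir():  # skip quietly when not run on a Windows node
        for name, is_fallback in find_slot_profiles(root):
            print(f"{name}\t{'fallback' if is_fallback else 'primary'}")
```

Run on a node while no jobs are active, any folder it reports is a candidate leftover; folders marked "fallback" suggest that an earlier primary folder could not be reused.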

I've checked the StarterLog for a number of the slots. Most report that the registry hive loaded successfully, even for slots where the issue described above is observed.  A few did report a failure to load the registry hive.

 

I've done some research on the open web and haven't found any hints about where to look so far.  I would appreciate it if anyone on the mailing list has suggestions on where to start with log investigation or configuration settings.  We run the cluster for LAN use only, behind our firewall, so we have not seen a strong reason to upgrade to the 9.x or 10.x releases.  If this were a known issue with older versions, though, that would be reasonable motivation to take the upgrade plunge.

 

Best Regards,
Mark