Re: [HTCondor-users] condor_config location for windows



I think it was billionaire Warren Buffett who said "Put all your eggs in one basket... and WATCH THAT BASKET." That is to say, if the NFS server or the network to reach it goes down, we have much bigger problems than an incorrect HTCondor startup. So we make sure that the fileserver and the network stay up, and with LOCAL_CONFIG_DIR set to /home/condor/config/config.d it works fine, except when a system is booted while something's off the rails.

In addition, I'm using a high-availability config on the scheduler, so the job spool is on an NFS filesystem as well - /home/condor/share/spool - so that's an extra egg in that basket.

However, there's one other situation where this problem can crop up on newer OS releases: systemd aggressively parallelizes the system startup, and on machines with fast-enough disk drives it will routinely start the NFS client and HTCondor at essentially the same time. Without the setting to require a local config, HTCondor then starts up standalone without the NFS configs, and with the setting it fails to start at all.

The solution to this is to add a /usr/lib/systemd/system/condor.service.d/nfs_online.conf file containing:

[Unit]
After = nfs-client.target autofs.service

This ensures that HTCondor only launches after NFS and the automounter are up and running, if applicable. I sent a ticket to CHTC to suggest that this be added to the default systemd unit file to help prevent this sort of problem with HA schedds and NFS-served configuration files.
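
Where the NFS filesystem is one that systemd itself manages (for example an /etc/fstab entry rather than an autofs map), a possible variation is to name the path instead of the NFS units. This is only a sketch under assumptions: the file name nfs_mounts.conf is made up for illustration, and /home/condor is assumed to be the mount point behind the config and spool directories mentioned above. RequiresMountsFor is a standard [Unit] directive that adds both a requirement and an ordering dependency on the mount units needed to reach the listed path.

```
# Hypothetical drop-in: /usr/lib/systemd/system/condor.service.d/nfs_mounts.conf
[Unit]
# Assumes /home/condor is mounted via fstab or a .mount unit, not by autofs
RequiresMountsFor = /home/condor
```

After adding or changing any drop-in, systemd needs a "systemctl daemon-reload" to pick it up; "systemctl show -p After condor.service" can then be used to check that the extra ordering dependencies are listed.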

	-Michael Pelletier.


-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of pascal ebay
Sent: Wednesday, April 11, 2018 3:51 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [External] Re: [HTCondor-users] condor_config location for windows

Hi Michael,

Thanks for your answer. Could you explain the rationale behind it? If I understand the documentation correctly, the problem with placing the config_file (or the binaries themselves) on a shared folder is that of their accessibility during service startup.

But it seems to me that your solution (pointing to a shared condor_config.local file from within a condor_config.local sitting on the local disk) is not an answer to this problem?

So, what is the problem that you solved with this set-up and what is your experience with the accessibility problem mentioned in the docs?

Thanks

Pascal


On 4/10/18, Michael Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx> wrote:
> Hi Pascal,
>
> I've been using this approach for years - the trick here is to leave 
> the condor_config file exactly as is, and then in the 
> condor_config.local you set the LOCAL_CONFIG_DIR & LOCAL_CONFIG_FILE 
> parameters to point to the shared config directory, something along these lines:
>
> SHARED_CONFIG_DIR = /user/condor
> LOCAL_CONFIG_FILE = $(SHARED_CONFIG_DIR)/config/host.d/$(HOSTNAME)
> LOCAL_CONFIG_DIR = $(SHARED_CONFIG_DIR)/config/config.d
>
> Regards,
>
>                 -Michael Pelletier.
>
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On 
> Behalf Of pascal ebay
> Sent: Tuesday, April 10, 2018 5:44 PM
> To: htcondor-users@xxxxxxxxxxx
> Subject: [External] [HTCondor-users] condor_config location for 
> windows
>
> I am reading the doc to set up an installation mixing Windows and 
> Linux nodes. Section 3.14.3 presents a nice way of minimizing 
> duplication of the configuration by hosting it on a shared folder and 
> using $(OPSYS) value in filenames.
>
> However in 3.2.3.5, it says:
>
> CONDOR_CONFIG should point to the condor_config file. In this version 
> of HTCondor, it must reside on the local disk.
>
> This piece of information seems contradictory with the setup above.
>
> So, should condor_config be shared on a network drive for Windows nodes 
> after all?
>
> Thanks
>
> Pascal
>
>
