
Re: [HTCondor-users] HTCondor 9.1



Hi John,

Thanks for the help.

1/ NETWORK_INTERFACE is the same on all machines

lyle@tuna$ condor_config_val -v NETWORK_INTERFACE
NETWORK_INTERFACE = *
 # at: <Default>
 # raw: NETWORK_INTERFACE = *


FYI, my /etc/hosts on all machines follows a standard layout, e.g. for tuna:

lyle@tuna$ cat /etc/hosts
127.0.0.1    localhost        tuna
127.0.1.1    tuna.ocwen.com   tuna


All machines have an /etc/hostname file containing their hostname, but the domain name is blank.

2/ UID_DOMAIN is also the same on all machines, namely the default:

lyle@grenadier:$ condor_config_val -v UID_DOMAIN
UID_DOMAIN = localhost
 # at: <Default>
 # raw: UID_DOMAIN = $(FULL_HOSTNAME)


... What I tried
It looks to me as though condor is not picking up the actual hostname, perhaps because we have no domain name configured.

lyle@grenadier:/etc/condor/config.d$ hostname
grenadier

lyle@grenadier:/etc/condor/config.d$ condor_config_val -v HOSTNAME
HOSTNAME = localhost
 # at: <Detected>
 # raw: HOSTNAME = localhost

lyle@grenadier:/etc/condor/config.d$ condor_config_val -v FULL_HOSTNAME
FULL_HOSTNAME = localhost
 # at: <Detected>
 # raw: FULL_HOSTNAME = localhost

* I tried pointing NETWORK_INTERFACE to 127.0.1.1 on all machines, and also to the central manager IP (something I read), but this did not change what condor picks up as the hostname.
* I tried setting UID_DOMAIN=ocwen.com on all machines, but this did not work (everything still runs as nobody). I suspect this is because the hostname is not picked up correctly either.
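
A rough way to test the hostname suspicion above is to mimic a first-match lookup against /etc/hosts. The sketch below is only an illustration: canonical_name is a hypothetical helper (not an HTCondor or system tool), and it assumes the resolver returns the first name on the first line that lists the hostname, which depends on the local nsswitch.conf setup. Whether HTCondor derives HOSTNAME this way is likewise an assumption.

```shell
# canonical_name FILE NAME
# Print "IP CANONICAL_NAME" for the first entry in hosts-style FILE
# that lists NAME as either the canonical name or an alias.
canonical_name() {
    awk -v name="$2" '
        { sub(/#.*/, "") }          # drop comments; awk re-splits the fields
        NF >= 2 {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1, $2; exit }
        }
    ' "$1"
}
```

With the /etc/hosts shown above for tuna, `canonical_name /etc/hosts tuna` prints `127.0.0.1 localhost`, because tuna is listed as an alias on the 127.0.0.1 line, ahead of the 127.0.1.1 entry.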

Thanks, Lyle


On Wed, Jul 28, 2021 at 1:59 AM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
I think slots are appearing as localhost because your condor_config is telling condor to use localhost as the primary network interface.

What does the condor_config have set for NETWORK_INTERFACE?

Try running

  condor_config_val -v NETWORK_INTERFACE

By the way, you can see all of your configuration that differs from the default HTCondor configuration by running

  condor_config_val -summary

When a job runs, files will be written as nobody if the job runs as nobody, which happens when HTCondor does not think that the submit node and the execute node have the same set of user ids. It decides this by comparing the value of UID_DOMAIN on both of these machines.

Try running

  condor_config_val -v UID_DOMAIN

on both the submit machine and the execute machine, what is the value?

Having files written as nobody on the execute node is not a problem when HTCondor is doing file transfer, because it will change ownership of the files as it transfers the results back. But if you are using a shared file system, you may need to do some additional configuration.

Instructions for setting up HTCondor to use a shared file system are here:

Configuration Macros - HTCondor Manual 9.1.0 documentation
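
As a hedged illustration of the kind of additional configuration meant here, a fragment like the following could go in a file under /etc/condor/config.d on every machine. The file name is hypothetical, and ocwen.com is simply the domain that appears elsewhere in this thread. UID_DOMAIN, FILESYSTEM_DOMAIN and TRUST_UID_DOMAIN are existing configuration macros, but whether these values suit this particular pool is an assumption.

```
# /etc/condor/config.d/99-shared-fs.conf  (hypothetical file name)

# Same value on the submit and execute machines, so jobs can run
# as the submitting user instead of nobody.
UID_DOMAIN = ocwen.com

# Declares that machines with this value share a file system.
FILESYSTEM_DOMAIN = ocwen.com

# Assumption: needed while UID_DOMAIN is not part of the detected
# host name (here the host name is detected as "localhost").
TRUST_UID_DOMAIN = True
```

After editing the configuration, running condor_reconfig and then re-checking with condor_config_val -v UID_DOMAIN on both machines would show whether the values now match.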


-tj



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Lyle Pakula <Lyle@xxxxxxxxxxxxxxxx>
Sent: Monday, July 26, 2021 7:14 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] HTCondor 9.1
Hi Everyone and thanks for everyone's help in advance!

We have recently upgraded from a very old install of 7.6 to 9.1 on Ubuntu 18.04 by basically blowing away everything old (uninstall, remove systemctl, delete "condor user" from all machines) and then following https://htcondor.readthedocs.io/en/latest/getting-htcondor/admin-quick-start.html.

* Starting with a basic setup (3 machines, 3 roles) + NAS mounted on all machines.
* Vanilla universe jobs read/write to and from the NAS

Question 1 - Why are slots appearing as "localhost" and not the machine name they are actually on?
lyle@tuna:~$ condor_status
Name            OpSys  Arch   State     Activity LoadAv Mem  ActvtyTime

slot1@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:39
slot2@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:36
slot3@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:33
slot4@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:32
slot5@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:31
slot6@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:42
slot7@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:41
slot8@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:41

Question 2 - Files are written as nobody:nouser, how can we change this?
The problem here is that the written files are unreadable/unwritable by the submitter.

Tried this but did not work

Thanks, Lyle

--
AE CAPITAL
Ground Floor, 555 Bourke Street, Melbourne Australia 3000

p +61 3 9020 7801
m +61 (0)434 872 054
w http://www.aecapital.com.au


AE Capital Pty Limited (ACN 153 242 865) is regulated by the Australian Securities & Investments Commission and is a Corporate Authorised Representative of JFM Pty Limited (ACN 125 150 656), holder of an Australian Financial Services Licence (AFSL 314585). AE Capital Pty Limited is a member of the National Futures Association (ID 0498660).

