
Re: [HTCondor-users] Nodes missing in condor_status list



Hi,

 

I’ve set COLLECTOR_DEBUG to D_SECURITY, but /var/log/condor/CollectorLog doesn’t contain any PERMISSION DENIED lines.

 

There is an entry in the collector log, though, saying that my external/invisible node is granted READ level access (which explains why I can see my Condor pool from this node even though the node itself is not listed):

 

09/18/14 10:47:57 PERMISSION GRANTED to unauthenticated@unmapped from host <IP> for command 5 (QUERY_STARTD_ADS), access level READ: reason: READ authorization policy allows IP address <IP>; identifiers used for this remote host <IP>, <HOSTNAME>
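To scan a long CollectorLog for such entries, a small filter can help. The following is only a sketch: the regular expression is modeled on the single log line quoted above, and other HTCondor versions may format these lines differently. The names `PERMISSION_RE` and `permission_events` are made up for this example.

```python
import re

# Pattern modeled on the one log line quoted in this thread (an assumption;
# it is not taken from any HTCondor specification).
PERMISSION_RE = re.compile(
    r"PERMISSION (?P<verdict>GRANTED|DENIED) to (?P<user>\S+) "
    r"from host (?P<host>\S+) for command (?P<cmd_id>\d+) "
    r"\((?P<cmd_name>\w+)\), access level (?P<level>\w+)"
)


def permission_events(lines):
    """Yield one dict per PERMISSION GRANTED/DENIED line found in `lines`."""
    for line in lines:
        match = PERMISSION_RE.search(line)
        if match:
            yield match.groupdict()


# Sample input: the line from this mail plus one unrelated line.
sample = [
    "09/18/14 10:47:57 PERMISSION GRANTED to unauthenticated@unmapped "
    "from host <IP> for command 5 (QUERY_STARTD_ADS), access level READ: "
    "reason: READ authorization policy allows IP address <IP>",
    "09/18/14 10:48:01 some unrelated line",
]
events = list(permission_events(sample))
for event in events:
    print(event["verdict"], event["level"], event["cmd_name"], event["host"])
```

On a real system the same function could be fed an open file handle for /var/log/condor/CollectorLog, and the results filtered by `level` or `host` to see which access levels were actually evaluated for a given node.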

 

But I can’t find a corresponding line for WRITE level access. Actually, there is no such line even for the other nodes that do show up in the condor_status list. So how do I verify that a given node has been granted WRITE access by the collector?

 

What could be the reason for this behavior?

 

I don’t know if this is important, but I’m using the old Condor security concept by defining HOSTALLOW_WRITE. The IT department has assured me that the required ports (9614/9618 = shared/collector) are open. “nc -zv IP 9614” returns “… open” (collector -> invisible node).
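For machines without nc (the Windows nodes, for instance), roughly the same TCP reachability check can be sketched in Python. This is a minimal sketch that only tests whether a TCP connection can be opened; it says nothing about HTCondor authorization. The function name `port_is_open` is made up for this example.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and connects; the context
        # manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

A call such as `port_is_open("condor.FOO.my.com", 9614)` could then be run on each node. Since the startd sends its ads to the collector, it may be worth running the check from the external node toward the collector as well, not only in the collector -> node direction tested above.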

 

Best regards,

Lukas

 

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Brian Bockelman
Sent: Tuesday, September 9, 2014 15:07
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Nodes missing in condor_status list

 

 

On Sep 9, 2014, at 7:49 AM, Lukas Koschmieder <Lukas.Koschmieder@xxxxxxxxxxxxxxxxxxx> wrote:



Hi,

 

Some of my STARTD/SCHEDD nodes don’t show up in the condor_status list.

This probably has something to do with the fact that these nodes belong to a different network.

1) Do I have to use the flocking mechanism in order to add such an “external node” (see setup below)?

2) If I do not have to use the flocking mechanism, how do I track down the error? I’ve already checked all the logs (on the invisible nodes as well as on the collector), but I can’t find any clue.

 

 

This is how my pool is set up:

 

FOO network:

 

Collector/Negotiator:

condor.FOO.my.com (Debian 6)

 

“Internal” Startd/Schedd nodes:

start1.FOO.my.com (CentOS 6)  <- LISTED

start2.FOO.my.com (Windows 7) <- LISTED

 

BAR network:

 

“External” Startd/Schedd nodes:

start3.BAR.my.com (OpenSuse 13) <- NOT LISTED

start4.BAR.my.com (Windows 7)   <- NOT LISTED

 

 

 

Collector/Negotiator condor_config.local:

 

CONDOR_HOST = condor.FOO.my.com

DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR

ALLOW_WRITE = *.FOO.my.com, *.BAR.my.com

 

SGE_GAHP      = $(GLITE_LOCATION)/bin/batch_gahp

GLIDEIN_SITES = *.FOO.my.com

 

HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(GLIDEIN_SITES)

 

Since HOSTALLOW_WRITE, if used, overrides ALLOW_WRITE, this is the relevant line to look at.

 

I think you want:

 

HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(GLIDEIN_SITES), *.BAR.my.com

 

?
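The effect of the missing entry can be illustrated with a rough sketch. It uses Python's glob-style matching as a stand-in for HTCondor's own host matching, which is an approximation and not HTCondor's implementation, and it assumes that $(HOSTALLOW_WRITE) expands to nothing else on the right-hand side. The function `host_allowed` is made up for this example.

```python
from fnmatch import fnmatchcase


def host_allowed(hostname, patterns):
    """Return True if hostname matches any wildcard pattern (case-insensitive).

    Glob matching is used here only as an approximation of how a
    host-based allow list with '*' wildcards behaves.
    """
    host = hostname.lower()
    return any(fnmatchcase(host, p.strip().lower()) for p in patterns)


# Effective list from the posted config: only $(GLIDEIN_SITES).
original = ["*.FOO.my.com"]
# Effective list with the suggested change.
suggested = ["*.FOO.my.com", "*.BAR.my.com"]

for node in ("start1.FOO.my.com", "start3.BAR.my.com"):
    print(node, host_allowed(node, original), host_allowed(node, suggested))
```

Under these assumptions the BAR nodes match only the extended list, which would be consistent with them being missing from condor_status.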

 

If that's just a copy/paste error in your mail, here are a few other ideas:

 

- Check /var/log/condor/CollectorLog for PERMISSION DENIED lines.  They often give a good hint as to what went wrong.

- Restart the collector with D_SECURITY set for COLLECTOR_DEBUG.  This will greatly increase the verbosity but also give more hints as to what has gone wrong.

 

Finally, this is not a particularly secure setup - host-based security may not be as secure as you would want, especially as you start to involve multiple networks.  I can't find any good links at the moment, but maybe others could chime in?

 

Brian



 

USE_SHARED_PORT  = TRUE

SHARED_PORT_ARGS = -p 9614

DAEMON_LIST      = $(DAEMON_LIST), SHARED_PORT

 

 

 

Startd/Schedd condor_config.local:

 

CONDOR_HOST = condor.FOO.my.com

DAEMON_LIST = MASTER, STARTD, SCHEDD

ALLOW_WRITE = *.FOO.my.com, *.BAR.my.com

 

SGE_GAHP      = $(GLITE_LOCATION)/bin/batch_gahp

GLIDEIN_SITES = *.FOO.my.com

 

HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(GLIDEIN_SITES)

 

USE_SHARED_PORT  = TRUE

SHARED_PORT_ARGS = -p 9614

DAEMON_LIST      = $(DAEMON_LIST), SHARED_PORT

 

START = TRUE

 

 

 

Best regards,

Lukas

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/