Re: [HTCondor-users] Multiple Collectors + CCB & Shared Port



Hi,

> If you /want/ to have multiple ports open, you need to turn off
> shared port in the sub-collectors.  The standard way to do this is to
> add _CONDOR_USE_SHARED_PORT=FALSE to the environment line you already
> have defined.

A big thank you. Your advice works great and everything is running smoothly.

Although I still haven't fully understood which values can be defined via <Subsystem>.Value / <LocalName>.Value and which have to be set in <System>_ENVIRONMENT, I now know to pay (even more) attention to it.

I assume that setting COLLECTOR02.USE_SHARED_PORT = FALSE never tells the Shared_Port_Daemon not to service this collector (since the value is only visible to COLLECTOR02), whereas Condor handles everything internally when I set
COLLECTOR02_ENVIRONMENT = "_CONDOR_USE_SHARED_PORT=FALSE"?
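For reference, a minimal sketch of a sub-collector set up this way. It assumes the usual pattern of cloning $(COLLECTOR) under a new local name; COLLECTOR02 and port 20000 come from this thread, while the log path is purely illustrative. It does not settle the question above; it only shows where the environment line sits.

```
# Second collector, started by the master from the standard collector binary
COLLECTOR02 = $(COLLECTOR)
# Listen directly on its own port rather than behind the shared port daemon
COLLECTOR02_ARGS = -f -p 20000
# _CONDOR_<KNOB> variables in this process's environment override the
# configuration for this daemon; the log path is an illustrative assumption
COLLECTOR02_ENVIRONMENT = "_CONDOR_USE_SHARED_PORT=FALSE _CONDOR_COLLECTOR_LOG=$(LOG)/Collector02Log"
DAEMON_LIST = $(DAEMON_LIST) COLLECTOR02
```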

> If you /can't/ have multiple ports open, then using -sock in the
> _ARGS configuration should set the shared port socket name for those
> collectors.  If you're just using the sub-collectors for CCB, you should
> be able to set an execute node CCB_ADDRESS to a string like
> 1.2.3.4:9618?sock=collector2 without any further worries.  (This may
> have to be '<1.2.3.4:9618?sock=collector2>' -- with the <brackets> -- or
> could be different, depending on other details of your networking
> configuration; [...]  If you want the sub-collectors to help
> deal with expensive security sessions, you'll need to set the execute
> node's COLLECTOR_HOST instead, which may cause further problems (IIRC,
> the default security configuration uses COLLECTOR_HOST, but strings of
> the form above don't work in security).
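A sketch of the -sock variant, which keeps the sub-collector behind the shared port. The address 1.2.3.4 and the socket name collector2 are the placeholders from the advice quoted above; whether the angle brackets are required depends on the networking configuration, as that advice notes.

```
# Central manager: the second collector gets its own shared port socket name
COLLECTOR02 = $(COLLECTOR)
COLLECTOR02_ARGS = -f -sock collector2
DAEMON_LIST = $(DAEMON_LIST) COLLECTOR02

# Execute node: use that collector for CCB only
CCB_ADDRESS = <1.2.3.4:9618?sock=collector2>
```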

As I've just learned from another message on the mailing list, upgrading to 8.5.6 or later gives me non-blocking (password) authentication; in that case a single collector with multiple CCBs is probably enough.
I'll try that, together with your advice, once I've upgraded.

Thanks :)

>> COLLECTOR.NumProcesses = 4

> I don't see this in the manual, and as far as I know it is unnecessary
> for what you're doing. [...]

This was an internal knob of mine, meant to scale inversely with the number of running collector processes, but it was a rather stupid idea that I've since got rid of. The query timeouts I saw were always caused by CCB, and possibly authentication, overloading the collector; that is solved now.

>> COLLECTOR_HOST                  = $(CONDOR_HOST):20000

> This is a little weird, because it will make the collector at port
> 20000 the 'main' collector, and everything will query the collector
> behind the shared port by default, which won't know everything.

That was a leftover from my trial and error that sneaked in when copying and pasting. It should have been just $(CONDOR_HOST).

Thanks & best regards
Frank