
Re: [Condor-users] Flocking problem for 7.0.5->7.2.2 submission. New security issue?



Mark,

I'm guessing that you are not setting BIND_ALL_INTERFACES.

Starting in 7.1.1, BIND_ALL_INTERFACES defaults to True. As a result, setting NETWORK_INTERFACE without also setting BIND_ALL_INTERFACES=False only controls which interface Condor advertises, not which one it actually binds to. Condor binds to all of them, so it will use whichever interface the OS chooses in a particular case.
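The binding behaviour described here can be illustrated with plain BSD sockets. This is a generic Python sketch, not Condor's own code: `connect_from` is a hypothetical helper, and the loopback address stands in for a 172.24.x.x interface. A client socket bound to a specific local address before `connect()` always uses that address as its source. An unbound one lets the OS pick the source address from its routing table. That is consistent with the 131.111.xxx.xxx source address in the log quoted further down.

```python
import socket


def connect_from(local_ip, remote_addr):
    """Open a TCP connection whose source address is pinned to local_ip.

    Binding to (local_ip, 0) before connect() fixes the source address;
    port 0 lets the OS choose an ephemeral port. Skipping the bind leaves
    the choice of source address to the OS routing table.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((local_ip, 0))
    sock.connect(remote_addr)
    return sock


if __name__ == "__main__":
    # Loopback demo: a listener on 127.0.0.1 and a client pinned to 127.0.0.1.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = connect_from("127.0.0.1", server.getsockname())
    conn, peer = server.accept()
    print("client source address:", client.getsockname()[0])
    print("server saw peer:", peer[0])
    for s in (conn, client, server):
        s.close()
```

On a multi-homed host, replacing 127.0.0.1 with the intended private address plays the same role that BIND_ALL_INTERFACES=False plays for Condor.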

So I recommend setting BIND_ALL_INTERFACES=False and seeing if this addresses your problem.
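A minimal condor_config.local sketch of this suggestion for the execute host in Mark's example. It assumes 172.24.116.4 is the address Condor should use there. The submit host would carry its own address (172.24.252.25) in the same way. Only the two settings named in this thread are used.

```
# Address Condor should advertise on this host (the private 172.24.x.x interface)
NETWORK_INTERFACE = 172.24.116.4

# Since 7.1.1 this defaults to True (bind to every interface);
# False makes Condor bind only to NETWORK_INTERFACE as well
BIND_ALL_INTERFACES = False
```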

--Dan

Mark Calleja wrote:
Hi All,

(Apologies if you receive multiple copies of this post. The camgrid-users mailing list appears to be blocking another of my email addresses.)

We currently run several pools (all Linux) with v7.0.5 and are looking to upgrade piecemeal to v7.2.2. Encouraged by the entry in section 8.2 of the v7.2.2 manual, namely "We believe that Condor 7.2.x and 7.0.x are wire-compatible, and can be freely mixed between computers in a Condor pool.", we've been testing this by upgrading some machines. However, we're seeing jobs getting rejected when the schedd is running 7.0.5 and the startd is running 7.2.2. No other changes have been made, i.e. the configuration files have remained the same. Before I paste in the relevant parts of the log files, a bit of background: many of our machines have multiple IP addresses, but Condor is forced to operate using a specific address, selected by the NETWORK_INTERFACE value in a machine's condor_config.local file. This address is always a "private" (RFC 1918) address in the range 172.24.xxx.xxx.

Here's an example. The submit host has only the IP address 172.24.252.25, whereas the execute host has two addresses: 131.111.xxx.xxx (which should *not* be used by Condor) and 172.24.116.4 (which should). Here's the SchedLog from the submit host when both the submit and execute hosts are running 7.0.5 (the job completes correctly):

4/20 17:45:08 Using config source: /etc/condor/condor_config
4/20 17:45:08 Using local config sources:
4/20 17:45:08    /usr/local/condor/local/condor_config.local
4/20 17:45:08    /usr/local/condor/local/condor_config.flocking
4/20 17:45:08 DaemonCore: Command Socket at <172.24.252.25:13743>
4/20 17:45:08 Initializing a VANILLA shadow for job 8.0
4/20 17:45:08 (8.0) (3799): Request to run on <172.24.116.4:9692> was ACCEPTED
4/20 17:45:09 (8.0) (3799): ZKM: setting default map to (null)
4/20 17:45:09 (8.0) (3799): Job 8.0 terminated: exited with status 0
4/20 17:45:09 (8.0) (3799): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 100


Now the corresponding relevant snippet for when the execute host has been upgraded to 7.2.2 (job fails as file transfer does not take place):

4/18 06:19:52 Using config source: /etc/condor/condor_config
4/18 06:19:52 Using local config sources:
4/18 06:19:52    /usr/local/condor/local/condor_config.local
4/18 06:19:52    /usr/local/condor/local/condor_config.flocking
4/18 06:19:52 DaemonCore: Command Socket at <172.24.252.25:14228>
4/18 06:19:52 Initializing a VANILLA shadow for job 6.0
4/18 06:19:52 (6.0) (3719): Request to run on <172.24.116.4:9668> was ACCEPTED
4/18 06:19:52 (6.0) (3719): DaemonCore: PERMISSION DENIED to unknown user from host <131.111.xxx.xxx:9633> for command 61000 (FILETRANS_UPLOAD), access level WRITE
4/18 06:19:52 (6.0) (3719): ERROR "Error from starter on XXXX.escience.cam.ac.uk: Failed to transfer files" at line 649 in file pseudo_ops.C

It would appear that in 7.2.2 Condor's trying to make use of an interface on the execute host that's not the one nominated in NETWORK_INTERFACE (in this case it's the canonical, globally routeable address). Is there any reason why this has changed from 7.0.5? And is there any way of getting 7.2.2 to conform with the desired 7.0.5 behaviour?

Best regards,
Mark
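To check whether other jobs are being refused the same way, a small Python sketch can scan a SchedLog for DaemonCore denial lines whose peer address lies outside the pool's private 172.24.0.0/16 range. The regular expression is modelled only on the single PERMISSION DENIED line quoted above, so it is an assumption that other denial messages share that format. `unexpected_peers` is a hypothetical helper, and the default network comes from the description of the pool in this message.

```python
import ipaddress
import re

# Modelled on the DaemonCore line in the 7.2.2 log above:
# "PERMISSION DENIED to unknown user from host <a.b.c.d:port> ..."
PEER_RE = re.compile(r"PERMISSION DENIED .*? from host <([^:>]+):\d+>")


def unexpected_peers(log_text, expected_net="172.24.0.0/16"):
    """Return peer addresses in denial lines that fall outside expected_net.

    Addresses that cannot be parsed (e.g. masked as 131.111.xxx.xxx) are
    reported too, since they cannot be shown to lie inside the network.
    """
    network = ipaddress.ip_network(expected_net)
    found = []
    for host in PEER_RE.findall(log_text):
        try:
            inside = ipaddress.ip_address(host) in network
        except ValueError:
            inside = False
        if not inside:
            found.append(host)
    return found
```

Run against the submit host's SchedLog (its location depends on the local configuration), this would flag `131.111.xxx.xxx` for job 6.0 and nothing for job 8.0.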
------------------------------------------------------------------------

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at: https://lists.cs.wisc.edu/archive/condor-users/