
Re: [HTCondor-users] Job submission from a node outside a cluster instantiated on k8s



Hi Diego,

You can avoid having the schedd's endpoint on the public network *if*:

1.  You set CCB_ADDRESS for the schedd but not for the other daemons, and you do not have TCP_FORWARDING_HOST set.
2.  You set the PRIVATE_NETWORK_NAME configuration to the same value throughout the inside of the Kubernetes cluster.
3.  The client outside the cluster has a public IP address that can accept incoming network connections *or* it can authenticate with the local (to the client) shared port daemon to have the shared port effectively punch through the client host firewall.
4.  The client can authenticate with the CCB at the DAEMON level.
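For what it's worth, steps (1) and (2) might look roughly like this as environment overrides, in the same style as your env file below. Note that `collector.example.org:30618` and `condor.cluster.internal` are placeholder values I made up, not values from your deployment:

```shell
# Sketch only: hypothetical values -- substitute your collector's
# public endpoint and whatever network-name string you share inside
# the k8s cluster.

# Step (1), schedd pod only: register with the CCB running alongside
# the collector; leave TCP_FORWARDING_HOST unset.
export _condor_CCB_ADDRESS='collector.example.org:30618'

# Step (2), every pod inside the cluster: the same PRIVATE_NETWORK_NAME,
# so in-cluster connections stay direct instead of reversing via CCB.
export _condor_PRIVATE_NETWORK_NAME='condor.cluster.internal'

echo "CCB_ADDRESS=${_condor_CCB_ADDRESS}"
echo "PRIVATE_NETWORK_NAME=${_condor_PRIVATE_NETWORK_NAME}"
```

The same two knobs can of course go into the pods' condor config files instead of the environment.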

I think you were having issues with step (3) above.

In the end, it's a tall order -- it seems like just opening a second public port for external access is easier.

Do note that I'd suggest avoiding a NodePort, as that makes it hard to run several pods on the same cluster.  Is the LoadBalancer service type a possibility?
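As a rough sketch of that alternative (the name, namespace, selector label, and port here are hypothetical, not taken from your manifest), the Service might look something like this; it's emitted from a heredoc so it can be inspected or piped to `kubectl apply -f -`:

```shell
# Sketch of a LoadBalancer Service for the collector/CCB. The manifest
# is stored in a variable so it can be printed, saved, or applied.
svc_manifest=$(cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: condor-collector
  namespace: condor
spec:
  type: LoadBalancer
  selector:
    app: htcondor-collector   # hypothetical pod label
  ports:
    - name: condor
      port: 9618
      targetPort: 9618
EOF
)
printf '%s\n' "$svc_manifest"
```

With this, each such service gets its own external IP from the cluster's load balancer instead of competing for NodePort numbers on every node.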

HTH,

Brian

> On Dec 1, 2020, at 12:17 PM, Diego Ciangottini <diego.ciangottini@xxxxxxxxxx> wrote:
> 
> Hi Brian,
> 
> Thank you. Should I take your answer to mean that the schedd *MUST* also have a public endpoint? In that case, is it possible to change the schedd port (to comply with k8s NodePort ranges)?
> 
> Alternatively, I was thinking of accessing the schedd daemon, running on the private network, from outside via the CCB -- is that possible? That would be the ideal solution in this case, I think.
> 
> Diego
> 
> On 12/1/2020 1:52 PM, Bockelman, Brian wrote:
>> Hi Diego,
>> 
>> Have you tried setting TCP_FORWARDING_HOST to get the schedd to advertise an external address to the collector?
>> 
>> I suspect it's advertising its internal address which, of course, is not a valid one coming from the outside.
>> 
>> Brian
>> 
>> Sent from my iPhone
>> 
>>> On Dec 1, 2020, at 5:42 AM, Diego Ciangottini <diego.ciangottini@xxxxxxxxxx> wrote:
>>> 
>>> Adding some more context and details on my investigation so far.
>>> 
>>> This k8s manifest (*) shows my latest attempt; from the network point of view, it basically does the following:
>>> 
>>> - CCB/collector exposed via a NodePort on 30618, mapped to 30618 inside the container
>>> 
>>> - Schedd appears to the collector as a headless k8s service at schedd.condor.svc.cluster.local
>>>     - CCB_ADDRESS set in the schedd's config, pointing to the collector's public IP (I also tried the private one, with no difference in the outcome)
>>> 
>>> With this configuration everything works perfectly as long as I am inside the cluster, but if I try from outside with this env (**), I get this error (***).
>>> 
>>> Can you help me understand whether what I am trying makes sense? Do you see any obvious reason for it not to work? Any feedback at this point is very much appreciated.
>>> 
>>> P.S. If I remove CCB_ADDRESS from the schedd's condor configuration, I get this instead (****); I don't know if it helps.
>>> 
>>> Thanks,
>>> Diego
>>> 
>>> (*)
>>> 
>>> https://gist.github.com/dciangot/171ef8981ba554fed4ca8db97b4ddbf7
>>> 
>>> (**)
>>> 
>>> export _condor_AUTH_SSL_CLIENT_CAFILE=/ca.crt
>>> export _condor_SEC_DEFAULT_AUTHENTICATION_METHODS=SCITOKENS
>>> export _condor_SCITOKENS_FILE=/tmp/token
>>> export _condor_COLLECTOR_HOST=90.147.174.149.xip.io:30618
>>> export _condor_TOOL_DEBUG=D_FULLDEBUG,D_SECURITY
>>> 
>>> (***)
>>> 
>>> condor_q -address `condor_status -schedd -af ScheddIpAddr` -debug
>>> 12/01/20 11:23:22 ZKM: In unwrap.
>>> 12/01/20 11:23:22 SharedPortEndpoint: failed to find MyAddress in ad from /var/lock/condor/shared_port_ad.
>>> 12/01/20 11:23:22 CCBClient: Failed to get remote address for shared port endpoint for reversed connection from schedd at <10.244.1.20:9618>.
>>> 12/01/20 11:23:22 Failed to reverse connect to schedd at <10.244.1.20:9618> via CCB.
>>> 
>>> -- Failed to fetch ads from: <10.244.1.20:9618?CCBID=10.244.2.21:30618%3faddrs%3d10.244.2.21-30618%26alias%3d90.147.174.149.xip.io%26noUDP%26sock%3dcollector#2&PrivNet=schedd.condor.svc.cluster.local&addrs=10.244.1.20-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94> : schedd.condor.svc.cluster.local
>>> CEDAR:6001:Failed to connect to <10.244.1.20:9618?CCBID=10.244.2.21:30618%3faddrs%3d10.244.2.21-30618%26alias%3d90.147.174.149.xip.io%26noUDP%26sock%3dcollector#2&PrivNet=schedd.condor.svc.cluster.local&addrs=10.244.1.20-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94>
>>> 
>>> (****)
>>> 
>>> condor_q -address `condor_status -schedd -af ScheddIpAddr` -debug
>>> 12/01/20 11:10:06 ZKM: In unwrap.
>>> 12/01/20 11:10:26 attempt to connect to <10.244.1.17:9618> failed: timed out after 20 seconds.
>>> 
>>> -- Failed to fetch ads from: <10.244.1.17:9618?addrs=10.244.1.17-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94> : schedd.condor.svc.cluster.local
>>> CEDAR:6001:Failed to connect to <10.244.1.17:9618?addrs=10.244.1.17-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94>
>>> 
>>> 
>>> On 12/1/2020 12:52 AM, Diego Ciangottini wrote:
>>>> Hi again,
>>>> 
>>>> Partially related to the previous email: I'm trying to update our cluster setup on k8s, and I was wondering whether it is possible to optimize what we are currently using.
>>>> 
>>>> In particular, we keep the schedd and collector pods on the host network, accessible from outside, in order to allow submission from nodes outside the cluster. Of course, this comes at the cost of losing a lot of deployment flexibility.
>>>> 
>>>> So, is there any way to expose only the collector port as a service and run the schedd on the private network only, leveraging CCB or another solution? Any suggestions or previous experience?
>>>> 
>>>> Thanks,
>>>> Diego
>>>> 
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>> 
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/