
Re: [HTCondor-users] Job submission from a node outside a cluster instantiated on k8s



Hi Diego,

Have you tried setting TCP_FORWARDING_HOST to get the schedd to advertise an external address to the collector?

I suspect it's advertising its internal address which, of course, is not a valid one coming from the outside.
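
As a rough sketch of what that could look like on the schedd side, written in the same `_condor_` environment style as the client settings quoted below. The address is an assumption: it is simply the public IP from the COLLECTOR_HOST value in this thread, and the schedd's port would also have to be reachable at that address from outside (for example through its own NodePort), which the manifest may not provide yet.

```shell
# Hypothetical schedd-side setting; the IP is assumed, taken from the
# COLLECTOR_HOST value quoted in this thread, and may differ in practice.
export _condor_TCP_FORWARDING_HOST=90.147.174.149

# Print the value so it can be checked before the schedd is restarted.
echo "TCP_FORWARDING_HOST=${_condor_TCP_FORWARDING_HOST}"
```

Once the schedd has restarted with this in place, rerunning `condor_status -schedd -af ScheddIpAddr` from outside should show whether the advertised address has changed from the internal 10.244.x.x one.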

Brian


> On Dec 1, 2020, at 5:42 AM, Diego Ciangottini <diego.ciangottini@xxxxxxxxxx> wrote:
> 
> Adding some more context and details on my investigation so far.
> 
> In this k8s manifest (*) you can find my latest try, and it basically does the following from the network point of view:
> 
> - CCB/Collector exposed to a nodeport on 30618, mapped to 30618 inside the container
> 
> - Schedd appears to the collector as a headless k8s service at schedd.condor.svc.cluster.local
>     - CCB address on schedd file, pointing to the public IP of collector (I tried also the private one, no difference in the outcome though)
> 
> With this configuration everything works perfectly as long as I am inside the cluster, but if I try from outside with this env (**) I get this error (***).
> 
> Can you help me understand whether what I am trying makes any sense? Do you see any obvious reason for this not to work? Any feedback at this point is much appreciated.
> 
> P.S. if I remove the CCB_ADDRESS from the condor configuration of the schedd I get this instead (****), don't know if it helps.
> 
> Thanks,
> Diego
> 
> (*)
> 
> https://gist.github.com/dciangot/171ef8981ba554fed4ca8db97b4ddbf7
> 
> (**)
> 
> export _condor_AUTH_SSL_CLIENT_CAFILE=/ca.crt
> export _condor_SEC_DEFAULT_AUTHENTICATION_METHODS=SCITOKENS
> export _condor_SCITOKENS_FILE=/tmp/token
> export _condor_COLLECTOR_HOST=90.147.174.149.xip.io:30618
> export _condor_TOOL_DEBUG=D_FULLDEBUG,D_SECURITY
> 
> (***)
> 
> condor_q -address `condor_status -schedd -af ScheddIpAddr` -debug
> 12/01/20 11:23:22 ZKM: In unwrap.
> 12/01/20 11:23:22 SharedPortEndpoint: failed to find MyAddress in ad from /var/lock/condor/shared_port_ad.
> 12/01/20 11:23:22 CCBClient: Failed to get remote address for shared port endpoint for reversed connection from schedd at <10.244.1.20:9618>.
> 12/01/20 11:23:22 Failed to reverse connect to schedd at <10.244.1.20:9618> via CCB.
> 
> -- Failed to fetch ads from: <10.244.1.20:9618?CCBID=10.244.2.21:30618%3faddrs%3d10.244.2.21-30618%26alias%3d90.147.174.149.xip.io%26noUDP%26sock%3dcollector#2&PrivNet=schedd.condor.svc.cluster.local&addrs=10.244.1.20-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94> : schedd.condor.svc.cluster.local
> CEDAR:6001:Failed to connect to <10.244.1.20:9618?CCBID=10.244.2.21:30618%3faddrs%3d10.244.2.21-30618%26alias%3d90.147.174.149.xip.io%26noUDP%26sock%3dcollector#2&PrivNet=schedd.condor.svc.cluster.local&addrs=10.244.1.20-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94>
> 
> (****)
> 
> condor_q -address `condor_status -schedd -af ScheddIpAddr` -debug
> 12/01/20 11:10:06 ZKM: In unwrap.
> 12/01/20 11:10:26 attempt to connect to <10.244.1.17:9618> failed: timed out after 20 seconds.
> 
> -- Failed to fetch ads from: <10.244.1.17:9618?addrs=10.244.1.17-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94> : schedd.condor.svc.cluster.local
> CEDAR:6001:Failed to connect to <10.244.1.17:9618?addrs=10.244.1.17-9618&alias=schedd.condor.svc.cluster.local&noUDP&sock=schedd_21_fb94>
> 
> 
> On 12/1/2020 12:52 AM, Diego Ciangottini wrote:
>> Hi again,
>> 
>> partially related to the activity of the previous email, I'm trying to update our cluster setup on k8s and I was wondering if it was possible to optimize what we are currently using.
>> 
>> In particular, we are keeping the schedd and collector pods on the host network, accessible from outside, in order to allow submission from nodes outside the cluster. This comes at the cost of losing a lot of flexibility in the deployment, of course.
>> 
>> So, is there any way to expose only the collector port as a service and keep the schedd running on the private network only, leveraging CCB or other solutions? Any suggestions or previous experience?
>> 
>> Thanks,
>> Diego
>> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/