
Re: [HTCondor-users] Remote cluster test failed when using condor_remote_cluster command



If we modify ~/.ssh/config, the ssh client on Linux picks up the port directly, and this is what `condor_remote_cluster -a` uses. However, condor_submit may not use the ssh client to dispatch jobs, so modifying ~/.ssh/config does not affect `condor_remote_cluster -t`. Is there any way to specify the port for `condor_submit`? That would be helpful.
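
For reference, a minimal sketch of the kind of ~/.ssh/config entry described above. The host address and port are the ones that appear in the commands and logs in this thread; the exact entry used was not posted, so the layout here is an assumption. `Host`, `Port`, and `User` are standard OpenSSH client options, and they only take effect for programs that run the OpenSSH client as the same user and read this file.

```text
# ~/.ssh/config (assumed layout, not the actual file from this thread)
# The Host pattern must match the name given on the ssh command line.
Host 172.18.34.19
    Port 10022
    User cse12232396
```

With an entry like this, a plain `ssh 172.18.34.19` connects on port 10022 without a `-p` option.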

Yifei Li



 
------------------ Original ------------------
Date:  Tue, Aug 22, 2023 09:39 PM
To:  "htcondor-users"<htcondor-users@xxxxxxxxxxx>;
Subject:  Re: [HTCondor-users]Remote cluster test failed when using condor_remote_cluster command
 
Thanks for your reply! However, this command does not work. I am reading the source code of condor_remote_cluster. If you have any other suggestions for how to specify the port, please let me know.

(base) liyifei@ubuntu:~$ condor_remote_cluster -a cse12232396@xxxxxxxxxxxx:10022 slurm

Enter the password to copy the ssh keys to cse12232396@xxxxxxxxxxxx:10022:

ssh: Could not resolve hostname 172.18.34.19:10022: Name or service not known
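
The error above is consistent with the ssh client treating everything after the `@` as the hostname, port included. Below is a small sketch of splitting such a target so that the port can be passed with ssh's `-p` option instead. `split_target` is a hypothetical helper for illustration only; it is not part of HTCondor and does not describe how condor_remote_cluster parses its argument. It does not handle IPv6 addresses.

```shell
#!/bin/sh
# Hypothetical helper (not part of HTCondor): split "user@host:port"
# into its parts and print them as "user host port".
split_target() {
    target=$1
    user=${target%%@*}        # text before the first '@'
    hostport=${target#*@}     # text after the first '@'
    host=${hostport%%:*}      # text before the first ':'
    case $hostport in
        *:*) port=${hostport##*:} ;;  # explicit port after the last ':'
        *)   port=22 ;;               # no port given: default ssh port
    esac
    printf '%s %s %s\n' "$user" "$host" "$port"
}

# Example with the target from the command above.
split_target 'cse12232396@172.18.34.19:10022'

# The parts could then be handed to ssh separately, for example:
#   ssh -p "$port" "$user@$host"
```

Passing the port with `-p` keeps the hostname argument free of the `:port` suffix that ssh could not resolve.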


Yifei Li
 
 
------------------ Original ------------------
Date:  Tue, Aug 22, 2023 09:22 PM
To:  "htcondor-users"<htcondor-users@xxxxxxxxxxx>;
Cc:  "Jaime Frey"<jfrey@xxxxxxxxxxx>;
Subject:  Re: [HTCondor-users] Remote cluster test failed when using condor_remote_cluster command
 
I don't know why the alternate port number in ~/.ssh/config would work with --add but not --test. You can include the alternate port number in the hostname, like so:

condor_remote_cluster -a cse12232396@xxxxxxxxxxxx:12345 slurm
condor_remote_cluster -a cse12232396@xxxxxxxxxxxx:12345

grid_resource = batch slurm cse12232396@xxxxxxxxxxxx:12345

 - Jaime

> On Aug 22, 2023, at 6:50 AM, Yifei Li <12232396@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Thank you so much!!
> I have installed the remote cluster successfully. However, our remote cluster's ssh port is not 22. How can I set the ssh port for the remote cluster? I have added the remote ssh info to ~/.ssh/config, which means I can use the ssh command without specifying the port. However, it does not work when using condor_remote_cluster -t (it works for condor_remote_cluster --add). Here is the log showing the failed test.
>
> (base) liyifei@ubuntu:~$ condor_remote_cluster -t cse12232396@xxxxxxxxxxxx
> Testing ssh to cse12232396@xxxxxxxxxxxxxxxxxxxxx!
> Testing remote submission...Passed!
> Submission and log files for this job are in /home/liyifei/bosco-test/boscotest.aYtWz
> Waiting for jobmanager to accept job...Passed
> Checking for submission to remote slurm cluster (could take ~30 seconds)...Failed
> Showing last 5 lines of logs:
> 08/22/23 19:39:27 [1938276] Error starting 172.18.34.19 GAHP: Agent pid 1938292\nssh: connect to host 172.18.34.19 port 22: Connection refused\nAgent pid 1938292 killed\n
> 08/22/23 19:39:27 [1938276] resource cse12232396@xxxxxxxxxxxx is now down
> 08/22/23 19:39:27 [1938276] (6.0) doEvaluateState called: gmState GM_INIT, remoteState 0
> 08/22/23 19:39:27 [1938276] Gahp Server (pid=1938286) exited with status 255 unexpectedly
> 08/22/23 19:39:31 [1938276] (6.0) doEvaluateState called: gmState GM_CLEAR_REQUEST, remoteState 0
>
> Yifei Li
>
>
>
>
>    ------------------ Original ------------------
> From:  "Tim Theisen"<tim@xxxxxxxxxxx>;
> Date:  Tue, Aug 22, 2023 07:26 PM
> To:  "htcondor-users"<htcondor-users@xxxxxxxxxxx>; "Yifei Li"<12232396@xxxxxxxxxxxxxxxxxxx>;
> Subject:  Re: [HTCondor-users] Remote cluster test failed when using condor_remote_cluster command
>
> When I checked this morning, the file server is back online.
> ...Tim
> On 8/21/23 22:19, Tim Theisen via HTCondor-users wrote:
>> I have confirmed that the file server is currently not available. I will report back when it is operational.
>> ...Tim
>> On 8/21/23 20:45, Yifei Li wrote:
>>> Thanks for your reply!
>>> I am trying to use condor_remote_cluster under a regular account, but there seems to be a network error while downloading the installation file. Is the file server shut down? I downloaded it successfully several days ago. Could you check it for me? Thank you!
>>>
>>> ***Log***
>>> liyifei@ubuntu:~$ condor_remote_cluster --add cse12232396@1**** slurm
>>> Enter the password to copy the ssh keys to cse12232396@xxxxxxxxxxxx:
>>> Downloading release build for cse12232396@****..............................................................................................................................curl: (28) Failed to connect to research.cs.wisc.edu port 443: Connection timed out
>>> Failure
>>> Failed to download release build.
>>> Unable to download and prepare files for remote installation.
>>> Download URL: https://research.cs.wisc.edu/htcondor/tarball/10.x/10.7.0/release/condor-10.7.0-x86_64_AlmaLinux8-stripped.tar.gz
>>> Aborting installation to cse12232396@***.
>>>
>>> Yifei Li
>>>
>>>
>>>
>>>
>>>       ------------------ Original ------------------
>>> From:  "Jaime Frey via HTCondor-users"<htcondor-users@xxxxxxxxxxx>;
>>> Date:  Tue, Aug 22, 2023 05:24 AM
>>> To:  "htcondor-users"<htcondor-users@xxxxxxxxxxx>;
>>> Cc:  "Jaime Frey"<jfrey@xxxxxxxxxxx>;
>>> Subject:  Re: [HTCondor-users] Remote cluster test failed when using condor_remote_cluster command
>>> The condor_remote_cluster command has to be run under the regular user account under which you will be submitting your workflow jobs. You don't run it as the root user.
>>>
>>> You can use condor_remote_cluster to access two different clusters simultaneously for your workflows. One thing to keep in mind is that each submit file must name the cluster that that job should be run on, like so:
>>>
>>> grid_resource = batch slurm cluster1.foo.edu
>>>
>>> If you're using DAGMan, you can use the VARS command to set the cluster to use for a whole set of nodes in the DAG.
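
A minimal sketch of the VARS approach described above, assuming a two-node DAG. The file names (workflow.dag, job.sub), the node names A and B, the macro names batch_type and target_host, and the host cluster2.foo.edu are all hypothetical; only cluster1.foo.edu comes from the example above.

```text
# --- workflow.dag (hypothetical) ---
JOB A job.sub
JOB B job.sub
# Each node sets the macros that the shared submit file expands.
VARS A batch_type="slurm" target_host="cluster1.foo.edu"
VARS B batch_type="lsf"   target_host="cluster2.foo.edu"

# --- job.sub (hypothetical fragment) ---
universe      = grid
grid_resource = batch $(batch_type) $(target_host)
# ... remaining submit commands (executable, output, queue) omitted
```

The macro is deliberately not named `cluster`, since `$(Cluster)` is already a predefined submit macro holding the job's cluster ID.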
>>>
>>>  - Jaime
>>>
>>>> On Aug 19, 2023, at 2:22 AM, Yifei Li <12232396@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> Dear HTCondor development Team,
>>>>     I can access two campus clusters, one LSF-based and the other Slurm-based. Since I am not an administrator of these clusters and I still want to use them to execute one workflow simultaneously, I think I can use condor_remote_cluster to achieve my goal. First question: can I use the two clusters through HTCondor to execute a workflow simultaneously?
>>>>     So far, I have made some effort toward this goal. I installed HTCondor (MiniCondor) on my PC workstation, which is on the same local area network as the campus clusters. I used the condor_remote_cluster command to add the LSF cluster and the Slurm cluster. Both were added successfully and appear in the remote cluster list. However, when I test with the "condor_remote_cluster -t" command, the task can't be dispatched to the remote cluster, and an idle job remains in condor_q.
>>>> Could you provide some suggestions to help me set up my environment? Is it possible for me to achieve my goals without root access to the clusters? Looking forward to your reply.
>>>>
>>>> ****Log from my PC workstation****
>>>> root@ubuntu:~/bosco-test/boscotest.p3SGb# condor_remote_cluster -t cse-liyf@xxxxxxxxxxxx
>>>> Testing ssh to cse-liyf@xxxxxxxxxxxxxxxxxxxxx!
>>>> Testing remote submission...Passed!
>>>> Submission and log files for this job are in /root/bosco-test/boscotest.2DBlK
>>>> Waiting for jobmanager to accept job...Passed
>>>> Checking for submission to remote lsf cluster (could take ~30 seconds)...grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
>>>> grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
>>>> grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
>>>> grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
>>>> grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
>>>> Then failed.
>>>>
>>>>
>>>> Yifei Li
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>   _______________________________________________
>>>> HTCondor-users mailing list
>>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>>> subject: Unsubscribe
>>>> You can also unsubscribe by visiting
>>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>>
>>>> The archives can be found at:
>>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>> --
>> Tim Theisen (he, him, his)
>> Release Manager
>> HTCondor & Open Science Grid
>> Center for High Throughput Computing
>> Department of Computer Sciences
>> University of Wisconsin - Madison
>> 4261 Computer Sciences and Statistics
>> 1210 W Dayton St
>> Madison, WI 53706-1685
>> +1 608 265 5736


