
Re: [HTCondor-users] Remote cluster test failed when using condor_remote_cluster command



As of this morning, the file server is back online.

...Tim

On 8/21/23 22:19, Tim Theisen via HTCondor-users wrote:

I have confirmed that the file server is currently not available. I will report back when it is operational.

...Tim

On 8/21/23 20:45, Yifei Li wrote:
Thanks for your reply!
I am trying to use condor_remote_cluster under a regular account, but it seems there is a network error while downloading the installation file. Is the file server shut down? I downloaded the file successfully several days ago. Could you check it for me? Thank you!

***Log***

liyifei@ubuntu:~$ condor_remote_cluster --add cse12232396@1**** slurm

Enter the password to copy the ssh keys to cse12232396@xxxxxxxxxxxx:

Downloading release build for cse12232396@****..............................................................................................................................curl: (28) Failed to connect to research.cs.wisc.edu port 443: Connection timed out

Failure

Failed to download release build.

Unable to download and prepare files for remote installation.

Download URL: https://research.cs.wisc.edu/htcondor/tarball/10.x/10.7.0/release/condor-10.7.0-x86_64_AlmaLinux8-stripped.tar.gz

Aborting installation to cse12232396@***.
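
For reference, the same URL can be probed directly with curl (assuming curl is installed) to separate a server outage from a local network problem:

curl -sI --connect-timeout 10 https://research.cs.wisc.edu/htcondor/tarball/10.x/10.7.0/release/condor-10.7.0-x86_64_AlmaLinux8-stripped.tar.gz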


Yifei Li

------------------ Original ------------------
Date:  Tue, Aug 22, 2023 05:24 AM
To:  "htcondor-users"<htcondor-users@xxxxxxxxxxx>;
Cc:  "Jaime Frey"<jfrey@xxxxxxxxxxx>;
Subject:  Re: [HTCondor-users] Remote cluster test failed when using condor_remote_cluster command

The condor_remote_cluster command has to be run under the regular user account under which you will be submitting your workflow jobs. You don't run it as the root user.

You can use condor_remote_cluster to access two different clusters simultaneously for your workflows. One thing to keep in mind is that each submit file must name the cluster that the job should run on, like so:

grid_resource = batch slurm cluster1.foo.edu
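
For example, a minimal grid-universe submit file along these lines might look like the following; the hostname, script, and file names here are placeholders:

universe      = grid
grid_resource = batch slurm cluster1.foo.edu
executable    = my_job.sh
output        = my_job.out
error         = my_job.err
log           = my_job.log
queue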

If you're using DAGMan, you can use the VARS command to set the cluster to use for a whole set of nodes in the DAG.
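
As a sketch, suppose each node's submit file says grid_resource = $(grid_site); then a DAG like this (node names, submit file names, and hostnames are hypothetical) routes each node to the right cluster:

JOB A step_a.sub
JOB B step_b.sub
VARS A grid_site="batch lsf lsf.campus.edu"
VARS B grid_site="batch slurm slurm.campus.edu"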

 - Jaime

On Aug 19, 2023, at 2:22 AM, Yifei Li <12232396@xxxxxxxxxxxxxxxxxxx> wrote:

Dear HTCondor Development Team,
    I have access to two campus clusters: one is LSF-based, the other is Slurm-based. Since I am not an administrator of these clusters but still want to use both of them to execute one workflow, I think condor_remote_cluster can achieve my goal. First question: can I use HTCondor to run a single workflow across the two clusters simultaneously?
    So far, I have made some progress. I installed HTCondor (MiniCondor) on my PC workstation, which is on the same local area network as the campus clusters, and used the condor_remote_cluster command to add the LSF cluster and the Slurm cluster. Both were added successfully and appear in the remote cluster list. However, when I test with the "condor_remote_cluster -t" command, the job is never dispatched to the remote cluster; it just sits idle in condor_q.
Could you provide some suggestions to help me set up my environment? Is it possible to achieve my goals without root access to the clusters? For reference, a rough sketch of the commands I ran is below, followed by the test log. Looking forward to your reply.
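
Roughly, the setup and test commands looked like this (usernames and hostnames are placeholders for the real login nodes):

condor_remote_cluster --add myuser@lsf.campus.edu lsf
condor_remote_cluster --add myuser@slurm.campus.edu slurm
condor_remote_cluster --list
condor_remote_cluster --test myuser@slurm.campus.edu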

****Log from my PC workstation****
Testing remote submission...Passed!
Submission and log files for this job are in /root/bosco-test/boscotest.2DBlK
Waiting for jobmanager to accept job...Passed
Checking for submission to remote lsf cluster (could take ~30 seconds)...grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
Then failed.


Yifei Li

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
-- 
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736