
Re: [HTCondor-users] Remote cluster test failed when using condor_remote_cluster command



Hi Yifei Li,

can you actually reach the slurm node from your submit machine?

AFAIS the IP is within the private network 172.16.0.0/12 block. Might be that you are trying to reach the SLURM node on its private interface from outside?
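
A quick way to double-check the address block is Python's standard ipaddress module. This is only a sketch: the address 172.18.34.19 is taken from the log quoted below, and in_block is a hypothetical helper name used for illustration.

```python
import ipaddress


def in_block(address: str, block: str) -> bool:
    """Return True if `address` lies inside the CIDR network `block`."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(block)


if __name__ == "__main__":
    # Address from the quoted log, checked against the private
    # 172.16.0.0/12 block (172.16.0.0 through 172.31.255.255).
    print(in_block("172.18.34.19", "172.16.0.0/12"))
    # The standard library also flags private addresses directly.
    print(ipaddress.ip_address("172.18.34.19").is_private)
```

If the submit machine sits outside that private network, the address will generally not be routable from it, whatever port is used.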

Cheers,
  Thomas

On 22/08/2023 15.36, Yifei Li wrote:
Thanks for your reply! However, this command did not work for me. I am reading the source code of condor_remote_cluster. If you have any other suggestions for how to specify the port, please let me know.

(base) liyifei@ubuntu:~$ condor_remote_cluster -a cse12232396@xxxxxxxxxxxx:10022 slurm
Enter the password to copy the ssh keys to cse12232396@xxxxxxxxxxxx:10022:
ssh: Could not resolve hostname 172.18.34.19:10022: Name or service not known
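
The error above suggests that the whole string 172.18.34.19:10022 reached ssh as a hostname. As a rough illustration only, the Python sketch below separates the port from such a target and builds an ssh argument list using ssh's standard -p option. The names split_target and ssh_argv are hypothetical and are not part of condor_remote_cluster; IPv6 literals are not handled.

```python
from typing import List, Optional, Tuple


def split_target(target: str) -> Tuple[str, str, Optional[int]]:
    """Split 'user@host[:port]' into (user, host, port); port is None if absent."""
    user, at, hostport = target.partition("@")
    if not at or not user or not hostport:
        raise ValueError(f"expected user@host[:port], got {target!r}")
    host, colon, port_text = hostport.rpartition(":")
    if not colon:
        # No colon present: the whole remainder is the host name.
        return user, hostport, None
    if not host or not port_text.isdigit():
        raise ValueError(f"bad host or port in {target!r}")
    return user, host, int(port_text)


def ssh_argv(target: str) -> List[str]:
    """Build an ssh command line, passing a non-default port via -p."""
    user, host, port = split_target(target)
    argv = ["ssh"]
    if port is not None:
        argv += ["-p", str(port)]
    argv.append(f"{user}@{host}")
    return argv


if __name__ == "__main__":
    print(ssh_argv("cse12232396@172.18.34.19:10022"))
```

Running a command built this way by hand is one way to confirm that the port itself is reachable before looking further into the script.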


Yifei Li
------------------ Original ------------------
From: "Jaime Frey via HTCondor-users"<htcondor-users@xxxxxxxxxxx>;
Date: Tue, Aug 22, 2023 09:22 PM
To: "htcondor-users"<htcondor-users@xxxxxxxxxxx>;
Cc: "Jaime Frey"<jfrey@xxxxxxxxxxx>;
Subject: Re: [HTCondor-users] Remote cluster test failed when using condor_remote_cluster command
I don't know why the alternate port number in ~/.ssh/config would work with --add but not --test. You can include the alternate port number in the hostname, like so:

condor_remote_cluster -a cse12232396@xxxxxxxxxxxx:12345 slurm
condor_remote_cluster -a cse12232396@xxxxxxxxxxxx:12345

grid_resource = batch slurm cse12232396@xxxxxxxxxxxx:12345

 - Jaime

> On Aug 22, 2023, at 6:50 AM, Yifei Li <12232396@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Thank you so much!!
> I have installed the remote cluster successfully. However, our remote cluster's ssh port is not 22. How can I set the ssh port for the remote cluster? I have added the remote ssh info to ~/.ssh/config, which means I can use the ssh command without specifying the port. However, this does not work with condor_remote_cluster -t (it works with condor_remote_cluster --add). Here is the log showing the failed test.
>
> (base) liyifei@ubuntu:~$ condor_remote_cluster -t cse12232396@xxxxxxxxxxxx
> Testing ssh to cse12232396@xxxxxxxxxxxxxxxxxxxxx!
> Testing remote submission...Passed!
> Submission and log files for this job are in /home/liyifei/bosco-test/boscotest.aYtWz
> Waiting for jobmanager to accept job...Passed
> Checking for submission to remote slurm cluster (could take ~30 seconds)...Failed
> Showing last 5 lines of logs:
> 08/22/23 19:39:27 [1938276] Error starting 172.18.34.19 GAHP: Agent pid 1938292\nssh: connect to host 172.18.34.19 port 22: Connection refused\nAgent pid 1938292 killed\n
> 08/22/23 19:39:27 [1938276] resource cse12232396@xxxxxxxxxxxx is now down
> 08/22/23 19:39:27 [1938276] (6.0) doEvaluateState called: gmState GM_INIT, remoteState 0
> 08/22/23 19:39:27 [1938276] Gahp Server (pid=1938286) exited with status 255 unexpectedly
> 08/22/23 19:39:31 [1938276] (6.0) doEvaluateState called: gmState GM_CLEAR_REQUEST, remoteState 0
>
> Yifei Li
>
> ------------------ Original ------------------
> From: "Tim Theisen"<tim@xxxxxxxxxxx>;
> Date: Tue, Aug 22, 2023 07:26 PM
> To: "htcondor-users"<htcondor-users@xxxxxxxxxxx>; "Yifei Li"<12232396@xxxxxxxxxxxxxxxxxxx>;
> Subject: Re: [HTCondor-users] Remote cluster test failed when using condor_remote_cluster command
> When I checked this morning, the file server was back online.
> ...Tim
> On 8/21/23 22:19, Tim Theisen via HTCondor-users wrote:
>> I have confirmed that the file server is currently not available. I will report back when it is operational.
>> ...Tim
>> On 8/21/23 20:45, Yifei Li wrote:
>>> Thanks for your reply!
>>> I am trying to use condor_remote_cluster under a regular account, but there seems to be a network error while downloading the installation file. Is the file server shut down? I downloaded it successfully several days ago. Could you check it for me? Thank you!
>>>
>>> ***Log***
>>> liyifei@ubuntu:~$ condor_remote_cluster --add cse12232396@1**** slurm
>>> Enter the password to copy the ssh keys to cse12232396@xxxxxxxxxxxx:
>>> Downloading release build for cse12232396@****..............................................................................................................................curl: (28) Failed to connect to research.cs.wisc.edu port 443: Connection timed out
>>> Failure
>>> Failed to download release build.
>>> Unable to download and prepare files for remote installation.
>>> Download URL: https://research.cs.wisc.edu/htcondor/tarball/10.x/10.7.0/release/condor-10.7.0-x86_64_AlmaLinux8-stripped.tar.gz
>>> Aborting installation to cse12232396@***.
>>>
>>> Yifei Li
>>>
>>> ------------------ Original ------------------
>>> From: "Jaime Frey via HTCondor-users"<htcondor-users@xxxxxxxxxxx>;
>>> Date: Tue, Aug 22, 2023 05:24 AM
>>> To: "htcondor-users"<htcondor-users@xxxxxxxxxxx>;
>>> Cc: "Jaime Frey"<jfrey@xxxxxxxxxxx>;
>>> Subject: Re: [HTCondor-users] Remote cluster test failed when using condor_remote_cluster command
>>> The condor_remote_cluster command has to be run under the regular user account under which you will be submitting your workflow jobs. You don't run it as the root user.
>>>
>>> You can use condor_remote_cluster to access two different clusters simultaneously for your workflows. One thing to keep in mind is that each submit file must name the cluster that the job should be run on, like so:
>>>
>>> grid_resource = batch slurm cluster1.foo.edu
>>>
>>> If you're using DAGMan, you can use the VARS command to set the cluster to use for a whole set of nodes in the DAG.
>>>
>>>  - Jaime
 >>>
>>>> On Aug 19, 2023, at 2:22 AM, Yifei Li <12232396@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> Dear HTCondor development team,
>>>> I can access two campus clusters, one LSF-based and the other Slurm-based. Since I am not an administrator of these clusters and still want to use both to execute one workflow simultaneously, I think I can use condor_remote_cluster to achieve my goal. First question: can I use HTCondor to run a single workflow on the two clusters simultaneously?
>>>> So far, I have made some progress. I installed HTCondor (MiniCondor) on my PC workstation, which is on the same local area network as the campus clusters. I used the condor_remote_cluster command to add the LSF cluster and the Slurm cluster. Both were added successfully and are shown in the remote cluster list. However, when I run the test with the "condor_remote_cluster -t" command, the task can't be dispatched to the remote cluster, and an idle task remains in condor_q.
>>>> Could you provide some suggestions to help me set up my environment? Is it possible to achieve my goal without root access to the clusters? Looking forward to your reply.
>>>>
>>>> ****Log from my PC workstation****
>>>> root@ubuntu:~/bosco-test/boscotest.p3SGb# condor_remote_cluster -t cse-liyf@xxxxxxxxxxxx
>>>> Testing ssh to cse-liyf@xxxxxxxxxxxxxxxxxxxxx!
>>>> Testing remote submission...Passed!
>>>> Submission and log files for this job are in /root/bosco-test/boscotest.2DBlK
>>>> Waiting for jobmanager to accept job...Passed
>>>> Checking for submission to remote lsf cluster (could take ~30 seconds)...grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
>>>> grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
>>>> grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
>>>> grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
>>>> grep: /root/bosco-test/boscotest.2DBlK/logfile: No such file or directory
>>>> Then failed.
>>>>
>>>> Yifei Li
>>>>
>>>> _______________________________________________
>>>> HTCondor-users mailing list
>>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>>> subject: Unsubscribe
>>>> You can also unsubscribe by visiting
>>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>>
>>>> The archives can be found at:
>>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>> --
>> Tim Theisen (he, him, his)
>> Release Manager
>> HTCondor & Open Science Grid
>> Center for High Throughput Computing
>> Department of Computer Sciences
>> University of Wisconsin - Madison
>> 4261 Computer Sciences and Statistics
>> 1210 W Dayton St
>> Madison, WI 53706-1685
>> +1 608 265 5736

