Dear Mark,

Thanks for your answer. I am sorry for such a basic question, but I don't understand exactly what I have to do.

As you suggested, I added the 5 lines (VM1_USERS ...) at the very end of condor_config.local on machine mpi0, which is my dedicated scheduler and also the machine on which the jobs are executed. I then restarted the machine and submitted the same job again. It still does not work, and in addition I received a new error file named sshd.out, containing:

Disabling protocol version 1. Could not load host key
Server listening on :: port 4445.
Bind to port 4445 on 0.0.0.0 failed: Address already in use.

I have attached all the related files. I would appreciate your help in fixing this problem.

Regards,
Arash

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Mark Calleja
Sent: Wednesday, February 06, 2008 11:45 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] mpich2 error " '.../condor_exec.exe' witharguments hellow.exe: No such file or directory"

For the record, we use dedicated users for each vm (pre 6.9)/slot (post 6.9), so for a four-core machine with the default setting of four slots we'd have the following in that execute machine's condor_config.local (using 6.8 notation):

VM1_USER = condor_user1
VM2_USER = condor_user2
VM3_USER = condor_user3
VM4_USER = condor_user4
EXECUTE_LOGIN_IS_DEDICATED = TRUE

Each account has a home directory, like an ordinary user account.

Hope this helps,
Mark

--
Cambridge eScience Centre, University of Cambridge
Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA
Tel. (+44/0) 1223 765317, Fax (+44/0) 1223 765900
http://www.escience.cam.ac.uk/~mcal00

Ben Burnett wrote:
> Hi Arash:
>
> It may be that you are getting an error when the script tries to create the
> loclocloc file in the current user's home directory. If the job is run as
> nobody, then there is no home directory (or, alternatively, the job may not
> have access to it).
> As for the "bad number" error, it seems that the script is comparing a
> string "hellow.exe" to 0 using an arithmetic comparison, which is invalid.
>
> -B
>
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx]
> On Behalf Of arash
> Sent: Tuesday, February 05, 2008 9:48 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] mpich2 error " '.../condor_exec.exe' witharguments
> hellow.exe: No such file or directory"
>
>
> Dear All,
> I am sorry that I forgot to attach the related files.
> Here are all of the files.
>
> Best wishes,
> Arash
> -----Original Message-----
> From: arash [mailto:anoorghorbani@xxxxxxxxx]
> Sent: Tuesday, February 05, 2008 7:01 PM
> To: 'Condor-Users Mail List'
> Subject: RE: [Condor-users] mpich2 error " '.../condor_exec.exe'
> witharguments hellow.exe: No such file or directory"
>
> Thanks for your consideration,
>
> I added this line but I get the same result.
> Moreover, I had another error in my configuration: I was calling condor start
> twice in my Linux startup. After fixing that, the job seems to run, but I
> have no output, and I still receive very similar error files.
>
> Again, I have attached all of the related files.
>
> I think there is an error in Mark Calleja's mp2script, or I am using this file
> wrongly.
> In particular, at the end of my error files you can see:
>
> ___________________________________________________
>
> + hostname=mpi0
> + pwd
> + currentDir=/home/condor/execute/dir_6717
> + whoami
> + user=condor
> + echo hellow.exe mpi0 4446 condor /home/condor/execute/dir_6717
> + /usr/local/condor/libexec/condor_chirp put -mode cwa - /home/condor/spool/cluster41.proc0.subproc0/contact
> + [ 0 -ne 0 ]
> + [ hellow.exe -eq 0 ]
> [: 1: hellow.exe: bad number
> + EXECUTABLE=hellow.exe
> + shift
> + chmod +x hellow.exe
> + MPDIR=/usr/local/mpich2
> + PATH=/usr/local/mpich2/bin:.:/usr/local/condor/bin:/sbin:/bin:/usr/sbin:/usr/bin
> + export PATH
> + export SCRATCH_LOC=loclocloc
> /home/condor/execute/dir_6717/condor_exec.exe: 39: cannot create ~/loclocloc: Directory nonexistent
> + echo /home/condor/execute/dir_6717
> + trap finalize TERM
> + [ hellow.exe -ne 0 ]
> [: 1: hellow.exe: bad number
> + [ hellow.exe -eq 0 ]
> [: 1: hellow.exe: bad number
> + exit 0
>
> ___________________________________________________
>
>
> I don't know what loclocloc is, and I am also confused about the meaning of
>
> [: 1: hellow.exe: bad number
>
> Again, thanks for your consideration.
>
> Regards,
> Arash
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
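
A note on the "bad number" lines in the trace above: a POSIX `[` test with `-eq` or `-ne` expects integers on both sides, so `[ hellow.exe -eq 0 ]` fails as Ben describes. What mp2script intends at that point is not shown in this thread, so the following is only a minimal sketch of one way to guard such a test. The names `is_integer` and `classify_arg` are hypothetical helpers, not part of mp2script or Condor.

```shell
#!/bin/sh
# Hypothetical helper (not from mp2script): succeeds only when $1 consists
# entirely of decimal digits, so it is safe to pass to -eq / -ne afterwards.
is_integer() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *) return 0 ;;
    esac
}

# Hypothetical helper: classify an argument without ever handing a
# non-number to an arithmetic test.
classify_arg() {
    if is_integer "$1"; then
        if [ "$1" -eq 0 ]; then
            echo "zero"
        else
            echo "number"
        fi
    else
        echo "name"
    fi
}

classify_arg hellow.exe   # prints "name"
classify_arg 0            # prints "zero"
```

If the script only needs to know whether an argument equals a particular string, the string operators `=` and `!=` avoid the arithmetic test altogether.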
Attachment:
mpi_16.rar
Description: Binary data
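
A note on the `~/loclocloc` message in the trace: `loclocloc` is simply the value the script assigns to `SCRATCH_LOC`, and the failure is in creating a file of that name under the home directory. The literal `~` in the error may indicate that `HOME` was not set for the job, though nothing in the thread confirms this. Below is a minimal sketch, assuming that falling back to the job's working directory is acceptable. `scratch_dir` is a hypothetical helper, not part of mp2script.

```shell
#!/bin/sh
# Hypothetical helper (not from mp2script): print a directory that can hold
# a small scratch file. Prefer $HOME when it is set, exists and is writable;
# otherwise fall back to the directory given as $1 (default: current dir).
scratch_dir() {
    fallback=${1:-.}
    if [ -n "${HOME:-}" ] && [ -d "$HOME" ] && [ -w "$HOME" ]; then
        printf '%s\n' "$HOME"
    else
        printf '%s\n' "$fallback"
    fi
}

# Usage sketch: work out where a file named like SCRATCH_LOC would go,
# without writing anything.
SCRATCH_LOC=loclocloc
echo "scratch file would be: $(scratch_dir "$(pwd)")/$SCRATCH_LOC"
```

With dedicated per-slot accounts that each have a home directory, as Mark describes, the first branch would normally be taken.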