Re: [Condor-users] mpich2 error " '.../condor_exec.exe' witharguments hellow.exe: No such file or directory"



Dear Mark,

Thanks for your answer.

I am sorry for such a basic question, but I don't understand exactly
what I have to do.

As you suggested, I added the 5 lines (VM1_USERS ... ) at the very end of
condor_config.local on machine mpi0, which is my dedicated scheduler and
also the machine on which the jobs are executed.

I then restarted the machine and submitted the same job again.
It still did not succeed, and in addition I received a new error file
named sshd.out, which contained:

Disabling protocol version 1. Could not load host key
Server listening on :: port 4445.
Bind to port 4445 on 0.0.0.0 failed: Address already in use.
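
To make the port conflict in sshd.out easier to spot across several job
runs, the failing port can be pulled out of the log. This is only a sketch:
the function name ports_in_use is made up for illustration, and the pattern
assumes the "Bind to port ... failed: Address already in use" wording shown
above. The same message can also appear when sshd has just bound the port on
the IPv6 wildcard address (the "Server listening on ::" line), so it is not
necessarily proof that another process holds the port.

```shell
# ports_in_use (hypothetical helper): print every port for which sshd
# reported "Address already in use". Reads the named files, or stdin
# when no file is given.
ports_in_use() {
    sed -n 's/^Bind to port \([0-9][0-9]*\) on .* failed: Address already in use\.*$/\1/p' "$@"
}
```

It could be run as "ports_in_use sshd.out" in the job's output directory.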

I have attached all the related files.
I would appreciate your help in fixing this problem.

Regards,
Arash

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Mark Calleja
Sent: Wednesday, February 06, 2008 11:45 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] mpich2 error " '.../condor_exec.exe'
witharguments hellow.exe: No such file or directory"

For the record, we use a dedicated user for each VM (pre-6.9) / slot
(post-6.9), so for a four-core machine with the default setting of four
slots we'd have the following in that execute machine's
condor_config.local (using 6.8 notation):

VM1_USER                   = condor_user1
VM2_USER                   = condor_user2
VM3_USER                   = condor_user3
VM4_USER                   = condor_user4
EXECUTE_LOGIN_IS_DEDICATED = TRUE

Each account has a home directory, like an ordinary user account.
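
For machines with a different number of slots, the lines above can be
generated rather than typed by hand. The sketch below is illustrative only:
slot_user_config is a made-up helper, the condor_userN account names simply
follow the example above, and the optional SLOT prefix for post-6.9 releases
is an assumption that should be checked against the manual for the installed
version. The accounts themselves still have to be created separately.

```shell
# slot_user_config (hypothetical helper): print one <PREFIX><N>_USER line
# per slot, followed by the EXECUTE_LOGIN_IS_DEDICATED setting.
#   $1 = number of slots
#   $2 = prefix, defaults to VM (6.8 notation); SLOT is assumed for 6.9+
slot_user_config() {
    slots=$1
    prefix=${2:-VM}
    i=1
    while [ "$i" -le "$slots" ]; do
        printf '%s%d_USER = condor_user%d\n' "$prefix" "$i" "$i"
        i=$((i + 1))
    done
    printf 'EXECUTE_LOGIN_IS_DEDICATED = TRUE\n'
}
```

For example, "slot_user_config 4" prints the four VMn_USER lines and the
final setting, which can then be appended to condor_config.local.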

Hope this helps,
Mark

-- 
Cambridge eScience Centre, University of Cambridge
Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA
Tel. (+44/0) 1223 765317, Fax  (+44/0) 1223 765900
http://www.escience.cam.ac.uk/~mcal00

Ben Burnett wrote:
> Hi Arash:
>
> It may be that you are getting an error when the script tries to create the
> loclocloc file in the current user's home directory.  If the job is run as
> nobody, then there is no home directory (or, alternatively, the job may not
> have access to it).  As for the "bad number" error, it seems that the script
> is comparing the string "hellow.exe" to 0 using an arithmetic comparison,
> which is invalid.
>
> -B
>
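
The "bad number" point above can be illustrated with a small sketch. In a
POSIX shell, [ X -eq 0 ] requires X to be an integer; with X=hellow.exe the
test itself fails with an error instead of answering the question. One way
to guard such a comparison is to check that the value is numeric first. The
function names here are invented for the example and are not taken from
mp2script.

```shell
# is_integer: succeed only if $1 is a non-empty string of decimal digits.
is_integer() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# classify_arg (hypothetical): only do the arithmetic comparison when it
# is safe, so a file name such as hellow.exe never reaches -eq.
classify_arg() {
    if is_integer "$1"; then
        if [ "$1" -eq 0 ]; then
            echo zero
        else
            echo nonzero
        fi
    else
        echo not-a-number
    fi
}
```

With this guard, "classify_arg hellow.exe" reports that the value is not a
number rather than triggering the shell's "bad number" error.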
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx]
> On Behalf Of arash
> Sent: Tuesday, February 05, 2008 9:48 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] mpich2 error " '.../condor_exec.exe'
> witharguments hellow.exe: No such file or directory"
>
>
> Dear All,
> I am sorry that I forgot to attach the related files.
> Here are all of the files.
>
> Best wishes,
> Arash
> -----Original Message-----
> From: arash [mailto:anoorghorbani@xxxxxxxxx]
> Sent: Tuesday, February 05, 2008 7:01 PM
> To: 'Condor-Users Mail List'
> Subject: RE: [Condor-users] mpich2 error " '.../condor_exec.exe'
> witharguments hellow.exe: No such file or directory"
>
> Thanks for your consideration.
>
> I added this line but I get the same result.
> I also had another error in my configuration: Condor was being started
> twice in my Linux startup. After fixing that, it seems that the job runs,
> but I get no output, and I still receive very similar error files.
>
> Again, I have attached all of the related files.
>
> I think there is an error in Mark Calleja's mp2script, or I am using this
> file incorrectly.
> In particular, at the end of my error files you can see:
>
> ___________________________________________________
>
> + hostname=mpi0
> + pwd
> + currentDir=/home/condor/execute/dir_6717
> + whoami
> + user=condor
> + echo hellow.exe mpi0 4446 condor /home/condor/execute/dir_6717 
> + /usr/local/condor/libexec/condor_chirp put -mode cwa - /home/condor/spool/cluster41.proc0.subproc0/contact
> + [ 0 -ne 0 ]
> + [ hellow.exe -eq 0 ]
> [: 1: hellow.exe: bad number
> + EXECUTABLE=hellow.exe
> + shift
> + chmod +x hellow.exe
> + MPDIR=/usr/local/mpich2
> + PATH=/usr/local/mpich2/bin:.:/usr/local/condor/bin:/sbin:/bin:/usr/sbin:/usr/bin
> + export PATH
> + export SCRATCH_LOC=loclocloc
> /home/condor/execute/dir_6717/condor_exec.exe: 39: cannot create ~/loclocloc: Directory nonexistent
> + echo /home/condor/execute/dir_6717
> + trap finalize TERM
> + [ hellow.exe -ne 0 ]
> [: 1: hellow.exe: bad number
> + [ hellow.exe -eq 0 ]
> [: 1: hellow.exe: bad number
> + exit 0
>
> ___________________________________________________
>
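
The "cannot create ~/loclocloc" line in the trace above fits the case where
the job's user has no usable home directory; the literal ~ in the message
may indicate that HOME was unset or empty when the script ran. A defensive
way to choose where such a file goes is sketched below. The function name
pick_scratch_file is hypothetical and this is not what mp2script does; it
only shows a fallback to the job's working directory.

```shell
# pick_scratch_file (hypothetical helper): print a path for the file
# named $1, using $HOME when it is a writable directory and the current
# working directory otherwise.
pick_scratch_file() {
    name=$1
    if [ -n "${HOME:-}" ] && [ -d "$HOME" ] && [ -w "$HOME" ]; then
        printf '%s/%s\n' "$HOME" "$name"
    else
        printf '%s/%s\n' "$(pwd)" "$name"
    fi
}
```

A script could then write with something like
pwd > "$(pick_scratch_file loclocloc)" instead of relying on ~ expanding.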
>
> I don't know what loclocloc is, and I am also confused about the meaning of
>
> [: 1: hellow.exe: bad number
>
> Again, thanks for your consideration.
>
> Regards,
> Arash
>
>    
>
>
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at: 
> https://lists.cs.wisc.edu/archive/condor-users/
>   

Attachment: mpi_16.rar
Description: Binary data