Re: [Condor-users] mpich2 error " '.../condor_exec.exe' witharguments hellow.exe: No such file or directory"



Dear Mark,

I tried something similar, as follows, but the MPI job still does not run.

I added the lines you suggested to the condor_config file instead of
condor_config.local. In addition, I created 4 accounts named condor_user1,
..., condor_user4, then restarted and submitted a similar job.

Another strange thing also happened. I log in with the condor account and
submit the job, but when the job exits, my account is logged off, and I
have to log in again to see the result!

Regards,
Arash

-----Original Message-----
From: arash [mailto:anoorghorbani@xxxxxxxxx] 
Sent: Thursday, February 07, 2008 11:34 AM
To: 'Condor-Users Mail List'
Subject: RE: [Condor-users] mpich2 error " '.../condor_exec.exe'
witharguments hellow.exe: No such file or directory"

Dear Mark,

Thanks for your answer,

I am sorry for such a basic question, but I don't understand exactly
what I have to do.

As you said, I added the 5 lines (VM1_USERS ... ) at the very end of
condor_config.local on machine mpi0, which is my dedicated scheduler and
is also where the jobs are executed.
Then I restarted the machine and submitted the same job again.
But I had no success, and in addition I received a new error file named
sshd.out, containing:

Disabling protocol version 1. Could not load host key
Server listening on :: port 4445.
Bind to port 4445 on 0.0.0.0 failed: Address already in use.

I have attached all of the related files.
I would appreciate it if you could help me fix this problem.

Regards,
Arash

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Mark Calleja
Sent: Wednesday, February 06, 2008 11:45 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] mpich2 error " '.../condor_exec.exe'
witharguments hellow.exe: No such file or directory"

For the record, we use dedicated users for each vm(pre 6.9)/slot(post 6.9),
so for a four core machine with the default setting of four slots we'd have
the following in that execute machine's condor_config.local (using 6.8
notation):

VM1_USER                   = condor_user1
VM2_USER                   = condor_user2
VM3_USER                   = condor_user3
VM4_USER                   = condor_user4
EXECUTE_LOGIN_IS_DEDICATED = TRUE

Each account has a home directory, like an ordinary user account.
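
The home directory matters because the job script writes a file under ~ (the
loclocloc file in the trace quoted further down). As a minimal sketch, assuming
a POSIX shell, one could check that a home directory is usable before relying
on it. The names home_is_usable and pick_scratch_parent, and the fallback to
the current directory, are illustrative assumptions and are not part of Condor
or of the mp2script.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given path is a non-empty,
# existing, writable directory.
home_is_usable() {
    [ -n "$1" ] && [ -d "$1" ] && [ -w "$1" ]
}

# Choose where a scratch-location file could be written: the given home
# directory when it is usable, otherwise the current working directory
# (for a Condor job, that would be the job's execute directory).
pick_scratch_parent() {
    if home_is_usable "$1"; then
        printf '%s\n' "$1"
    else
        pwd
    fi
}

# Print the directory that would be used for the account running this script.
pick_scratch_parent "${HOME:-}"
```

Running this as each of the dedicated accounts should print that account's
home directory. If it prints the current directory instead, the account has no
usable home directory.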

Hope this helps,
Mark

--
Cambridge eScience Centre, University of Cambridge Centre for Mathematical
Sciences, Wilberforce Road, Cambridge CB3 0WA Tel. (+44/0) 1223 765317, Fax
(+44/0) 1223 765900 http://www.escience.cam.ac.uk/~mcal00

Ben Burnett wrote:
> Hi Arash:
>
> It may be that you are getting an error when the script tries to
> create the loclocloc file in the current user's home directory.  If
> the job is run as nobody, then there is no home directory (or,
> alternatively, the job may not have access to it).  As for the "bad
> number" error, it seems that the script is comparing the string
> "hellow.exe" to 0 using an arithmetic comparison, which is invalid.
>
> -B
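
The "bad number" message can be reproduced outside Condor, because the -eq and
-ne operators of [ accept only integers. The following is a minimal sketch,
assuming a POSIX shell such as dash. The helpers is_integer and is_zero are
hypothetical names introduced here for illustration and are not taken from the
mp2script.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument consists solely of digits.
is_integer() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Compare a value with 0 numerically only when it really is a number, so a
# program name such as hellow.exe is never handed to the -eq operator.
is_zero() {
    is_integer "$1" && [ "$1" -eq 0 ]
}

for value in 0 4 hellow.exe; do
    if is_zero "$value"; then
        echo "$value: zero"
    else
        echo "$value: not zero"
    fi
done
```

With such a guard the non-numeric value is reported as "not zero" and the shell
does not complain, whereas a bare [ hellow.exe -eq 0 ] produces the error seen
in the trace.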
>
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx]
> On Behalf Of arash
> Sent: Tuesday, February 05, 2008 9:48 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] mpich2 error " '.../condor_exec.exe' 
> witharguments
> hellow.exe: No such file or directory"
>
>
> Dear All,
> I am sorry that I forgot to attach the related files.
> All of the files are attached here.
>
> Best wishes,
> Arash
> -----Original Message-----
> From: arash [mailto:anoorghorbani@xxxxxxxxx]
> Sent: Tuesday, February 05, 2008 7:01 PM
> To: 'Condor-Users Mail List'
> Subject: RE: [Condor-users] mpich2 error " '.../condor_exec.exe'
> witharguments hellow.exe: No such file or directory"
>
> Thanks for your consideration,
>
> I added this line but I get the same result.
> I also found another error in my configuration: Condor was being
> started twice in my Linux startup. After fixing that, the job seems to
> run, but I get no output, and I still receive very similar error files.
>
> Again, I have attached all of the related files.
>
> I think there is an error in Mark Calleja's mp2script, or I am using
> this file incorrectly.
> In particular, at the end of my error files you can see:
>
> ___________________________________________________
>
> + hostname=mpi0
> + pwd
> + currentDir=/home/condor/execute/dir_6717
> + whoami
> + user=condor
> + echo hellow.exe mpi0 4446 condor /home/condor/execute/dir_6717 
> + /usr/local/condor/libexec/condor_chirp put -mode cwa - /home/condor/spool/cluster41.proc0.subproc0/contact
> + [ 0 -ne 0 ]
> + [ hellow.exe -eq 0 ]
> [: 1: hellow.exe: bad number
> + EXECUTABLE=hellow.exe
> + shift
> + chmod +x hellow.exe
> + MPDIR=/usr/local/mpich2
> + PATH=/usr/local/mpich2/bin:.:/usr/local/condor/bin:/sbin:/bin:/usr/sbin:/usr/bin
> + export PATH
> + export SCRATCH_LOC=loclocloc
> /home/condor/execute/dir_6717/condor_exec.exe: 39: cannot create ~/loclocloc: Directory nonexistent
> + echo /home/condor/execute/dir_6717
> + trap finalize TERM
> + [ hellow.exe -ne 0 ]
> [: 1: hellow.exe: bad number
> + [ hellow.exe -eq 0 ]
> [: 1: hellow.exe: bad number
> + exit 0
>
> ___________________________________________________
>
>
> I don't know what loclocloc is, and I am also confused about the
> meaning of
>
>
> [: 1: hellow.exe: bad number
>
> Again, thanks for your consideration,
>
> Regards,
> Arash
>
>    
>
>
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx 
> with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at: 
> https://lists.cs.wisc.edu/archive/condor-users/
>   