
Re: [Condor-users] mpich2 error " '.../condor_exec.exe' with arguments hellow.exe: No such file or directory"



Dear all,
I am sorry, I forgot to attach the related files.
Here are all of the files.

Best wishes,
Arash 
-----Original Message-----
From: arash [mailto:anoorghorbani@xxxxxxxxx] 
Sent: Tuesday, February 05, 2008 7:01 PM
To: 'Condor-Users Mail List'
Subject: RE: [Condor-users] mpich2 error " '.../condor_exec.exe' with arguments hellow.exe: No such file or directory"

Thanks for your consideration, 

I added this line, but I get the same result.
I also found another error in my configuration: I was calling condor start
twice during Linux startup. After fixing that, the job seems to run,
but I get no output, and I still receive very similar error files.

Again, I have attached all of the related files.

I think there is an error in Mark Calleja's mp2script, or I am using this
file wrongly.
In particular, at the end of my error files you can see:

___________________________________________________

+ hostname=mpi0
+ pwd
+ currentDir=/home/condor/execute/dir_6717
+ whoami
+ user=condor
+ echo hellow.exe mpi0 4446 condor /home/condor/execute/dir_6717
+ /usr/local/condor/libexec/condor_chirp put -mode cwa - /home/condor/spool/cluster41.proc0.subproc0/contact
+ [ 0 -ne 0 ]
+ [ hellow.exe -eq 0 ]
[: 1: hellow.exe: bad number
+ EXECUTABLE=hellow.exe
+ shift
+ chmod +x hellow.exe
+ MPDIR=/usr/local/mpich2
+ PATH=/usr/local/mpich2/bin:.:/usr/local/condor/bin:/sbin:/bin:/usr/sbin:/usr/bin
+ export PATH
+ export SCRATCH_LOC=loclocloc
/home/condor/execute/dir_6717/condor_exec.exe: 39: cannot create ~/loclocloc: Directory nonexistent
+ echo /home/condor/execute/dir_6717
+ trap finalize TERM
+ [ hellow.exe -ne 0 ]
[: 1: hellow.exe: bad number
+ [ hellow.exe -eq 0 ]
[: 1: hellow.exe: bad number
+ exit 0

___________________________________________________
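The "cannot create ~/loclocloc: Directory nonexistent" line in the trace means the shell tried to create a file inside a directory literally named "~". Line 39 of the attached script is not shown, so this is only a sketch of one likely cause, assuming that line redirects output to a path built from "~" and SCRATCH_LOC (the trace shows SCRATCH_LOC set to "loclocloc", which looks like a placeholder value). A tilde is not expanded inside quotes, and dash may also leave it as-is when HOME is not set in the job's environment. The names quoted_path and scratch_path are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes the failing line writes to "~/$SCRATCH_LOC".
SCRATCH_LOC=loclocloc          # value shown in the trace (looks like a placeholder)

# Inside double quotes the tilde is NOT expanded, so this is the literal
# relative path "~/loclocloc". If no directory named "~" exists in the
# current directory, a redirect to it fails with "Directory nonexistent".
quoted_path="~/$SCRATCH_LOC"

# Hypothetical safer choice: build the path from the job's own working
# directory (the script already records it with pwd) instead of relying
# on a home directory for the account the job runs as.
currentDir=$(pwd)
scratch_path=$currentDir/$SCRATCH_LOC
```

Using the execute directory also means any scratch file is cleaned up by Condor along with the rest of the job sandbox.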


I don't know what loclocloc is, and I am also confused about the meaning of:

[: 1: hellow.exe: bad number
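Trace lines such as "+ [ hellow.exe -eq 0 ]" show that the value being compared as a process number was actually the executable name. The -eq and -ne operators of test accept only integers, so dash (which Ubuntu uses as /bin/sh) prints "bad number" and the test returns a non-zero status; the trace shows the script then simply carried on as if the condition were false. A hedged sketch of a guard that turns this into an explicit error; is_integer and is_rank_zero are hypothetical helper names, not part of Condor or mp2script:

```shell
#!/bin/sh
# Sketch: make a non-numeric process number an explicit error instead of
# letting "[ ... -eq 0 ]" print "bad number" and quietly act as false.

# Hypothetical helper: succeed only if $1 is a non-empty string of digits.
is_integer() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical helper: returns 0 if $1 is node 0, 1 for any other node,
# and 2 (with a message on stderr) if $1 is not a number at all.
is_rank_zero() {
    if ! is_integer "$1"; then
        echo "expected a process number, got '$1'" >&2
        return 2
    fi
    [ "$1" -eq 0 ]
}
```

In a wrapper this could be called as is_rank_zero "$_CONDOR_PROCNO", with an explicit exit 1 when it returns 2, so a wrong value such as hellow.exe stops the job with a readable message instead of silently skipping the mpd start-up.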

Again, thanks for your consideration,

Regards,
Arash

   

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Kewley, J (John)
Sent: Monday, February 04, 2008 5:53 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] mpich2 error " '.../condor_exec.exe' with arguments hellow.exe: No such file or directory"

Do you not need to add
transfer_executable=true 
so that your "executable" (mp2script.smp) is transferred?

(I haven't used the parallel universe, but that is a common fix for this error
in other universes, and I noticed you were transferring other files, so you
are not in a shared filestore environment.)

Cheers

JK

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx on behalf of arash
Sent: Mon 04/02/2008 14:16
To: Condor-Users Mail List
Subject: Re: [Condor-users] mpich2 error " '.../condor_exec.exe' with arguments hellow.exe: No such file or directory"
 
Dear all,

Following up on my last reported problem, I changed mp2script to the new one,
which is attached.

After a few minutes, the submitted job exited with the attached error and
log files.

Thanks for your consideration.

Sincerely,

Arash

 

 

 

From: arash [mailto:anoorghorbani@xxxxxxxxx] 
Sent: Monday, February 04, 2008 5:39 PM
To: Condor-Users Mail List
Subject: RE: mpich2 error " '.../condor_exec.exe' with arguments hellow.exe: No such file or directory"

 

Dear all,

Regarding my first e-mail with the subject "mpich2 error " '.../condor_exec.exe'
with arguments hellow.exe: No such file or directory", I have attached the
corresponding parts of all of my log files, which may be useful. Please also
note that I use Ubuntu 7.10.

 

Regards,

Arash 

 

From: arash [mailto:anoorghorbani@xxxxxxxxx] 
Sent: Sunday, February 03, 2008 2:38 PM
To: Condor-Users Mail List
Subject: mpich2 error " '.../condor_exec.exe' with arguments hellow.exe: No such file or directory"

 

Dear all,

I have configured two quad-core computers (called mpi0 and mpi1) as dedicated
resources, with mpi0 set as the scheduler. I can run simple parallel jobs,
but I can't run MPI jobs.

 

I receive the error:

'/home/condor/execute/dir_6618/condor_exec.exe' with arguments hellow.exe: No such file or directory

in the file log.#pArAlLeLnOdE#.
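One frequent (though here unconfirmed) cause of "Failed to execute ... condor_exec.exe ... No such file or directory" when the file clearly exists is a wrapper script saved with DOS (CRLF) line endings, for example after copying it from a web page on Windows. The first line is then really "#!/bin/sh\r", the kernel cannot find an interpreter by that name, and execve() fails with ENOENT, which is reported as "No such file or directory". A sketch for ruling this out; has_crlf and strip_crlf are hypothetical helper names:

```shell
#!/bin/sh
# Sketch: detect and remove DOS (CRLF) line endings from a script.
# With CRLF endings the "#!" line names "/bin/sh\r", so execve() fails
# with ENOENT ("No such file or directory") even though the script exists.

# Hypothetical helper: succeed if file $1 contains a carriage return.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Hypothetical helper: strip carriage returns from file $1 in place.
# Copying back with cat (rather than mv) keeps the file's permissions.
strip_crlf() {
    tmp="$1.tmp.$$"
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1" && rm -f "$tmp"
}
```

For example, has_crlf mp2script.smp && strip_crlf mp2script.smp could be run in the submit directory before condor_submit; od -c mp2script.smp | head -n 2 also shows any \r characters directly.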

 

 

I submitted the following file:

#########################################

universe                = parallel
executable              = mp2script.smp
arguments               = hellow.exe
machine_count           = 3
should_transfer_files   = yes
when_to_transfer_output = on_exit
transfer_input_files    = hellow.exe
+WantParallelSchedulingGroups = False
notification            = never
log                     = log.$(NODE)
error                   = err.$(NODE)
output                  = out.$(NODE)
queue

#########################################

 

where hellow.exe is compiled with mpicc from:

 

*****************************************

/* -*- Mode: C; c-basic-offset:4 ; -*- */
/*
 *  (C) 2001 by Argonne National Laboratory.
 *      See COPYRIGHT in top-level directory.
 */

#include <stdio.h>
#include "mpi.h"

int main( int argc, char *argv[] )
{
    int rank;
    int size;

    MPI_Init( 0, 0 );
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf( "Hello world from process %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}

 

 

 

I used the mp2script described on the CamGrid page, which is:

 

#!/bin/sh
#
# File: mp2script.smp
#
#   Edit MPDIR and LD_LIBRARY_PATH to suit your
# local configuration.

_CONDOR_PROCNO=$_CONDOR_PROCNO
_CONDOR_NPROCS=$_CONDOR_NPROCS

EXECUTABLE=$1
shift

# the binary is copied but the executable flag is cleared.
# so the script has to take care of this
chmod +x $EXECUTABLE

# Set this to the installation directory of your mpich2 installation
MPDIR=/usr/local/mpich2
PATH=$MPDIR/bin:.:$PATH
export PATH

# When a job is killed by the user, this script will get sigterm
# This script has to catch it and do the cleaning for the
# mpich2 environment
finalize()
{
  mpdallexit
  exit
}
trap finalize TERM

# start the mpich2 environment
if [ $_CONDOR_PROCNO -eq 0 ]
then
        # MPICH2 requires an mpd.conf file with a
        # password in it on the host starting the job.
        # We'll generate one on the fly, though we could
        # use a pre-prepared one, e.g:
        # export MPD_CONF_FILE=~/.mpd.conf

        export MPD_CONF_FILE=`pwd`/mpd.conf
        echo "secretword=MySecretWord" > $MPD_CONF_FILE
        chmod 600 $MPD_CONF_FILE

        # Adjust the following to your needs. I use Intel
        # compilers to build MPICH2
        export LD_LIBRARY_PATH=/lib:/usr/lib:/$MPDIR/lib

        mpd --daemon --debug
        val=$?

        if [ $val -ne 0 ]
        then
                echo "mp2script error booting mpd: $val"
                exit 1
        fi

        ## Run the actual mpi job. Note pre-prepared machine file.
        mpiexec -l -machinefile $MPDIR/etc/machfile -envall -n $_CONDOR_NPROCS $EXECUTABLE $@

        mpdallexit
        rm $MPD_CONF_FILE
else
        wait
        exit 0
fi

exit $?

###### End of mp2script.smp ######
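The wrapper above runs chmod +x $EXECUTABLE and hands $EXECUTABLE $@ to mpiexec without checking that the binary actually arrived in the job's working directory, and the unquoted $@ re-splits any argument that contains spaces. A hedged sketch of a more defensive version of that step; prepare_executable is a hypothetical function, not part of mp2script or Condor:

```shell
#!/bin/sh
# Sketch: check that the MPI binary named on the command line was
# transferred into the job's working directory before trying to run it.

# Hypothetical helper: print the path to run ("./NAME" for a relative
# name) on success; on failure print a message to stderr and return 1.
prepare_executable() {
    exe=$1
    if [ -z "$exe" ]; then
        echo "no MPI executable given as the first argument" >&2
        return 1
    fi
    if [ ! -f "$exe" ]; then
        echo "'$exe' not found in $(pwd); was it in transfer_input_files?" >&2
        return 1
    fi
    # Transferred files may lose their execute bit, as the script notes.
    chmod +x "$exe" || return 1
    case $exe in
        /*) printf '%s\n' "$exe" ;;      # already an absolute path
        *)  printf '%s\n' "./$exe" ;;    # run from the current directory
    esac
}
```

In the wrapper, EXECUTABLE=$(prepare_executable "$1") || exit 1 could replace the EXECUTABLE=$1 and chmod lines, and the mpiexec line would then end with "$EXECUTABLE" "$@"; quoting "$@" passes each remaining argument through as a single word.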

 

 

 

The file log.#pArAlLeLnOdE# was generated as follows:

 

000 (030.000.000) 02/02 17:28:00 Job submitted from host: <x.x.x.27:54299>
...
014 (030.000.000) 02/02 17:33:04 Node 0 executing on host: <x.x.x.27:39023>
...
014 (030.000.001) 02/02 17:33:04 Node 1 executing on host: <x.x.x.27:39023>
...
014 (030.000.002) 02/02 17:33:04 Node 2 executing on host: <x.x.x.27:39023>
...
001 (030.000.000) 02/02 17:33:04 Job executing on host: MPI_job
...
007 (030.000.000) 02/02 17:33:04 Shadow exception!
        Error from starter on vm3@xxxxxxxxxx: Failed to execute '/home/condor/execute/dir_6618/condor_exec.exe' with arguments hellow.exe: No such file or directory
        0  -  Run Bytes Sent By Job
        1621935  -  Run Bytes Received By Job
...
012 (030.000.000) 02/02 17:33:04 Job was held.
        Error from starter on vm3@xxxxxxxxxx: Failed to execute '/home/condor/execute/dir_6618/condor_exec.exe' with arguments hellow.exe: No such file or directory
        Code 6 Subcode 2
...

 

 

 

I would be grateful for any hints.

 

Regards,

Arash

 



_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at: 
https://lists.cs.wisc.edu/archive/condor-users/

Attachment: simple_mpi.rar
Description: Binary data