
Re: [Condor-users] MPI job problem



Hi Mark and Greg,
Thanks for your responses.

I changed the START expression from Scheduler =?= $(DedicatedScheduler) to True
in the local configuration files on pragma002 and pragma004, and the state of
both machines did indeed become "Unclaimed":
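
For reference, the change amounts to the following in each local
configuration file (a sketch; only the START line is shown, and the rest
of the file is assumed to be unchanged):
============================
# before: only jobs from the dedicated scheduler were allowed to start
# START = Scheduler =?= $(DedicatedScheduler)

# after: any job is allowed to start
START = True
=============================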
------------------------------------------------------------------------
[lyho@pragma001 lyho]$ condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

pragma001.gri LINUX       INTEL  Owner      Idle       0.010   469  0+00:10:04
pragma002.gri LINUX       INTEL  Unclaimed  Idle       0.290   469  0+03:21:02
pragma004.gri LINUX       INTEL  Unclaimed  Idle       0.150  1004  0+03:19:48

                     Machines Owner Claimed Unclaimed Matched Preempting

         INTEL/LINUX        3     1       0         2       0          0

               Total        3     1       0         2       0          0

-------------------------------------------------------------------------

However, the MPI job is still idle:

-------------------------------------------------------------------------
[lyho@pragma001 lyho]$ condor_q


-- Submitter: pragma001.grid.sinica.edu.tw : <140.109.98.21:33670> : pragma001.grid.sinica.edu.tw
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 140.0   lyho            4/29 17:44   0+00:00:00 I  0   0.3  cpi

1 jobs; 1 idle, 0 running, 0 held

------------------------------------------------------------------------

I then tested a vanilla universe job with the following job description file:
============================
universe = vanilla
executable = cpi
log = logofcpi.new
error = errofcpi.$(NODE).new
output = outofcpi.$(NODE).new
queue
=============================

This job ran successfully:

------------------------------------------------------------------------
[lyho@pragma001 condor_test]$ condor_q


-- Submitter: pragma001.grid.sinica.edu.tw : <140.109.98.21:33670> : pragma001.grid.sinica.edu.tw
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 142.0   lyho            5/2  13:18   0+00:00:00 R  0   0.3  cpi

1 jobs; 0 idle, 1 running, 0 held
---------------------------------------------------------------------

The log, error, and output files:

---------------------------------------------------------------------
[lyho@pragma001 condor_test]$ more *.new
::::::::::::::
errofcpi..new
::::::::::::::
Process 0 on pragma002.grid.sinica.edu.tw
::::::::::::::
logofcpi.new
::::::::::::::
000 (142.000.000) 05/02 13:18:57 Job submitted from host: <140.109.98.21:33670>
...
001 (142.000.000) 05/02 13:19:00 Job executing on host: <140.109.98.22:48852>
...
005 (142.000.000) 05/02 13:19:00 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        0  -  Total Bytes Received By Job
...
::::::::::::::
outofcpi..new
::::::::::::::
pi is approximately 3.1416009869231254, Error is 0.0000083333333323
wall clock time = 0.000055

--------------------------------------------------------------------

So something is wrong with the MPI job.

Can anyone help me?



On Fri, 29 Apr 2005 12:11:53 +0300, Mark Silberstein wrote
> The problem seems to be that all your computers are in
> the "Owner" state, i.e. Condor is NOT allowed to start any job on them.
> Obviously you're using the START expression (in the condor_config),
> which makes your resources reject Condor jobs when they are under
> load or when there's some keyboard activity (the output you sent was
> produced on pragma001, so you were working on it, and the two others
> have a load average of 1.000). To TEST that MPI really works you
> might want to disable this by putting START=TRUE (which would
> allow any job to be invoked, regardless of the current computer
> activity), or START=($(START))||(Scheduler =?= $(DedicatedScheduler)).
> Mark
>