
Re: [Condor-users] MPI job problem



Dear Greg,
Of course, and thanks for your help.
This is the SchedLog of pragma001.grid.sinica.edu.tw;
there is nothing in the startd log.
------------------------------------------------------------------------
5/3 09:04:01 -------- Begin starting jobs --------
5/3 09:04:01 -------- Done starting jobs --------
5/3 09:04:02 JobsRunning = 0
5/3 09:04:02 JobsIdle = 0
5/3 09:04:02 JobsHeld = 0
5/3 09:04:02 JobsRemoved = 0
5/3 09:04:02 SchedUniverseJobsRunning = 0
5/3 09:04:02 SchedUniverseJobsIdle = 0
5/3 09:04:02 N_Owners = 0
5/3 09:04:02 MaxJobsRunning = 200
5/3 09:04:02 Attempting to send update via UDP to collector pragma001.grid.sinica.edu.tw <140.109.98.21:9618>
5/3 09:04:02 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
5/3 09:04:02 Sent HEART BEAT ad to central mgr: Number of submittors=0
5/3 09:04:02 Attempting to send update via UDP to collector marlin.bii.a-star.edu.sg <202.6.243.157:9618>
5/3 09:04:02 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
5/3 09:04:02 ============ Begin clean_shadow_recs =============
5/3 09:04:02 ============ End clean_shadow_recs =============
5/3 09:06:28 DaemonCore: Command received via TCP from host <140.109.98.21:44215>
5/3 09:06:28 DaemonCore: received command 1111 (QMGMT_CMD), calling handler (handle_q)
5/3 09:06:28 condor_read(): Socket closed when trying to read buffer
5/3 09:06:28 QMGR Connection closed
5/3 09:07:35 DaemonCore: Command received via TCP from host <140.109.98.21:44245>
5/3 09:07:35 DaemonCore: received command 1111 (QMGMT_CMD), calling handler (handle_q)
5/3 09:07:35 AUTHENTICATE_FS: used file /tmp/qmgr_6LKOTY, status: 1
5/3 09:07:35 OwnerCheck retval 1 (success), super_user
5/3 09:07:35 OwnerCheck retval 1 (success), super_user
5/3 09:07:36 wrote 300788 bytes
5/3 09:07:36 done with transfer, errno = 0
5/3 09:07:36 condor_read(): Socket closed when trying to read buffer
5/3 09:07:36 QMGR Connection closed
5/3 09:07:36 DaemonCore: Command received via TCP from host <140.109.98.21:44256>
5/3 09:07:36 DaemonCore: received command 464 (ATTEMPT_ACCESS), calling handler (attempt_access_handler)
5/3 09:07:36 ATTEMPT_ACCESS: Switching to user uid: 510 gid: 510.
5/3 09:07:36 Checking file /home/lyho/test/examples/condor_test/outofcpi.0.new for write permission.
5/3 09:07:36 Switching back to old priv state.
5/3 09:07:36 DaemonCore: Command received via TCP from host <140.109.98.21:44257>
5/3 09:07:36 DaemonCore: received command 464 (ATTEMPT_ACCESS), calling handler (attempt_access_handler)
5/3 09:07:36 ATTEMPT_ACCESS: Switching to user uid: 510 gid: 510.
5/3 09:07:36 Checking file /home/lyho/test/examples/condor_test/errofcpi.0.new for write permission.
5/3 09:07:36 Switching back to old priv state.
5/3 09:07:36 Found idle MPI cluster 143
5/3 09:07:36 Started timer (1035) to call handleDedicatedJobs() in 2 secs
5/3 09:07:36 JobsRunning = 0
5/3 09:07:36 JobsIdle = 0
5/3 09:07:36 JobsHeld = 0
5/3 09:07:36 JobsRemoved = 0
5/3 09:07:36 SchedUniverseJobsRunning = 0
5/3 09:07:36 SchedUniverseJobsIdle = 0
5/3 09:07:36 N_Owners = 1
5/3 09:07:36 MaxJobsRunning = 200
5/3 09:07:36 Attempting to send update via UDP to collector pragma001.grid.sinica.edu.tw <140.109.98.21:9618>
5/3 09:07:36 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
5/3 09:07:36 Sent HEART BEAT ad to central mgr: Number of submittors=1
5/3 09:07:36 Attempting to send update via UDP to collector marlin.bii.a-star.edu.sg <202.6.243.157:9618>
5/3 09:07:36 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
5/3 09:07:36 Changed attribute: RunningJobs = 0
5/3 09:07:36 Changed attribute: IdleJobs = 0
5/3 09:07:36 Changed attribute: HeldJobs = 0
5/3 09:07:36 Changed attribute: FlockedJobs = 0
5/3 09:07:36 Changed attribute: Name = "lyho@xxxxxxxxxxxxxxxxxx"
5/3 09:07:36 Attempting to send update via UDP to collector pragma001.grid.sinica.edu.tw <140.109.98.21:9618>
5/3 09:07:36 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
5/3 09:07:36 Sent ad to central manager for lyho@xxxxxxxxxxxxxxxxxx
5/3 09:07:36 ============ Begin clean_shadow_recs =============
5/3 09:07:36 ============ End clean_shadow_recs =============
5/3 09:07:36 Called reschedule_negotiator()
5/3 09:07:36 Sending RESCHEDULE command to negotiator(s)
5/3 09:07:36 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
5/3 09:07:36 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
5/3 09:07:38 Starting DedicatedScheduler::handleDedicatedJobs
5/3 09:07:38 Found 1 idle dedicated job(s)
5/3 09:07:38 DedicatedScheduler: Listing all dedicated jobs -
5/3 09:07:38 Dedicated job: 143.0 lyho
5/3 09:07:38 SCHEDD_TIMEOUT_MULTIPLIER is undefined, using default value of 0
5/3 09:07:38 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
5/3 09:07:38 Found 0 potential dedicated resources
5/3 09:07:38 Displaying dedicated resources:
5/3 09:07:38  No resources claimed
5/3 09:07:38 In DedicatedScheduler::publishRequestAd()
5/3 09:07:38 Attempting to send update via UDP to collector pragma001.grid.sinica.edu.tw <140.109.98.21:9618>
5/3 09:07:38 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
5/3 09:07:38 Finished DedicatedScheduler::handleDedicatedJobs
5/3 09:07:38 DaemonCore: Command received via TCP from host <140.109.98.21:44271>
5/3 09:07:38 DaemonCore: received command 1111 (QMGMT_CMD), calling handler (handle_q)
5/3 09:07:38 condor_read(): Socket closed when trying to read buffer
5/3 09:07:38 QMGR Connection closed
5/3 09:07:39 DaemonCore: Command received via TCP from host <140.109.98.21:44284>
5/3 09:07:39 DaemonCore: received command 1111 (QMGMT_CMD), calling handler (handle_q)
5/3 09:07:39 condor_read(): Socket closed when trying to read buffer
5/3 09:07:39 QMGR Connection closed
5/3 09:07:40 DaemonCore: Command received via TCP from host <140.109.98.21:44297>
5/3 09:07:40 DaemonCore: received command 1111 (QMGMT_CMD), calling handler (handle_q)
5/3 09:07:40 condor_read(): Socket closed when trying to read buffer
5/3 09:07:40 QMGR Connection closed
---------------------------------------------------------------------------
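The part of the SchedLog above that matters for the MPI job is the dedicated scheduler pass at 09:07:38. As a rough sketch only (the function name and the sample string are my own, not anything shipped with Condor), lines in this format can be scanned for the resource count like this:

```python
import re

# Sample lines copied from the SchedLog above ("M/D HH:MM:SS message").
SAMPLE_LOG = """\
5/3 09:07:36 Found idle MPI cluster 143
5/3 09:07:38 Starting DedicatedScheduler::handleDedicatedJobs
5/3 09:07:38 Found 1 idle dedicated job(s)
5/3 09:07:38 Found 0 potential dedicated resources
5/3 09:07:38  No resources claimed
5/3 09:07:38 Finished DedicatedScheduler::handleDedicatedJobs
"""

LINE_RE = re.compile(r"^(\d+/\d+ \d+:\d+:\d+)\s+(.*)$")
RESOURCES_RE = re.compile(r"Found (\d+) potential dedicated resources")


def dedicated_resource_counts(log_text):
    """Return (timestamp, count) for each 'potential dedicated resources' line."""
    results = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip wrapped or malformed lines
        stamp, message = m.groups()
        found = RESOURCES_RE.search(message)
        if found:
            results.append((stamp, int(found.group(1))))
    return results


if __name__ == "__main__":
    for stamp, count in dedicated_resource_counts(SAMPLE_LOG):
        print(stamp, "dedicated resources seen by the schedd:", count)
```

On the sample this reports a single pass with a count of 0, which is the same thing the log says: the schedd found the idle MPI job but no machines it considers dedicated resources.
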
job status :

---------------------------------------------------------------------------
[lyho@pragma001 log]$ condor_q


-- Submitter: pragma001.grid.sinica.edu.tw : <140.109.98.21:33670> : pragma001.grid.sinica.edu.tw
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
 143.0   lyho            5/3  09:07   0+00:00:00 I  0   0.3  cpi

1 jobs; 1 idle, 0 running, 0 held
---------------------------------------------------------------------------

[lyho@pragma001 log]$ condor_q -l


-- Submitter: pragma001.grid.sinica.edu.tw : <140.109.98.21:33670> : pragma001.grid.sinica.edu.tw
MyType = "Job"
TargetType = "Machine"
ClusterId = 143
QDate = 1115082455
CompletionDate = 0
Owner = "lyho"
RemoteWallClockTime = 0.000000
LocalUserCpu = 0.000000
LocalSysCpu = 0.000000
RemoteUserCpu = 0.000000
RemoteSysCpu = 0.000000
ExitStatus = 0
NumCkpts = 0
NumRestarts = 0
NumSystemHolds = 0
CommittedTime = 0
TotalSuspensions = 0
LastSuspensionTime = 0
CumulativeSuspensionTime = 0
ExitBySignal = FALSE
CondorVersion = "$CondorVersion: 6.6.9 Mar 10 2005 $"
CondorPlatform = "$CondorPlatform: I386-LINUX_RH9 $"
RootDir = "/"
Iwd = "/home/lyho/test/examples/condor_test"
JobUniverse = 8
Cmd = "/home/lyho/test/examples/condor_test/cpi"
CurrentHosts = 0
WantRemoteSyscalls = FALSE
WantCheckpoint = FALSE
MinHosts = 2
MaxHosts = 2
JobStatus = 1
EnteredCurrentStatus = 1115082456
JobPrio = 0
User = "lyho@xxxxxxxxxxxxxxxxxx"
NiceUser = FALSE
Env = ""
JobNotification = 2
UserLog = "/home/lyho/test/examples/condor_test/logofcpi.new"
CoreSize = 0
KillSig = "SIGTERM"
Rank = 0.000000
In = "/dev/null"
TransferIn = FALSE
Out = "outofcpi.#MpInOdE#.new"
Err = "errofcpi.#MpInOdE#.new"
BufferSize = 524288
BufferBlockSize = 32768
ShouldTransferFiles = "NO"
TransferFiles = "NEVER"
ImageSize = 294
ExecutableSize = 294
DiskUsage = 294
Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && (HasMPI) && (TARGET.FileSystemDomain == MY.FileSystemDomain)
FileSystemDomain = "grid.sinica.edu.tw"
PeriodicHold = FALSE
PeriodicRelease = FALSE
PeriodicRemove = FALSE
OnExitHold = FALSE
OnExitRemove = TRUE
LeaveJobInQueue = FALSE
Args = ""
ProcId = 0
Scheduler = "DedicatedScheduler@lyho@pragma001.grid.sinica.edu.tw"
ServerTime = 1115083476
-------------------------------------------------------------------------
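For readers not used to the numeric fields in the job ad above, here is a small sketch that decodes the ones relevant to this problem. The code tables are the usual Condor numbering as far as I know (only the entries 8 and 1 actually occur in this ad), and the helper name summarize is made up for illustration:

```python
from datetime import datetime, timezone

# Numeric codes used in Condor job ads. Only JobUniverse = 8 and
# JobStatus = 1 appear in the ad above; the rest are listed for context.
JOB_STATUS = {1: "Idle", 2: "Running", 3: "Removed", 4: "Completed", 5: "Held"}
JOB_UNIVERSE = {1: "standard", 5: "vanilla", 7: "scheduler", 8: "MPI"}

# Values copied from the condor_q -l output above.
JOB_AD = {
    "JobUniverse": 8,
    "JobStatus": 1,
    "QDate": 1115082455,
    "ServerTime": 1115083476,
    "MinHosts": 2,
    "MaxHosts": 2,
}


def summarize(ad):
    """Turn the raw numbers into something readable."""
    queued = datetime.fromtimestamp(ad["QDate"], tz=timezone.utc)
    return {
        "universe": JOB_UNIVERSE.get(ad["JobUniverse"], "unknown"),
        "status": JOB_STATUS.get(ad["JobStatus"], "unknown"),
        "queued_utc": queued.isoformat(),
        "seconds_waiting": ad["ServerTime"] - ad["QDate"],
        "hosts_needed": (ad["MinHosts"], ad["MaxHosts"]),
    }


if __name__ == "__main__":
    for key, value in summarize(JOB_AD).items():
        print(key, "=", value)
```

So the ad describes an MPI universe job that needs exactly 2 hosts and had been idle for 1021 seconds when condor_q was run. QDate converts to 01:07:35 UTC on 3 May 2005, which lines up with the 09:07:35 submission in the SchedLog if the log is in local time at UTC+8 (an assumption on my part).
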
machine status:
-------------------------------------------------------------------------
[lyho@pragma001 log]$ condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

pragma001.gri LINUX       INTEL  Owner      Idle       0.000   469  0+00:15:04
pragma002.gri LINUX       INTEL  Unclaimed  Idle       0.890   469  0+03:36:01
pragma004.gri LINUX       INTEL  Unclaimed  Idle       1.000  1004  0+03:34:48

                     Machines Owner Claimed Unclaimed Matched Preempting

         INTEL/LINUX        3     1       0         2       0          0

               Total        3     1       0         2       0          0


-------------------------------------------------------------------------

[lyho@pragma001 log]$ condor_status -l
MyType = "Machine"
TargetType = "Job"
Name = "pragma001.grid.sinica.edu.tw"
Machine = "pragma001.grid.sinica.edu.tw"
Rank = 0.000000
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
COLLECTOR_HOST_STRING = "pragma001.grid.sinica.edu.tw"
CondorVersion = "$CondorVersion: 6.6.9 Mar 10 2005 $"
CondorPlatform = "$CondorPlatform: I386-LINUX_RH9 $"
VirtualMachineID = 1
VirtualMemory = 940764
Disk = 58974996
CondorLoadAvg = 0.000000
LoadAvg = 0.010000
KeyboardIdle = 154
ConsoleIdle = 30616471
Memory = 469
Cpus = 1
StartdIpAddr = "<140.109.98.21:33669>"
Arch = "INTEL"
OpSys = "LINUX"
UidDomain = "grid.sinica.edu.tw"
FileSystemDomain = "grid.sinica.edu.tw"
Subnet = "140.109.98"
HasIOProxy = TRUE
TotalVirtualMemory = 940764
TotalDisk = 58974996
KFlops = 875905
Mips = 1905
LastBenchmark = 1115071434
TotalLoadAvg = 0.010000
TotalCondorLoadAvg = 0.000000
ClockMin = 568
ClockDay = 2
TotalVirtualMachines = 1
HasFileTransfer = TRUE
HasMPI = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
HasPVM = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasMPI,HasJICLocalConfig,HasJICLocalStdin,HasPVM,HasRemoteSyscalls,HasCheckpointing"
CpuBusyTime = 0
CpuIsBusy = FALSE
State = "Owner"
EnteredCurrentState = 1115082534
Activity = "Idle"
EnteredCurrentActivity = 1115082534
Start = ((KeyboardIdle > 15 * 60) && (((LoadAvg - CondorLoadAvg) <= 0.300000) || (State != "Unclaimed" && State != "Owner")))
Requirements = START
CurrentRank = 0.000000
DaemonStartTime = 1114695432
UpdateSequenceNumber = 1297
MyAddress = "<140.109.98.21:33669>"
LastHeardFrom = 1115083738
UpdatesTotal = 1298
UpdatesSequenced = 1297
UpdatesLost = 0
UpdatesHistory = "0x00000000000000000000000000000000"

MyType = "Machine"
TargetType = "Job"
Name = "pragma002.grid.sinica.edu.tw"
Machine = "pragma002.grid.sinica.edu.tw"
Rank = Scheduler =?= "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
COLLECTOR_HOST_STRING = "pragma001.grid.sinica.edu.tw"
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
CondorVersion = "$CondorVersion: 6.6.9 Mar 10 2005 $"
CondorPlatform = "$CondorPlatform: I386-LINUX_RH9 $"
VirtualMachineID = 1
VirtualMemory = 945368
Disk = 58974996
CondorLoadAvg = 0.000000
LoadAvg = 0.990000
KeyboardIdle = 44595
ConsoleIdle = 1891066
Memory = 469
Cpus = 1
StartdIpAddr = "<140.109.98.22:48852>"
Arch = "INTEL"
OpSys = "LINUX"
UidDomain = "grid.sinica.edu.tw"
FileSystemDomain = "grid.sinica.edu.tw"
Subnet = "140.109.98"
HasIOProxy = TRUE
TotalVirtualMemory = 945368
TotalDisk = 58974996
KFlops = 801365
Mips = 1880
LastBenchmark = 1115070484
TotalLoadAvg = 0.990000
TotalCondorLoadAvg = 0.000000
ClockMin = 568
ClockDay = 2
TotalVirtualMachines = 1
HasFileTransfer = TRUE
HasMPI = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
HasPVM = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasMPI,HasJICLocalConfig,HasJICLocalStdin,HasPVM,HasRemoteSyscalls,HasCheckpointing"
CpuBusyTime = 304
CpuIsBusy = TRUE
State = "Unclaimed"
EnteredCurrentState = 1115011084
Activity = "Idle"
EnteredCurrentActivity = 1115070484
Start = TRUE
Requirements = START
CurrentRank = 0.000000
DaemonStartTime = 1114744650
UpdateSequenceNumber = 1132
MyAddress = "<140.109.98.22:48852>"
LastHeardFrom = 1115083745
UpdatesTotal = 1195
UpdatesSequenced = 1193
UpdatesLost = 0
UpdatesHistory = "0x00000000000000000000000000000000"

MyType = "Machine"
TargetType = "Job"
Name = "pragma004.grid.sinica.edu.tw"
Machine = "pragma004.grid.sinica.edu.tw"
Rank = Scheduler =?= "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
COLLECTOR_HOST_STRING = "pragma001.grid.sinica.edu.tw"
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
CondorVersion = "$CondorVersion: 6.6.9 Mar 10 2005 $"
CondorPlatform = "$CondorPlatform: I386-LINUX_RH9 $"
VirtualMachineID = 1
VirtualMemory = 2009408
Disk = 58974912
CondorLoadAvg = 0.000000
LoadAvg = 1.000000
KeyboardIdle = 37227
ConsoleIdle = 30616285
Memory = 1004
Cpus = 1
StartdIpAddr = "<140.109.98.24:35849>"
Arch = "INTEL"
OpSys = "LINUX"
UidDomain = "grid.sinica.edu.tw"
FileSystemDomain = "grid.sinica.edu.tw"
Subnet = "140.109.98"
HasIOProxy = TRUE
TotalVirtualMemory = 2009408
TotalDisk = 58974912
KFlops = 575797
Mips = 1281
LastBenchmark = 1115070647
TotalLoadAvg = 1.000000
TotalCondorLoadAvg = 0.000000
ClockMin = 565
ClockDay = 2
TotalVirtualMachines = 1
HasFileTransfer = TRUE
HasMPI = TRUE
HasJICLocalConfig = TRUE
HasJICLocalStdin = TRUE
HasPVM = TRUE
HasRemoteSyscalls = TRUE
HasCheckpointing = TRUE
StarterAbilityList = "HasFileTransfer,HasMPI,HasJICLocalConfig,HasJICLocalStdin,HasPVM,HasRemoteSyscalls,HasCheckpointing"
CpuBusyTime = 9305
CpuIsBusy = TRUE
State = "Unclaimed"
EnteredCurrentState = 1114767739
Activity = "Idle"
EnteredCurrentActivity = 1115070647
Start = TRUE
Requirements = START
CurrentRank = 0.000000
DaemonStartTime = 1114744768
UpdateSequenceNumber = 1130
MyAddress = "<140.109.98.24:35849>"
LastHeardFrom = 1115083535
UpdatesTotal = 1192
UpdatesSequenced = 1190
UpdatesLost = 0
UpdatesHistory = "0x00000000000000000000000000000000"
--------------------------------------------------------------------------
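To see why pragma001 sits in the Owner state while the other two are Unclaimed, the Start expression from pragma001's ad can be evaluated by hand with the values in the same ad. The sketch below is only a Python transcription with ordinary booleans; real ClassAd evaluation also has UNDEFINED and ERROR values, which are ignored here, and the function name is mine:

```python
def start_pragma001(ad):
    """Python transcription of the Start expression in pragma001's machine ad."""
    return (ad["KeyboardIdle"] > 15 * 60) and (
        (ad["LoadAvg"] - ad["CondorLoadAvg"]) <= 0.3
        or (ad["State"] != "Unclaimed" and ad["State"] != "Owner")
    )


# Values copied from pragma001's ad above.
PRAGMA001 = {
    "KeyboardIdle": 154,
    "LoadAvg": 0.01,
    "CondorLoadAvg": 0.0,
    "State": "Owner",
}

if __name__ == "__main__":
    print("Start on pragma001:", start_pragma001(PRAGMA001))
    # Hypothetical: the same machine once the keyboard has been idle
    # for just over 15 minutes (901 seconds).
    idle_longer = dict(PRAGMA001, KeyboardIdle=901)
    print("Start after 15 idle minutes:", start_pragma001(idle_longer))
```

With KeyboardIdle = 154 the first clause (KeyboardIdle > 900) fails, so Start is false on pragma001 regardless of load. pragma002 and pragma004 have Start = TRUE in their ads, so nothing needs evaluating there.
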
condor_q -analyze :

--------------------------------------------------------------------------
[lyho@pragma001 log]$ condor_q -analyze


-- Submitter: pragma001.grid.sinica.edu.tw : <140.109.98.21:33670> : pragma001.grid.sinica.edu.tw
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
143.000:  Run analysis summary.  Of 3 machines,
      0 are rejected by your job's requirements
      1 reject your job because of their own requirements
      0 match, but are serving users with a better priority in the pool
      2 match, match, but reject the job for unknown reasons
      0 match, but will not currently preempt their existing job
      0 are available to run your job

WARNING: Analysis is meaningless for MPI universe jobs.

1 jobs; 1 idle, 0 running, 0 held

--------------------------------------------------------------------------
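As a sanity check on the counts in the analysis above, the job's Requirements expression can be applied by hand to the three machine ads. This is only a sketch with plain Python booleans; it does not model Condor's matchmaker or the dedicated scheduler, and the function names and the "Start" shortcut for pragma001 are my own simplifications:

```python
# Attribute values copied from the job ad and the three machine ads.
JOB = {"DiskUsage": 294, "ImageSize": 294, "FileSystemDomain": "grid.sinica.edu.tw"}

MACHINES = {
    "pragma001": {"Arch": "INTEL", "OpSys": "LINUX", "Disk": 58974996,
                  "Memory": 469, "HasMPI": True,
                  "FileSystemDomain": "grid.sinica.edu.tw",
                  # Start is an expression in this ad; KeyboardIdle = 154
                  # fails its "KeyboardIdle > 15 * 60" clause, so False.
                  "Start": False},
    "pragma002": {"Arch": "INTEL", "OpSys": "LINUX", "Disk": 58974996,
                  "Memory": 469, "HasMPI": True,
                  "FileSystemDomain": "grid.sinica.edu.tw", "Start": True},
    "pragma004": {"Arch": "INTEL", "OpSys": "LINUX", "Disk": 58974912,
                  "Memory": 1004, "HasMPI": True,
                  "FileSystemDomain": "grid.sinica.edu.tw", "Start": True},
}


def job_accepts(job, machine):
    """Python transcription of the job's Requirements expression."""
    return (
        machine["Arch"] == "INTEL"
        and machine["OpSys"] == "LINUX"
        and machine["Disk"] >= job["DiskUsage"]
        and machine["Memory"] * 1024 >= job["ImageSize"]
        and machine["HasMPI"]
        and machine["FileSystemDomain"] == job["FileSystemDomain"]
    )


def machine_accepts(machine):
    """All three machine ads say Requirements = START."""
    return machine["Start"]


def tally(job, machines):
    counts = {"rejected_by_job": 0, "rejected_by_machine": 0, "both_sides_ok": 0}
    for ad in machines.values():
        if not job_accepts(job, ad):
            counts["rejected_by_job"] += 1
        elif not machine_accepts(ad):
            counts["rejected_by_machine"] += 1
        else:
            counts["both_sides_ok"] += 1
    return counts


if __name__ == "__main__":
    print(tally(JOB, MACHINES))
```

The tally comes out as 0 rejected by the job, 1 rejected by the machine (pragma001) and 2 where both sides agree, which matches the first, second and fourth lines of the summary. It says nothing about why those 2 machines are still not used, and condor_q itself warns that the analysis is meaningless for MPI universe jobs.
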

I really appreciate your help!

Leon


On Mon, 02 May 2005 07:59:06 -0500, Greg Thain wrote
> Can you send us the log from the schedd and the startd?
> 
> Thanks,
> 
> -greg
> 
> Li-Yung_Ho wrote:
> > Hi Mark and Greg
> > Thanks for your responses
> > 
> > I changed the START attribute from Scheduler =?= $(DedicatedScheduler) to
> > True in the local configuration files on pragma002 and pragma004, and the
> > status did indeed become "Unclaimed":
> > ------------------------------------------------------------------------
> > [lyho@pragma001 lyho]$ condor_status
> > 
> > Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
> > 
> > pragma001.gri LINUX       INTEL  Owner      Idle       0.010   469  0+00:10:04
> > pragma002.gri LINUX       INTEL  Unclaimed  Idle       0.290   469  0+03:21:02
> > pragma004.gri LINUX       INTEL  Unclaimed  Idle       0.150  1004  0+03:19:48
> > 
> >                      Machines Owner Claimed Unclaimed Matched Preempting
> > 
> >          INTEL/LINUX        3     1       0         2       0          0
> > 
> >                Total        3     1       0         2       0          0
> > 
> > -------------------------------------------------------------------------
> > 
> > but the job is still idle:
> > 
> > -------------------------------------------------------------------------
> > [lyho@pragma001 lyho]$ condor_q
> > 
> > 
> > -- Submitter: pragma001.grid.sinica.edu.tw : <140.109.98.21:33670> : pragma001.grid.sinica.edu.tw
> >  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
> >  140.0   lyho            4/29 17:44   0+00:00:00 I  0   0.3  cpi
> > 
> > 1 jobs; 1 idle, 0 running, 0 held
> > 
> > ------------------------------------------------------------------------
> > 
> > I then tested a vanilla job with this job description file:
> > ============================
> > universe = vanilla
> > executable = cpi
> > log = logofcpi.new
> > error = errofcpi.$(NODE).new
> > output = outofcpi.$(NODE).new
> > queue
> > =============================
> > 
> > and that job does run:
> > 
> > ------------------------------------------------------------------------
> > [lyho@pragma001 condor_test]$ condor_q
> > 
> > 
> > -- Submitter: pragma001.grid.sinica.edu.tw : <140.109.98.21:33670> : pragma001.grid.sinica.edu.tw
> >  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
> >  142.0   lyho            5/2  13:18   0+00:00:00 R  0   0.3  cpi
> > 
> > 1 jobs; 0 idle, 1 running, 0 held
> > ---------------------------------------------------------------------
> > 
> > The log, error and output files:
> > 
> > ---------------------------------------------------------------------
> > [lyho@pragma001 condor_test]$ more *.new
> > ::::::::::::::
> > errofcpi..new
> > ::::::::::::::
> > Process 0 on pragma002.grid.sinica.edu.tw
> > ::::::::::::::
> > logofcpi.new
> > ::::::::::::::
> > 000 (142.000.000) 05/02 13:18:57 Job submitted from host: <140.109.98.21:33670>
> > ...
> > 001 (142.000.000) 05/02 13:19:00 Job executing on host: <140.109.98.22:48852>
> > ...
> > 005 (142.000.000) 05/02 13:19:00 Job terminated.
> >         (1) Normal termination (return value 0)
> >                 Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
> >                 Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
> >                 Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
> >                 Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
> >         0  -  Run Bytes Sent By Job
> >         0  -  Run Bytes Received By Job
> >         0  -  Total Bytes Sent By Job
> >         0  -  Total Bytes Received By Job
> > ...
> > ::::::::::::::
> > outofcpi..new
> > ::::::::::::::
> > pi is approximately 3.1416009869231254, Error is 0.0000083333333323
> > wall clock time = 0.000055
> > 
> > --------------------------------------------------------------------
> > 
> > So something is wrong with the MPI job.
> > 
> > Can anyone help me?
> > 
> > 
> > 
> > On Fri, 29 Apr 2005 12:11:53 +0300, Mark Silberstein wrote
> > 
> >>The problem seems to be that all your computers are in the "Owner"
> >>state, i.e. Condor is NOT allowed to start any job on them.
> >>Evidently you're using a START expression (in condor_config) that
> >>makes your resources reject Condor jobs when they are under load or
> >>when there is keyboard activity. (The output you sent was produced on
> >>pragma001, so you were working on it, and the other two have a load
> >>average of 1.000.) To TEST that MPI really works you might want to
> >>disable this by setting START=TRUE (which allows any job to start,
> >>regardless of current activity on the machine), or
> >>START=($(START))||(Scheduler =?= $(DedicatedScheduler)).
> >>Mark
> >>
> > 
> > 
> > _______________________________________________
> > Condor-users mailing list
> > Condor-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users