
[Condor-users] Fwd: Problem using Condor in GT4..




---------- Forwarded message ----------
From: Pushparajan V <vprajan@xxxxxxxxx>
Date: Jul 2, 2005 5:43 PM
Subject: Problem using Condor in GT4..
To: discuss@xxxxxxxxxx


Hi,

I have installed Condor and Globus successfully, and the Condor pool has four Sun nodes. I followed the steps in the Globus documentation to install the scheduler adapter for Condor. Now, when I try the following commands, I run into these problems:

$  globusrun -f condor.rsl
The job is submitted successfully but never terminates.

If I use $ globus-job-run localhost/jobmanager-condor /bin/date
it just hangs.

So I usually abort the job in each case.
---------------------------------------------------------------------------------------
The RSL file is:
+
( &(resourceManagerContact="garl-sun1.serc.iisc.ernet.in/jobmanager-condor ")
   (count=1)
   (label="subjob 0")
   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
                (LD_LIBRARY_PATH /usr/local1/gt4.0.0/lib/))
   (directory="/bin")
   (executable="/bin/date")
   (stdout="/home/rajan/mpitest/condor.out")
   (stderr="/home/rajan/mpitest/condor.err")
)
--------------------------------------------------------------------------
Here, garl-sun1.serc.iisc.ernet.in is the head node of the cluster, running the Condor collector.
What is happening?
I have checked globus-condor.log, the condor.pm script, and the jobmanager-condor files (all are untouched). The log file created by Condor contains the following:
-----------------------------------------------------------
<c>
    <a n="MyType"><s>SubmitEvent</s></a>
    <a n="EventTypeNumber"><i>0</i></a>
    <a n="EventTime"><s>2005-07-02T14:33:19</s></a>
    <a n="Cluster"><i>41</i></a>
    <a n="Proc"><i>0</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="SubmitHost"><s>&lt;10.16.21.12:34003&gt;</s></a>
</c>
<c>
    <a n="MyType"><s>JobAbortedEvent</s></a>
    <a n="EventTypeNumber"><i>9</i></a>
    <a n="EventTime"><s>2005-07-02T14:57:23</s></a>
    <a n="Cluster"><i>41</i></a>
    <a n="Proc"><i>0</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="Reason"><s>via condor_rm (by user rajan)</s></a>
</c>
--------------------------------------------------------------------
The Condor log does not seem to contain any useful information. The GRAM log also only says that I aborted the job. globusrun keeps running without terminating. What should I do?
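
To read a log like the one above at a glance, here is a minimal sketch in Python, assuming the Condor user log is the XML shown above. The names LOG_TEXT and parse_events are made up for illustration and are not part of Condor or Globus. It lists each event with its time, which makes it easy to see that this excerpt has only a submit event and an abort event, with no execute event in between. That may mean the job never started running, but this is only an inference from this excerpt.

```python
import xml.etree.ElementTree as ET

# Copy of the two events from the Condor log quoted above.
LOG_TEXT = """
<c>
    <a n="MyType"><s>SubmitEvent</s></a>
    <a n="EventTypeNumber"><i>0</i></a>
    <a n="EventTime"><s>2005-07-02T14:33:19</s></a>
    <a n="Cluster"><i>41</i></a>
    <a n="Proc"><i>0</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="SubmitHost"><s>&lt;10.16.21.12:34003&gt;</s></a>
</c>
<c>
    <a n="MyType"><s>JobAbortedEvent</s></a>
    <a n="EventTypeNumber"><i>9</i></a>
    <a n="EventTime"><s>2005-07-02T14:57:23</s></a>
    <a n="Cluster"><i>41</i></a>
    <a n="Proc"><i>0</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="Reason"><s>via condor_rm (by user rajan)</s></a>
</c>
"""


def parse_events(text):
    """Return one dict per <c> element, keyed by the 'n' attribute of each <a>."""
    # The log is a sequence of <c> elements without a single root element,
    # so wrap it in a dummy root before parsing.
    root = ET.fromstring("<log>" + text + "</log>")
    events = []
    for c in root.findall("c"):
        event = {}
        for a in c.findall("a"):
            value = a[0]  # the <s> (string) or <i> (integer) child
            if value.tag == "i":
                event[a.get("n")] = int(value.text)
            else:
                event[a.get("n")] = value.text
        events.append(event)
    return events


if __name__ == "__main__":
    for ev in parse_events(LOG_TEXT):
        print(ev["EventTime"], ev["MyType"], ev.get("Reason", ""))
```
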

Is the Condor scheduler adapter of GT4 compatible with pre-WS RSL files?

Thanks

Rajan VP