
Re: [Condor-users] Fwd: Problem using Condor in GT4..



On Jul 2, 2005, at 7:19 AM, Pushparajan V wrote:

---------- Forwarded message ----------
From: Pushparajan V <vprajan@xxxxxxxxx>
Date: Jul 2, 2005 5:43 PM
Subject: Problem using Condor in GT4..
To: discuss@xxxxxxxxxx


Hi,

I have installed Condor and Globus successfully, and the Condor pool has four Sun nodes. I followed the steps in the Globus documentation to install the scheduler adapter for Condor. Now, when I try the following commands, I run into problems:

$  globusrun -f condor.rsl
The job is submitted successfully but never terminates.

If I use $ globus-job-run localhost/jobmanager-condor /bin/date
it just hangs.

So I usually abort the job in each case.
---------------------------------------------------------------------------------------
The RSL file is:
+
( &(resourceManagerContact="garl-sun1.serc.iisc.ernet.in/jobmanager-condor ")
   (count=1)
   (label="subjob 0")
   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
                (LD_LIBRARY_PATH /usr/local1/gt4.0.0/lib/))
   (directory="/bin")
   (executable="/bin/date")
   (stdout="/home/rajan/mpitest/condor.out")
   (stderr="/home/rajan/mpitest/condor.err")
)
--------------------------------------------------------------------------
Here, garl-sun1.serc.iisc.ernet.in is the head node of the cluster, running condor_collector.
What is happening?
I have checked globus-condor.log, the condor.pm script, and the jobmanager-condor files (all untouched). The log file created by Condor contains the following:
-----------------------------------------------------------
<c>
    <a n="MyType"><s>SubmitEvent</s></a>
    <a n="EventTypeNumber"><i>0</i></a>
    <a n="EventTime"><s>2005-07-02T14:33:19</s></a>
    <a n="Cluster"><i>41</i></a>
    <a n="Proc"><i>0</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="SubmitHost"><s>&lt;10.16.21.12:34003&gt;</s></a>
</c>
<c>
    <a n="MyType"><s>JobAbortedEvent</s></a>
    <a n="EventTypeNumber"><i>9</i></a>
    <a n="EventTime"><s>2005-07-02T14:57:23</s></a>
    <a n="Cluster"><i>41</i></a>
    <a n="Proc"><i>0</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="Reason"><s>via condor_rm (by user rajan)</s></a>
</c>
--------------------------------------------------------------------
The Condor log does not seem to contain any useful information. The GRAM log also says only that I aborted the job. globusrun keeps running without terminating. What should I do?

Is the Condor scheduler adapter of GT4 compatible with pre-WS RSL files?

Keep in mind that when you submit a job through Globus to any batch system, the job may take hours to start running (depending on how many other jobs are already submitted). What happens if you submit an equivalent job directly to Condor?
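
One way to run that comparison, as a rough sketch: a submit description file for the same /bin/date job, handed straight to Condor. The vanilla universe and the date.sub and direct.* file names are assumptions for illustration, not taken from the original setup.

```
# date.sub (hypothetical file name): run /bin/date directly under Condor
universe   = vanilla
executable = /bin/date
output     = /home/rajan/mpitest/direct.out
error      = /home/rajan/mpitest/direct.err
log        = /home/rajan/mpitest/direct.log
queue
```

Submit it with condor_submit date.sub and watch it with condor_q. If the job sits idle here too, the problem is more likely in the pool than in the Globus job manager; condor_q -analyze with the job ID may then explain why no machine matches.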

+----------------------------------+---------------------------------+
|            Jaime Frey            |  Public Split on Whether        |
|        jfrey@xxxxxxxxxxx         |  Bush Is a Divider              |
|  http://www.cs.wisc.edu/~jfrey/  |         -- CNN Scrolling Banner |
+----------------------------------+---------------------------------+