
Re: [Condor-users] [gt-user] Wanna Help



Martin,

To me, this looks like a Condor problem rather than a Globus problem. Can your grid user submit Condor jobs directly using condor_submit? A complete stab in the dark, but you did start up the condor_master process as root, didn't you?
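
A direct submission test could look roughly like the sketch below. It assumes a vanilla-universe job that mirrors the /bin/ls job from the RSL further down the thread; the helper names and the log file name "direct-test.log" are only illustrative, and the actual call to condor_submit is left commented out so the script can be read and run without Condor installed.

```python
import subprocess
import tempfile


def build_submit_description(executable="/bin/ls", arguments="-R /tmp",
                             output="lsoutput", error="lserr",
                             log="direct-test.log"):
    """Return a vanilla-universe submit description mirroring the RSL job."""
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        f"arguments  = {arguments}",
        f"output     = {output}",
        f"error      = {error}",
        f"log        = {log}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def submit(description):
    """Write the description to a temporary file and pass it to condor_submit."""
    with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as fh:
        fh.write(description)
        path = fh.name
    return subprocess.run(["condor_submit", path],
                          capture_output=True, text=True)


if __name__ == "__main__":
    print(build_submit_description())
    # Uncomment on a machine with Condor installed, logged in as the grid user:
    # result = submit(build_submit_description())
    # print(result.stdout, result.stderr)
```

If a job submitted this way as the grid user is also held with "Permission denied", the problem is on the Condor side and Globus is not involved.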

Cheers,

Andrew



On 19 Apr 2007, at 16:35, Martin Feller wrote:

Mehdi,
It seems that what the SchedLog shows is the problem: the ownership
of some files should be changed from user grid to user condor, and
that fails. Sorry, this exceeds my Condor knowledge.
Good luck!
Martin

Dear Martin,
Yes, the local user account to which my DN is mapped in the
grid-mapfile is available on the Condor compute nodes.
The following are the Condor logs.
--------------------------------------------------------------
/home/condor/hosts/Server/log/SchedLog:4/19 16:11:28 (pid:3046) (154.135) Failed to chown /home/condor/hosts/Server/spool/cluster154.proc135.subproc0 from 504 to 506.507.  User may run into permissions problems when fetching sandbox.
-------------------------------------------------------------------------------------------
/home/condor/hosts/localhost002/log/StarterLog:4/19 23:00:40 Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/30671.1176986459/stdout011' as standard output: Permission denied (errno 13)
/home/condor/hosts/localhost002/log/StarterLog:4/19 23:00:40 Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/30671.1176986459/stderr011' as standard error: Permission denied (errno 13)
The following are the /etc/passwd entries on the Server and the Condor compute nodes.
grid:x:504:504::/home/grid:/bin/bash
gwadmin:x:505:506::/home/gwadmin:/bin/bash
condor:x:506:507::/home/condor:/bin/bash
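
The numeric IDs in the SchedLog message can be matched against these passwd entries. A minimal Python sketch, assuming that "506.507" in the log line means uid.gid and that the three entries above are the relevant ones (the function names are illustrative and not part of Condor or Globus):

```python
PASSWD_LINES = """\
grid:x:504:504::/home/grid:/bin/bash
gwadmin:x:505:506::/home/gwadmin:/bin/bash
condor:x:506:507::/home/condor:/bin/bash
"""


def parse_passwd(text):
    """Map each UID to (user name, primary GID) from passwd-format text."""
    accounts = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name, _pw, uid, gid, *_rest = line.split(":")
        accounts[int(uid)] = (name, int(gid))
    return accounts


def describe_chown(old_uid, new_owner, accounts):
    """Explain a 'from OLD to UID.GID' chown message in terms of user names."""
    new_uid, new_gid = (int(part) for part in new_owner.split("."))
    old_name = accounts.get(old_uid, ("unknown", None))[0]
    new_name, expected_gid = accounts.get(new_uid, ("unknown", None))
    gid_note = "primary group" if expected_gid == new_gid else "a different group"
    return (f"{old_name} (uid {old_uid}) -> "
            f"{new_name} (uid {new_uid}, gid {new_gid}, {gid_note})")


accounts = parse_passwd(PASSWD_LINES)
print(describe_chown(504, "506.507", accounts))
# prints: grid (uid 504) -> condor (uid 506, gid 507, primary group)
```

Read this way, the schedd tried to hand the spool directory from user grid to user condor and was not allowed to, which only a process running as root could do.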
On 4/19/07, Martin Feller <feller@xxxxxxxxxxx> wrote:
Mehdi,
Do the Condor-logs provide more information about that?
Is the local user-account, to which your DN in the grid-mapfile
gets mapped, available on the condor compute nodes?
Martin

> Hi Martin,
> Yes, it works with the fork jobmanager.
>
> On 4/19/07, Martin Feller <feller@xxxxxxxxxxx> wrote:
>> Does it work with fork?
>> Martin
>>
>> > Hi,
>> >  I want to submit a job to a Condor pool via Globus GRAM. I defined the
>> > following RSL script and submit my job with "globusrun  -f test2.rsl"
>> > from the Server itself, acting as the client. My job goes to the Held
>> > state. My RSL script file (test2.rsl) is:
>> > ------------------------test2.rsl-----------------------------------
>> > +
>> > (
>> > &(resourceManagerContact="Server.eng4.shirazu.ac.ir/jobmanager-condor")
>> >   (count=1)
>> >   (label="subjob 0")
>> >   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
>> >                (LD_LIBRARY_PATH /usr/local/globus-4.0.3/lib/))
>> >   (directory="/home/grid/globusTest/GRAM/Test2")
>> >   (executable="/bin/ls")
>> >   (arguments  = "-R" "/tmp")
>> >   (stdout="lsoutput")
>> >   (stderr="lserr")
>> > )
>> > -----------------------------------------------------------------------
>> >
>> > The output of globus-condor.log file is:
>> > --------------------------globus-condor.log-----------------------------
>> > <c>
>> >    <a n="MyType"><s>SubmitEvent</s></a>
>> >    <a n="EventTypeNumber"><i>0</i></a>
>> >    <a n="EventTime"><s>2007-04-18T10:47:58</s></a>
>> >    <a n="Cluster"><i>126</i></a>
>> >    <a n="Proc"><i>0</i></a>
>> >    <a n="Subproc"><i>0</i></a>
>> >    <a n="SubmitHost"><s>&lt;192.168.1.254:47104&gt;</s></a>
>> > </c>
>> > <c>
>> >    <a n="MyType"><s>SubmitEvent</s></a>
>> >    <a n="EventTypeNumber"><i>0</i></a>
>> >    <a n="EventTime"><s>2007-04-18T10:47:58</s></a>
>> >    <a n="Cluster"><i>126</i></a>
>> >    <a n="Proc"><i>0</i></a>
>> >    <a n="Subproc"><i>0</i></a>
>> >    <a n="SubmitHost"><s>&lt;192.168.1.254:47104&gt;</s></a>
>> > </c>
>> > <c>
>> >    <a n="MyType"><s>ShadowExceptionEvent</s></a>
>> >    <a n="EventTypeNumber"><i>7</i></a>
>> >    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>> >    <a n="Cluster"><i>126</i></a>
>> >    <a n="Proc"><i>0</i></a>
>> >    <a n="Subproc"><i>0</i></a>
>> >    <a n="Message"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>> >    <a n="SentBytes"><r>0.000000000000000E+00</r></a>
>> >    <a n="ReceivedBytes"><r>0.000000000000000E+00</r></a>
>> > </c>
>> > <c>
>> >    <a n="MyType"><s>JobHeldEvent</s></a>
>> >    <a n="EventTypeNumber"><i>12</i></a>
>> >    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>> >    <a n="Cluster"><i>126</i></a>
>> >    <a n="Proc"><i>0</i></a>
>> >    <a n="Subproc"><i>0</i></a>
>> >    <a n="HoldReason"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>> >    <a n="HoldReasonCode"><i>7</i></a>
>> >    <a n="HoldReasonSubCode"><i>7</i></a>
>> > </c>
>> > <c>
>> >    <a n="MyType"><s>ShadowExceptionEvent</s></a>
>> >    <a n="EventTypeNumber"><i>7</i></a>
>> >    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>> >    <a n="Cluster"><i>126</i></a>
>> >    <a n="Proc"><i>0</i></a>
>> >    <a n="Subproc"><i>0</i></a>
>> >    <a n="Message"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>> >    <a n="SentBytes"><r>0.000000000000000E+00</r></a>
>> >    <a n="ReceivedBytes"><r>0.000000000000000E+00</r></a>
>> > </c>
>> > <c>
>> >    <a n="MyType"><s>JobHeldEvent</s></a>
>> >    <a n="EventTypeNumber"><i>12</i></a>
>> >    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>> >    <a n="Cluster"><i>126</i></a>
>> >    <a n="Proc"><i>0</i></a>
>> >    <a n="Subproc"><i>0</i></a>
>> >    <a n="HoldReason"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>> >    <a n="HoldReasonCode"><i>7</i></a>
>> >    <a n="HoldReasonSubCode"><i>7</i></a>
>> > </c>
>> > -------------------------------------------------------------------
>> >
>> > Can you please help me?
>>
>>
>
>




--

Dr Andrew Walker

Department of Earth Sciences
University of Cambridge
Downing Street
Cambridge 
CB2 3EQ
UK

phone +44 (0)1223 333432