
Re: [Condor-users] [gt-user] Wanna Help



Hi all,
Yes, the grid user can submit Condor jobs directly with condor_submit,
and can also submit jobs whose resource manager is jobmanager-fork. I
started condor_master manually as root on all my machines, but I don't
know why "ps auwx" shows the condor user for my Condor processes even
though they were started by root.

Can you please give me a sample topology (for submitting a job from a
Globus client to a Condor pool via the Globus middleware) that would
help me find my problem?

bash-3.1# ps auwx  |grep condor
condor    1983  0.0  0.2   6876  2476 ?        Ss   Apr17   0:25 /usr/local/condor/sbin/condor_master
condor    1984  0.0  0.3   8212  3936 ?        Ss   Apr17   1:13 condor_startd -f
condor    1985  0.0  0.3   8304  3544 ?        Ss   Apr17   0:00 condor_schedd -f
root      4964  0.0  0.0   3856   520 pts/0    R+   01:05   0:00 grep condor
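On the `ps` question: the USER column of `ps auwx` shows each process's effective user. Condor daemons started by root typically keep root as their real UID and switch their effective UID to the condor account whenever they do not need root, so `condor` in that column does not by itself prove that root privileges were lost. The following is a minimal, Linux-only Python sketch, assuming `/proc/<pid>/status` is readable, that prints both IDs for every `condor_*` process. With procps, `ps -eo ruser,euser,pid,args` shows the same two columns.

```python
import os
import pwd


def parse_uids(status_text):
    """Return (real_uid, effective_uid) from the text of /proc/<pid>/status.

    The 'Uid:' line holds four numbers: real, effective, saved set, filesystem.
    """
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            real, effective = line.split()[1:3]
            return int(real), int(effective)
    raise ValueError("no 'Uid:' line in status text")


def user_name(uid):
    """Map a UID to a login name, falling back to the number itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def condor_daemons(proc_root="/proc"):
    """Yield (pid, name, real_uid, effective_uid) for processes named condor_*."""
    try:
        entries = os.listdir(proc_root)
    except FileNotFoundError:
        return  # not Linux, or /proc is not mounted
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                text = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        name = ""
        for line in text.splitlines():
            if line.startswith("Name:"):
                name = line.split(":", 1)[1].strip()
                break
        if name.startswith("condor_"):
            real, effective = parse_uids(text)
            yield int(entry), name, real, effective


if __name__ == "__main__":
    for pid, name, real, effective in condor_daemons():
        print(f"{pid:>6} {name:<15} real={user_name(real):<8} "
              f"effective={user_name(effective)}")
```

If the real user is root and the effective user is condor, the daemons were started the way Andrew asks about below.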

On 4/19/07, Martin Feller <feller@xxxxxxxxxxx> wrote:
Andrew,
I agree, this is no longer a Globus issue. I hope this list is
more appropriate for Mehdi's problem. I don't know him at all
and don't work with him; his question just popped up on the
gt-user list, so it's probably best if he answers your questions.
Martin

> Martin,
>
> To me, this looks like a Condor problem rather than a Globus problem.
> Can your grid user submit Condor jobs directly using condor_submit? A
> complete stab in the dark, but you did start the condor_master
> process as root, didn't you?
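One way to test Andrew's first question independently of Globus is to submit the same `/bin/ls` job with condor_submit as the grid user. The following is a minimal Python sketch that prints an equivalent vanilla-universe submit description built from the values in test2.rsl (quoted further down). The helper name `submit_description` and the log file name `test2.log` are hypothetical, and this is not necessarily what jobmanager-condor itself generates.

```python
def submit_description(executable, arguments, initialdir, output, error, log):
    """Return the text of a vanilla-universe Condor submit description file.

    Hypothetical helper: the values passed below are copied from test2.rsl;
    the 'log' file name is an assumption (the RSL script does not set one).
    """
    return "\n".join([
        "universe   = vanilla",
        f"executable = {executable}",
        f"arguments  = {' '.join(arguments)}",
        f"initialdir = {initialdir}",
        f"output     = {output}",
        f"error      = {error}",
        f"log        = {log}",
        "queue",
    ]) + "\n"


if __name__ == "__main__":
    # Print the submit file; redirect it to test2.sub and run
    # `condor_submit test2.sub` as the grid user.
    print(submit_description(
        executable="/bin/ls",
        arguments=["-R", "/tmp"],
        initialdir="/home/grid/globusTest/GRAM/Test2",
        output="lsoutput",
        error="lserr",
        log="test2.log",
    ), end="")
```

If this job runs but the GRAM-submitted one is held, the difference lies in how the job manager sets up stdout/stderr rather than in the pool itself.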
>
> Cheers,
>
> Andrew
>
>
>
> On 19 Apr 2007, at 16:35, Martin Feller wrote:
>
>> Mehdi,
>> it seems the problem is what shows up in the SchedLog: the
>> ownership of some files apparently has to be changed from user
>> grid to condor, and that fails. Sorry, this exceeds my Condor
>> knowledge. Good luck!
>> Martin
>>
>>> Dear Martin,
>>> Sorry, Martin: yes, the local user account that my DN is mapped
>>> to in the grid-mapfile exists on the Condor compute nodes.
>>> The following are the Condor logs.
>>> --------------------------------------------------------------
>>> /home/condor/hosts/Server/log/SchedLog:4/19 16:11:28 (pid:3046) (154.135) Failed to chown /home/condor/hosts/Server/spool/cluster154.proc135.subproc0 from 504 to 506.507.  User may run into permissions problems when fetching sandbox.
>>> -------------------------------------------------------------------------------------------
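Reading the numbers in that SchedLog line against the /etc/passwd entries quoted in this message, 504 is grid and 506.507 appears to be condor's UID.GID, so the schedd was trying to hand the spool directory from grid to condor. On Linux only a privileged process (root, or one holding CAP_CHOWN) may change a file's owner, which is consistent with Andrew's question about root. The following is a small Python sketch that decodes such a line. The regular expression encodes an assumed message shape ("from <uid> to <uid>.<gid>"), not a documented format.

```python
import re

# /etc/passwd entries copied from this thread
PASSWD = """\
grid:x:504:504::/home/grid:/bin/bash
gwadmin:x:505:506::/home/gwadmin:/bin/bash
condor:x:506:507::/home/condor:/bin/bash
"""

# Assumed message shape: "Failed to chown <path> from <uid> to <uid>.<gid>"
CHOWN_RE = re.compile(r"Failed to chown (\S+) from (\d+) to (\d+)\.(\d+)")


def users_by_uid(passwd_text):
    """Map numeric UID -> login name from /etc/passwd-format text."""
    users = {}
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        users[int(fields[2])] = fields[0]
    return users


def explain_chown(log_line, passwd_text):
    """Decode a SchedLog chown failure into user names, or return None."""
    m = CHOWN_RE.search(log_line)
    if m is None:
        return None
    users = users_by_uid(passwd_text)
    old_uid, new_uid, new_gid = (int(g) for g in m.group(2, 3, 4))
    return {
        "path": m.group(1),
        "from_user": users.get(old_uid, str(old_uid)),
        "to_user": users.get(new_uid, str(new_uid)),
        "to_gid": new_gid,
    }


if __name__ == "__main__":
    line = ("(154.135) Failed to chown "
            "/home/condor/hosts/Server/spool/cluster154.proc135.subproc0 "
            "from 504 to 506.507.  User may run into permissions problems")
    print(explain_chown(line, PASSWD))
```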
>>> /home/condor/hosts/localhost002/log/StarterLog:4/19 23:00:40 Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/30671.1176986459/stdout011' as standard output: Permission denied (errno 13)
>>> /home/condor/hosts/localhost002/log/StarterLog:4/19 23:00:40 Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/30671.1176986459/stderr011' as standard error: Permission denied (errno 13)
>>> The following are the /etc/passwd entries on the Server and the Condor compute nodes.
>>> grid:x:504:504::/home/grid:/bin/bash
>>> gwadmin:x:505:506::/home/gwadmin:/bin/bash
>>> condor:x:506:507::/home/condor:/bin/bash
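The StarterLog lines report errno 13 (EACCES). Creating a file such as stdout011 requires write and search permission on the directory that will contain it, plus search permission on every directory above it. The following is a minimal Python sketch of the classic owner/group/other test (it ignores root, ACLs and SELinux). It shows how the answer depends on which UID the starter runs as. The directory mode 0700 in the example is hypothetical, not a value reported in this thread.

```python
import os
import stat
import sys


def can_create_in(mode, owner_uid, owner_gid, uid, gids):
    """Classic Unix check: may `uid` (member of `gids`) create files in a directory?

    Needs write + search (execute) on the directory. Exactly one permission
    class applies: owner, else group, else other. Ignores root, ACLs, SELinux.
    """
    if uid == owner_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif owner_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (mode & need) == need


if __name__ == "__main__":
    # Hypothetical mode 0700 on a directory owned by grid (504:504)
    mode = stat.S_IFDIR | 0o700
    print("grid:  ", can_create_in(mode, 504, 504, 504, {504}))
    print("condor:", can_create_in(mode, 504, 504, 506, {507}))
    # Check a real directory for the current process
    path = sys.argv[1] if len(sys.argv) > 1 else "."
    st = os.stat(path)
    groups = set(os.getgroups()) | {os.getgid()}
    print(path, oct(stat.S_IMODE(st.st_mode)),
          can_create_in(st.st_mode, st.st_uid, st.st_gid, os.getuid(), groups))
```

Running the last part as each candidate user against /home/grid/.globus/job/... would show whether that account can create the stdout file there.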
>>> On 4/19/07, Martin Feller <feller@xxxxxxxxxxx> wrote:
>>>> Mehdi,
>>>> Do the Condor logs provide more information about that?
>>>> Is the local user account that your DN is mapped to in the
>>>> grid-mapfile available on the Condor compute nodes?
>>>> Martin
>>>>
>>>> > Hi Martin,
>>>> > Yes, it works with the fork jobmanager.
>>>> >
>>>> > On 4/19/07, Martin Feller <feller@xxxxxxxxxxx> wrote:
>>>> >> Does it work with fork?
>>>> >> Martin
>>>> >>
>>>> >> > Hi,
>>>> >> > I want to submit a job to a Condor pool via Globus GRAM. I
>>>> >> > defined the following RSL script and submit my job with
>>>> >> > "globusrun -f test2.rsl" from the Server itself as a client. My
>>>> >> > job goes to the Held state. My RSL script file (test2.rsl) is:
>>>> >> >
>>>> >> > ------------------------test2.rsl-----------------------------------
>>>> >> > +
>>>> >> > (
>>>> >> > &(resourceManagerContact="Server.eng4.shirazu.ac.ir/jobmanager-condor")
>>>> >> >   (count=1)
>>>> >> >   (label="subjob 0")
>>>> >> >   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
>>>> >> >                (LD_LIBRARY_PATH /usr/local/globus-4.0.3/lib/))
>>>> >> >   (directory="/home/grid/globusTest/GRAM/Test2")
>>>> >> >   (executable="/bin/ls")
>>>> >> >   (arguments  = "-R" "/tmp")
>>>> >> >   (stdout="lsoutput")
>>>> >> >   (stderr="lserr")
>>>> >> > )
>>>> >> >
>>>> >> > -----------------------------------------------------------------------
>>>> >> >
>>>> >> > The output of the globus-condor.log file is:
>>>> >> > --------------------------globus-condor.log-----------------------------
>>>> >> > <c>
>>>> >> >    <a n="MyType"><s>SubmitEvent</s></a>
>>>> >> >    <a n="EventTypeNumber"><i>0</i></a>
>>>> >> >    <a n="EventTime"><s>2007-04-18T10:47:58</s></a>
>>>> >> >    <a n="Cluster"><i>126</i></a>
>>>> >> >    <a n="Proc"><i>0</i></a>
>>>> >> >    <a n="Subproc"><i>0</i></a>
>>>> >> >    <a n="SubmitHost"><s>&lt;192.168.1.254:47104&gt;</s></a>
>>>> >> > </c>
>>>> >> > <c>
>>>> >> >    <a n="MyType"><s>SubmitEvent</s></a>
>>>> >> >    <a n="EventTypeNumber"><i>0</i></a>
>>>> >> >    <a n="EventTime"><s>2007-04-18T10:47:58</s></a>
>>>> >> >    <a n="Cluster"><i>126</i></a>
>>>> >> >    <a n="Proc"><i>0</i></a>
>>>> >> >    <a n="Subproc"><i>0</i></a>
>>>> >> >    <a n="SubmitHost"><s>&lt;192.168.1.254:47104&gt;</s></a>
>>>> >> > </c>
>>>> >> > <c>
>>>> >> >    <a n="MyType"><s>ShadowExceptionEvent</s></a>
>>>> >> >    <a n="EventTypeNumber"><i>7</i></a>
>>>> >> >    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>>>> >> >    <a n="Cluster"><i>126</i></a>
>>>> >> >    <a n="Proc"><i>0</i></a>
>>>> >> >    <a n="Subproc"><i>0</i></a>
>>>> >> >    <a n="Message"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>>>> >> >    <a n="SentBytes"><r>0.000000000000000E+00</r></a>
>>>> >> >    <a n="ReceivedBytes"><r>0.000000000000000E+00</r></a>
>>>> >> > </c>
>>>> >> > <c>
>>>> >> >    <a n="MyType"><s>JobHeldEvent</s></a>
>>>> >> >    <a n="EventTypeNumber"><i>12</i></a>
>>>> >> >    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>>>> >> >    <a n="Cluster"><i>126</i></a>
>>>> >> >    <a n="Proc"><i>0</i></a>
>>>> >> >    <a n="Subproc"><i>0</i></a>
>>>> >> >    <a n="HoldReason"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>>>> >> >    <a n="HoldReasonCode"><i>7</i></a>
>>>> >> >    <a n="HoldReasonSubCode"><i>7</i></a>
>>>> >> > </c>
>>>> >> > <c>
>>>> >> >    <a n="MyType"><s>ShadowExceptionEvent</s></a>
>>>> >> >    <a n="EventTypeNumber"><i>7</i></a>
>>>> >> >    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>>>> >> >    <a n="Cluster"><i>126</i></a>
>>>> >> >    <a n="Proc"><i>0</i></a>
>>>> >> >    <a n="Subproc"><i>0</i></a>
>>>> >> >    <a n="Message"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>>>> >> >    <a n="SentBytes"><r>0.000000000000000E+00</r></a>
>>>> >> >    <a n="ReceivedBytes"><r>0.000000000000000E+00</r></a>
>>>> >> > </c>
>>>> >> > <c>
>>>> >> >    <a n="MyType"><s>JobHeldEvent</s></a>
>>>> >> >    <a n="EventTypeNumber"><i>12</i></a>
>>>> >> >    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>>>> >> >    <a n="Cluster"><i>126</i></a>
>>>> >> >    <a n="Proc"><i>0</i></a>
>>>> >> >    <a n="Subproc"><i>0</i></a>
>>>> >> >    <a n="HoldReason"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>>>> >> >    <a n="HoldReasonCode"><i>7</i></a>
>>>> >> >    <a n="HoldReasonSubCode"><i>7</i></a>
>>>> >> > </c>
>>>> >> > -------------------------------------------------------------------
>>>> >> >
>>>> >> > Can you please help me?
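The globus-condor.log excerpt is a Condor XML user log: a sequence of `<c>` records whose `<a n="...">` attributes carry typed values (`<s>` string, `<i>` integer, `<r>` real). The following is a minimal Python sketch using the standard-library ElementTree that pulls the hold reasons out of such a file. It assumes the file holds only bare `<c>` records with no XML declaration, as in the excerpt above, and the function names are illustrative.

```python
import sys
import xml.etree.ElementTree as ET


def parse_xml_user_log(text):
    """Parse Condor XML user-log records into a list of dicts.

    Assumes the text is a bare sequence of <c> records (no XML declaration),
    so it is wrapped in a synthetic root element before parsing.
    """
    root = ET.fromstring("<log>" + text + "</log>")
    events = []
    for record in root.findall("c"):
        event = {}
        for attr in record.findall("a"):
            if len(attr) == 0:
                continue  # attribute without a typed value
            value = attr[0]  # typed child such as <s>, <i> or <r>
            if value.tag == "i":
                event[attr.get("n")] = int(value.text)
            elif value.tag == "r":
                event[attr.get("n")] = float(value.text)
            else:
                event[attr.get("n")] = value.text
        events.append(event)
    return events


def hold_reasons(events):
    """Return (cluster, proc, reason) for every JobHeldEvent."""
    return [(e.get("Cluster"), e.get("Proc"), e.get("HoldReason"))
            for e in events if e.get("MyType") == "JobHeldEvent"]


if __name__ == "__main__":
    if len(sys.argv) > 1:  # e.g. the path to globus-condor.log
        with open(sys.argv[1]) as f:
            for cluster, proc, reason in hold_reasons(parse_xml_user_log(f.read())):
                print(f"{cluster}.{proc}: {reason}")
```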
>>>> >>
>>>> >>
>>>> >
>>>> >
>>>>
>>>>
>>
>
> --
>
> Dr Andrew Walker
>
> Department of Earth Sciences
> University of Cambridge
> Downing Street
> Cambridge
> CB2 3EQ
> UK
>
> phone +44 (0)1223 333432
>
>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at either
> https://lists.cs.wisc.edu/archive/condor-users/
> http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR




--
Best Regards,
S.Mehdi Sheikhalishahi,
Web: http://www.cse.shirazu.ac.ir/~alishahi/
Bye.