
Re: [Condor-users] [gt-user] Wanna Help



Hi all,
Yes, the grid user can submit Condor jobs directly using condor_submit.
The grid user can also submit jobs whose resource manager is
jobmanager-fork. I started condor_master manually as root on all my
machines, but I don't know why "ps auwx" shows the condor processes as
owned by the condor user when they were started by root.

bash-3.1# ps auwx | grep condor
condor    1983  0.0  0.2   6876  2476 ?        Ss   Apr17   0:25 /usr/local/condor/sbin/condor_master
condor    1984  0.0  0.3   8212  3936 ?        Ss   Apr17   1:13 condor_startd -f
condor    1985  0.0  0.3   8304  3544 ?        Ss   Apr17   0:00 condor_schedd -f
root      4964  0.0  0.0   3856   520 pts/0    R+   01:05   0:00 grep condor
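
The user column in ps is the effective user of each process. As far as I understand, Condor daemons started as root switch their effective UID to the condor account for most of their work, so the listing above would not by itself indicate a problem. A minimal sketch for comparing real and effective UIDs on Linux, assuming the usual "Uid:" line in /proc/<pid>/status; the helper name and the sample text are hypothetical and not taken from these machines:

```python
def parse_uids(status_text):
    """Return (real, effective, saved, filesystem) UIDs from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            fields = line.split()[1:5]
            return tuple(int(f) for f in fields)
    raise ValueError("no Uid: line found")


# Hypothetical sample, NOT captured from the machines in this thread:
# real UID 0 (root), effective UID 506 (condor in the /etc/passwd shown later).
SAMPLE_STATUS = "Name:\tcondor_master\nUid:\t0\t506\t0\t506\nGid:\t0\t507\t0\t507\n"

real, effective, saved, fs = parse_uids(SAMPLE_STATUS)
# A real or saved UID of 0 suggests the process was started by root.
started_as_root = (real == 0 or saved == 0)
print(real, effective, started_as_root)
```

On a live system the text would come from reading /proc/1983/status (the condor_master PID in the listing above).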


On 4/19/07, Andrew Walker <amw75@xxxxxxxxx> wrote:
Martin,

To me, this looks like a condor problem rather than a globus problem. Can
your grid user submit condor jobs directly using condor_submit? A complete
stab in the dark, but you did start up the condor_master process as root,
didn't you?

Cheers,

Andrew




On 19 Apr 2007, at 16:35, Martin Feller wrote:

Mehdi,
It seems that the problem is what can be seen in the SchedLog:
ownership of some files should be changed from user grid to user
condor, and that fails. Sorry, this exceeds my Condor knowledge.
Good luck!
Martin


Dear Martin,
Yes, the local user account to which my DN is mapped in the
grid-mapfile is available on the Condor compute nodes.
The Condor logs follow.
--------------------------------------------------------------
/home/condor/hosts/Server/log/SchedLog:4/19 16:11:28 (pid:3046) (154.135) Failed to chown /home/condor/hosts/Server/spool/cluster154.proc135.subproc0 from 504 to 506.507.  User may run into permissions problems when fetching sandbox.
--------------------------------------------------------------
/home/condor/hosts/localhost002/log/StarterLog:4/19 23:00:40 Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/30671.1176986459/stdout011' as standard output: Permission denied (errno 13)
/home/condor/hosts/localhost002/log/StarterLog:4/19 23:00:40 Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/30671.1176986459/stderr011' as standard error: Permission denied (errno 13)
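
"Permission denied (errno 13)" on open can come from the file itself or from any directory above it that the opening account cannot search. The sketch below walks the parent directories of a path and reports the first one that blocks a given UID. It is only a rough check under stated assumptions: it looks at the classic owner/group/other mode bits only (no ACLs, no NFS root squashing), and the function names are hypothetical. Which account the starter actually uses to open the file is not shown in the logs, so the UID passed in is a guess to be tried out.

```python
import os


def has_permission(mode, owner_uid, owner_gid, uid, gids, want):
    """Classic Unix mode-bit check. want is a mask: 4 read, 2 write, 1 execute/search."""
    if uid == 0:
        return True  # simplification: root is assumed to bypass mode bits
    if uid == owner_uid:
        bits = (mode >> 6) & 7
    elif owner_gid in gids:
        bits = (mode >> 3) & 7
    else:
        bits = mode & 7
    return (bits & want) == want


def first_blocked_dir(path, uid, gids):
    """Return the first ancestor directory of path that uid cannot search, or None."""
    parent = os.path.dirname(os.path.abspath(path))
    parts = parent.strip(os.sep).split(os.sep) if parent != os.sep else []
    current = os.sep
    dirs = [current]
    for part in parts:
        current = os.path.join(current, part)
        dirs.append(current)
    for d in dirs:
        st = os.stat(d)
        if not has_permission(st.st_mode, st.st_uid, st.st_gid, uid, gids, 1):
            return d
    return None


# A directory with mode 0700 owned by 504:504 is not searchable by 506:507.
print(has_permission(0o700, 504, 504, 506, {507}, 1))
```

Run on a compute node, a call such as first_blocked_dir("/home/grid/.globus/job/server.eng4.shirazu.ac.ir/30671.1176986459/stdout011", 506, {507}) would name the directory to look at, if any.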
The following are the /etc/passwd entries on Server and on the Condor compute nodes.
grid:x:504:504::/home/grid:/bin/bash
gwadmin:x:505:506::/home/gwadmin:/bin/bash
condor:x:506:507::/home/condor:/bin/bash
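
Reading the SchedLog message against these entries: "from 504 to 506.507" appears to mean a change of owner from UID 504 to UID 506 with GID 507. A small sketch that resolves those numbers using the three passwd lines above; the function name is hypothetical, and the interpretation of "506.507" as uid.gid is an assumption based on the entries matching.

```python
PASSWD_LINES = """grid:x:504:504::/home/grid:/bin/bash
gwadmin:x:505:506::/home/gwadmin:/bin/bash
condor:x:506:507::/home/condor:/bin/bash"""


def uid_table(passwd_text):
    """Map UID -> (login name, primary GID) from passwd-format text."""
    table = {}
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 4:
            table[int(fields[2])] = (fields[0], int(fields[3]))
    return table


users = uid_table(PASSWD_LINES)
old_owner = users[504][0]
new_owner, new_gid = users[506]
print(f"chown from {old_owner} to {new_owner} (gid {new_gid})")
# prints: chown from grid to condor (gid 507)
```

So the schedd is trying to hand the spool directory from grid to condor, which matches Martin's reading of the log.
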
On 4/19/07, Martin Feller <feller@xxxxxxxxxxx> wrote:
Mehdi,
Do the Condor-logs provide more information about that?
Is the local user-account, to which your DN in the grid-mapfile
gets mapped, available on the condor compute nodes?
Martin

> Hi Martin,
> Yes, it works with the fork jobmanager.
>
> On 4/19/07, Martin Feller <feller@xxxxxxxxxxx> wrote:
>> Does it work with fork?
>> Martin
>>
>> > Hi,
>> > I want to submit a job to a Condor pool via Globus GRAM. I defined the
>> > following RSL script and submit the job with "globusrun -f test2.rsl"
>> > from Server itself, acting as the client. The job goes into the Held
>> > state. My RSL script file (test2.rsl) is:
>> >
>> > ------------------------test2.rsl-----------------------------------
>> > +
>> > (
>> > &(resourceManagerContact="Server.eng4.shirazu.ac.ir/jobmanager-condor")
>> >   (count=1)
>> >   (label="subjob 0")
>> >   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
>> >                (LD_LIBRARY_PATH /usr/local/globus-4.0.3/lib/))
>> >   (directory="/home/grid/globusTest/GRAM/Test2")
>> >   (executable="/bin/ls")
>> >   (arguments  = "-R" "/tmp")
>> >   (stdout="lsoutput")
>> >   (stderr="lserr")
>> > )
>> > -----------------------------------------------------------------------
>> >
>> > The contents of the globus-condor.log file are:
>> > --------------------------globus-condor.log-----------------------------
>> > <c>
>> >    <a n="MyType"><s>SubmitEvent</s></a>
>> >    <a n="EventTypeNumber"><i>0</i></a>
>> >    <a n="EventTime"><s>2007-04-18T10:47:58</s></a>
>> >    <a n="Cluster"><i>126</i></a>
>> >    <a n="Proc"><i>0</i></a>
>> >    <a n="Subproc"><i>0</i></a>
>> >    <a n="SubmitHost"><s>&lt;192.168.1.254:47104&gt;</s></a>
>> > </c>
>> > <c>
>> >    <a n="MyType"><s>SubmitEvent</s></a>
>> >    <a n="EventTypeNumber"><i>0</i></a>
>> >    <a n="EventTime"><s>2007-04-18T10:47:58</s></a>
>> >    <a n="Cluster"><i>126</i></a>
>> >    <a n="Proc"><i>0</i></a>
>> >    <a n="Subproc"><i>0</i></a>
>> >    <a n="SubmitHost"><s>&lt;192.168.1.254:47104&gt;</s></a>
>> > </c>
>> > <c>
>> >    <a n="MyType"><s>ShadowExceptionEvent</s></a>
>> >    <a n="EventTypeNumber"><i>7</i></a>
>> >    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>> >    <a n="Cluster"><i>126</i></a>
>> >    <a n="Proc"><i>0</i></a>
>> >    <a n="Subproc"><i>0</i></a>
>> >    <a n="Message"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>> >    <a n="SentBytes"><r>0.000000000000000E+00</r></a>
>> >    <a n="ReceivedBytes"><r>0.000000000000000E+00</r></a>
>> > </c>
>> > <c>
>> >    <a n="MyType"><s>JobHeldEvent</s></a>
>> >    <a n="EventTypeNumber"><i>12</i></a>
>> >    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>> >    <a n="Cluster"><i>126</i></a>
>> >    <a n="Proc"><i>0</i></a>
>> >    <a n="Subproc"><i>0</i></a>
>> >    <a n="HoldReason"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>> >    <a n="HoldReasonCode"><i>7</i></a>
>> >    <a n="HoldReasonSubCode"><i>7</i></a>
>> > </c>
>> > <c>
>> >    <a n="MyType"><s>ShadowExceptionEvent</s></a>
>> >    <a n="EventTypeNumber"><i>7</i></a>
>> >    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>> >    <a n="Cluster"><i>126</i></a>
>> >    <a n="Proc"><i>0</i></a>
>> >    <a n="Subproc"><i>0</i></a>
>> >    <a n="Message"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>> >    <a n="SentBytes"><r>0.000000000000000E+00</r></a>
>> >    <a n="ReceivedBytes"><r>0.000000000000000E+00</r></a>
>> > </c>
>> > <c>
>> >    <a n="MyType"><s>JobHeldEvent</s></a>
>> >    <a n="EventTypeNumber"><i>12</i></a>
>> >    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>> >    <a n="Cluster"><i>126</i></a>
>> >    <a n="Proc"><i>0</i></a>
>> >    <a n="Subproc"><i>0</i></a>
>> >    <a n="HoldReason"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>> >    <a n="HoldReasonCode"><i>7</i></a>
>> >    <a n="HoldReasonSubCode"><i>7</i></a>
>> > </c>
>> >
>> > -------------------------------------------------------------------
>> >
>> > Can you please help me?
>>
>>
>
>





--


Dr Andrew Walker

Department of Earth Sciences
University of Cambridge
Downing Street
Cambridge
CB2 3EQ
UK

phone +44 (0)1223 333432





_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to
condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR



--
Best Regards,
S.Mehdi Sheikhalishahi,
Web: http://www.cse.shirazu.ac.ir/~alishahi/
Bye.