
[Condor-users] Required ordinary user permissions?



Hello,

I have a Condor 6.7.6 master/submit server, and an 'ordinary user'
account on that server (i.e. the account has no special role in Condor).
The server is running Fedora Core 3 Linux. Logged in as that user, I
can set the CONDOR_CONFIG environment variable and run various
Condor commands without any problem (condor_q, condor_status, etc.).

However, if I create and submit a simple Condor job (called 'hello'),
the job is accepted but never runs to completion. It enters the Condor
queue and starts running, then stops, then runs and stops again, and so
on. The hello.log, hello.out and hello.err files are created in the
user's home directory. The 'out' and 'err' files are empty. The
hello.log file contains:

   001 (157.000.000) 04/07 17:14:58 Job executing on host: <141.163.60.56:32770>
   ...
   007 (157.000.000) 04/07 17:14:58 Shadow exception!
           Unable to talk to job: disconnected

           48  -  Run Bytes Sent By Job
           174  -  Run Bytes Received By Job

The server's ShadowLog shows:

===================================================================
4/7 17:14:46 (?.?) (9878):******* Standard Shadow starting up *******
4/7 17:14:46 (?.?) (9878):** $CondorVersion: 6.7.6 Mar 15 2005 $
4/7 17:14:46 (?.?) (9878):** $CondorPlatform: I386-LINUX_RH9 $
4/7 17:14:46 (?.?) (9878):*******************************************
4/7 17:14:46 (?.?) (9878):uid=0, euid=1985, gid=0, egid=1985
4/7 17:14:46 (?.?) (9878):Hostname = "<141.163.60.56:32770>", Job = 157.0
4/7 17:14:46 (157.0) (9878):Requesting Primary Starter
4/7 17:14:46 (157.0) (9878):Shadow: Request to run a job was ACCEPTED
4/7 17:14:46 (157.0) (9878):Shadow: RSC_SOCK connected, fd = 17
4/7 17:14:46 (157.0) (9878):Shadow: CLIENT_LOG connected, fd = 18
4/7 17:14:46 (157.0) (9878):My_Filesystem_Domain = "ltsp.csd.plymouth.ac.uk"
4/7 17:14:46 (157.0) (9878):My_UID_Domain = "ltsp.csd.plymouth.ac.uk"
4/7 17:14:58 (157.0) (9878):ERROR "Unable to talk to job: disconnected
" at line 134 in file receivers.C
4/7 17:14:58 (157.0) (9878):Shadow: DoCleanup: unlinking TmpCkpt '/opt/condor/hosts/ltsp/spool/cluster157.proc0.subproc0.tmp'
4/7 17:14:58 (157.0) (9878):Trying to unlink /opt/condor/hosts/ltsp/spool/cluster157.proc0.subproc0.tmp
===================================================================

So my question is: what is happening? My guess is that 'unable to talk
to job' means the server itself is having some sort of permission
trouble with the user account, but I am not sure. The logs on the
execute client (there is only one) show nothing unusual.


Any thoughts/suggestions?


Regards,

John.

-- 
---------------------------------------------------------------
John Horne, University of Plymouth, UK  Tel: +44 (0)1752 233914
E-mail: John.Horne@xxxxxxxxxxxxxx       Fax: +44 (0)1752 233839