
Re: [HTCondor-users] condor_ssh_to_job



On 8/21/2013 2:20 PM, Shrum, Donald C wrote:
Thanks for the replies....

Our Condor cluster is dedicated, I am the sysadmin, and I am perfectly happy for users to run condor_ssh_to_job to see what is going on.
User home directories are not on the processing nodes, but logins are enabled and shells are present.

[dcshrum@condor-login vanilla]$ condor_q
-- Submitter: condor-login.local : <10.178.6.3:43563> : condor-login.local
  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
    2.0   dcshrum         8/21 15:12   0+00:00:01 R  0   0.0  a.out

[dcshrum@condor-login vanilla]$ condor_ssh_to_job 2
This account is currently not available.
Connection to condor-job.condor-29.local closed.


[dcshrum@condor-login vanilla]$ ssh dcshrum@condor-29
Warning: Permanently added 'condor-29,10.178.6.29' (RSA) to the list of known hosts.
dcshrum@condor-29's password:
-sh-4.1$ logout
Connection to condor-29 closed.

There must be a setting I am missing.

See my previous post...
So did you configure HTCondor to run jobs as user nobody on the execute machines? If so, ssh to an execute machine (condor-29) and run
   grep ^nobody /etc/passwd
Does this account exist on your execute machine? If not, that is your problem. If it does exist, what shell (the last field) is specified? If it is something like /bin/nologin, that is your problem: the account needs a shell that is listed in /etc/shells.
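
The check described above can be scripted roughly as follows. This is only a sketch: the function name check_login_shell and its optional file arguments are made up for illustration, and it reads the local passwd file only, so accounts served from LDAP or NIS would need `getent passwd` instead.

```shell
#!/bin/sh
# Sketch: report whether an account's login shell is listed in a shells file.
# Usage: check_login_shell USER [PASSWD_FILE] [SHELLS_FILE]
# (hypothetical helper, not part of HTCondor)
check_login_shell() {
    user=$1
    passwd_file=${2:-/etc/passwd}
    shells_file=${3:-/etc/shells}

    # The login shell is the 7th colon-separated field of the passwd entry.
    shell=$(awk -F: -v u="$user" '$1 == u { print $7; exit }' "$passwd_file")

    if [ -z "$shell" ]; then
        echo "no passwd entry (or empty shell field) for $user"
        return 2
    fi

    # -x: match the whole line, -F: treat the shell path as a fixed string.
    if grep -qxF -- "$shell" "$shells_file"; then
        echo "ok: $shell"
        return 0
    fi

    echo "not in shells list: $shell"
    return 1
}
```

On the execute machine you would call it as `check_login_shell nobody`. Note that some distributions list a nologin shell in /etc/shells, so an "ok" result that names nologin still means the account cannot log in.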

P.S. Instead of user "nobody", you may want to define "slot users" so that jobs running on different slots use different UIDs. See
http://research.cs.wisc.edu/htcondor/manual/v8.0/3_6Security.html#sec:RunAsNobody
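
For illustration only, a slot-user setup for a two-slot execute machine might look roughly like the fragment below. The account names cndrusr1 and cndrusr2 are made up; they would have to exist on the execute machine with a usable login shell. Check the manual section linked above for the exact settings in your version.

```
# Local HTCondor config on the execute machine (sketch, hypothetical account names)
SLOT1_USER = cndrusr1
SLOT2_USER = cndrusr2
# Mark these accounts as dedicated to running jobs
DEDICATED_EXECUTE_ACCOUNT_REGEXP = cndrusr[0-9]+
```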

-Todd