
Re: [HTCondor-users] condor 8.6.5



Thanks for the suggestion.  Yes, after checking (and updating) the version this morning, here is what is installed:
[nmoore@pilgrim condor_sub]$ yum list installed | grep condor
condor.x86_64                          8.6.5-2.el7                 @htcondor-stable
condor-all.x86_64                      8.6.5-2.el7                 @htcondor-stable
condor-bosco.x86_64                    8.6.5-2.el7                 @htcondor-stable
condor-classads.x86_64                 8.6.5-2.el7                 @htcondor-stable
condor-cream-gahp.x86_64               8.6.5-2.el7                 @htcondor-stable
condor-external-libs.x86_64            8.6.5-2.el7                 @htcondor-stable
condor-externals.x86_64                8.6.5-2.el7                 @htcondor-stable
condor-kbdd.x86_64                     8.6.5-2.el7                 @htcondor-stable
condor-procd.x86_64                    8.6.5-2.el7                 @htcondor-stable
condor-python.x86_64                   8.6.5-2.el7                 @htcondor-stable
condor-std-universe.x86_64             8.6.5-2.el7                 @htcondor-stable
condor-vm-gahp.x86_64                  8.6.5-2.el7                 @htcondor-stable

The problem seems to persist:
[nmoore@pilgrim ~]$ condor_q -hold


-- Schedd: pilgrim : <199.17.158.20:9618?... @ 08/07/17 16:53:54
 ID      OWNER          HELD_SINCE  HOLD_REASON
  11.0   nmoore          8/7  12:04 Error from slot1@pilgrim: Failed to open '/home/nmoore/condor_sub/job-3.out' as standard output: Permission denied (errno 13)
  12.0   nmoore          8/7  12:04 Error from slot2@pilgrim: Failed to open '/home/nmoore/condor_sub/job-2.out' as standard output: Permission denied (errno 13)
  13.0   nmoore          8/7  12:04 Error from slot2@pilgrim: Failed to open '/home/nmoore/condor_sub/job.out' as standard output: Permission denied (errno 13)

3 jobs; 0 completed, 0 removed, 0 idle, 0 running, 3 held, 0 suspended


Random thought - Does the condor user on each machine need to have the same uid?  Is the difference in the /etc/passwd entries below the cause of this error?  I manage shared logins with NIS (yes, ancient, I know, but so easy for something behind a firewall...).  Do I need to make a NIS/YP entry with no shell for the condor user?

[nmoore@pilgrim ~]$ cat /etc/passwd | grep condor
condor:x:988:983:Owner of HTCondor Daemons:/var/lib/condor:/sbin/nologin

[nmoore@pilgrim ~]$ ssh 199.17.158.2 cat /etc/passwd  | grep condor
condor:x:990:988:Owner of HTCondor Daemons:/var/lib/condor:/sbin/nologin

[nmoore@pilgrim ~]$ ssh 199.17.158.6 cat /etc/passwd  | grep condor
condor:x:990:988:Owner of HTCondor Daemons:/var/lib/condor:/sbin/nologin
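
To make the comparison above less error-prone, here is a minimal sketch that groups hosts by the condor account's uid/gid. The passwd lines are copied from the output above; the helper names (parse_passwd_line, group_by_ids) are made up for illustration, and whether the mismatch actually causes the hold is still an open question.

```python
# Sketch: compare the condor account's uid/gid across hosts.
# Input lines are the /etc/passwd entries quoted in this message.

def parse_passwd_line(line):
    """Split one /etc/passwd entry into (user, uid, gid)."""
    fields = line.strip().split(":")
    return fields[0], int(fields[2]), int(fields[3])

def group_by_ids(passwd_by_host):
    """Map each (uid, gid) pair to the sorted list of hosts that use it."""
    groups = {}
    for host, line in passwd_by_host.items():
        _, uid, gid = parse_passwd_line(line)
        groups.setdefault((uid, gid), []).append(host)
    return {ids: sorted(hosts) for ids, hosts in groups.items()}

passwd_by_host = {
    "pilgrim": "condor:x:988:983:Owner of HTCondor Daemons:/var/lib/condor:/sbin/nologin",
    "199.17.158.2": "condor:x:990:988:Owner of HTCondor Daemons:/var/lib/condor:/sbin/nologin",
    "199.17.158.6": "condor:x:990:988:Owner of HTCondor Daemons:/var/lib/condor:/sbin/nologin",
}

groups = group_by_ids(passwd_by_host)
for (uid, gid), hosts in sorted(groups.items()):
    print(f"uid={uid} gid={gid}: {', '.join(hosts)}")
if len(groups) > 1:
    print("condor uid/gid differs between hosts")
```

With the three entries above, pilgrim ends up in one group and the other two hosts in another.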







From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Tim Theisen <tim@xxxxxxxxxxx>
Sent: Monday, August 7, 2017 8:55 AM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] condor 8.6.5
 

Are you running on Red Hat 7.4 with SELinux enabled? If so, you should update to HTCondor 8.6.5-2 and try again.


...Tim


On 08/04/2017 05:48 PM, Moore, Nathan T wrote:

AND, if I turn off condor on all nodes except the scheduler, and submit the job on the scheduler only (which also hosts home directories for NFS), the jobs still fail with the same file permissions error.


Again, I think I'm missing something fundamental.  Suggestions appreciated!


[nmoore@greylag condor-sub]$ condor_q


-- Schedd: greylag : <199.17.158.2:9618?... @ 08/04/17 17:42:49
OWNER  BATCH_NAME             SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS
nmoore CMD: estimate_pi.py   8/4  17:42      _      _      _      1      1 29.0
nmoore CMD: estimate_pi.py   8/4  17:42      _      _      _      1      1 30.0

2 jobs; 0 completed, 0 removed, 0 idle, 0 running, 2 held, 0 suspended
[nmoore@greylag condor-sub]$ condor_q -hold


-- Schedd: greylag : <199.17.158.2:9618?... @ 08/04/17 17:42:54
 ID      OWNER          HELD_SINCE  HOLD_REASON
  29.0   nmoore          8/4  17:42 Error from slot1@greylag: Failed to open '/home/nmoore/condor_sub/job.out' as standard output: Permission denied (errno 13)
  30.0   nmoore          8/4  17:42 Error from slot1@greylag: Failed to open '/data-shared/condor-sub/job.out' as standard output: Permission denied (errno 13)

2 jobs; 0 completed, 0 removed, 0 idle, 0 running, 2 held, 0 suspended

 



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Moore, Nathan T <nmoore@xxxxxxxxxx>
Sent: Friday, August 4, 2017 2:47 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] condor 8.6.5
 


/home is an NFS filesystem shared via autofs.


I did the install via yum, so I'm not sure where FILESYSTEM_DOMAIN is set.
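
One possible way to find out, as a sketch only: ask condor_config_val on each machine. This assumes the -verbose option reports the file and line where a variable is defined (worth confirming against the man page for the installed version). The helper names build_command and where_is_set are hypothetical, and UID_DOMAIN is queried as well only because it is the companion setting.

```python
import shutil
import subprocess

def build_command(variable):
    """Return the argv for asking HTCondor where a config variable is defined."""
    # Assumption: -verbose prints the defining file and line number.
    return ["condor_config_val", "-verbose", variable]

def where_is_set(variable):
    """Run the query locally; return its text output, or None if HTCondor is absent."""
    if shutil.which("condor_config_val") is None:
        return None
    result = subprocess.run(
        build_command(variable), capture_output=True, text=True, check=False
    )
    # Fall back to stderr so an "undefined variable" message is not lost.
    return result.stdout.strip() or result.stderr.strip()

for name in ("FILESYSTEM_DOMAIN", "UID_DOMAIN"):
    print(name, "->", where_is_set(name))
```

Running this on the submit machine and on each execute node, then comparing the output by eye, should show whether the values agree.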



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Ben Cotton <ben.cotton@xxxxxxxxxxxxxxxxxx>
Sent: Friday, August 4, 2017 1:17:17 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] condor 8.6.5
 
Hi Nathan,

> File permissions error?
>
Looks like it. Do your machines all have the same FILESYSTEM_DOMAIN
setting? Is /home an NFS share?


--
Ben Cotton
Technical Marketing Manager

Cycle Computing
Better Answers. Faster.

http://www.cyclecomputing.com
twitter: @cyclecomputing
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



-- 
Tim Theisen
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736