Re: [HTCondor-users] 'permission denied' brought some jobs to H status
- Date: Thu, 4 Jun 2015 16:28:11 -0500
- From: Jaime Frey <jfrey@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] 'permission denied' brought some jobs to H status
On Jun 4, 2015, at 10:41 AM, qing <gang.qin@xxxxxxxxxxxxx> wrote:
> Dear Zach:
> Thanks for the hints. This seems to be a common issue with ARC 5.0.0 and is already being tracked at https://ggus.eu/index.php?mode=ticket_info&ticket_id=113745, where the Condor team indicates that the problem might be in the ARC job wrapper.
> On 04/06/2015 16:29, Zachary Miller wrote:
>>> Hold reason: Error from slot1@node128: STARTER at 10.141.0.128 failed to send file(s) to <10.141.255.19:57731>; SHADOW at 10.141.255.19 failed to write to file /var/spool/arc/grid/dfgMDmjkVKmnbbfC3pqhhxZmABFKDmABFKDmZnFKDmABFKDm7g3Yon/_condor_stderr.aipanda063.cern.ch_15422080.0_1433368150: (errno 13) Permission denied
>> This "Permission denied" is coming from the filesystem on the submit machine.
>> Some things to consider:
>> Is it local disk, or some kind of network mount?
>> Was it close to full? (such that the output files won't fit)
>> Too many files in /var/spool/arc/grid/?
>> Any other reason you can think of why you wouldn't be able to write a file?
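The filesystem checks listed above can be roughly automated. The following is a minimal Python sketch, not an HTCondor or ARC tool: the function name `check_spool_dir` and the report keys are made up for illustration. It assumes you run it on the submit machine, as the same user that needs to write the output files, and it does not detect whether the directory is a network mount.

```python
import os
import shutil
import tempfile


def check_spool_dir(path):
    """Collect basic facts about whether files can be written under `path`.

    Hypothetical helper for illustration only.
    """
    usage = shutil.disk_usage(path)  # total, used, free (in bytes)
    report = {
        # Fraction of the filesystem that is still free (0.0 to 1.0).
        "free_fraction": usage.free / usage.total if usage.total else 0.0,
        # Number of directory entries directly under `path`.
        "entry_count": sum(1 for _ in os.scandir(path)),
        # Whether the current user has write and search permission.
        "access_ok": os.access(path, os.W_OK | os.X_OK),
    }
    # The most direct test: try to create (and automatically remove) a file.
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            report["can_create_file"] = True
    except OSError as exc:
        report["can_create_file"] = False
        report["error"] = f"errno {exc.errno}: {exc.strerror}"
    return report


if __name__ == "__main__":
    import sys

    # Pass the directory to inspect as the first argument.
    print(check_spool_dir(sys.argv[1]))
```

If `can_create_file` comes back false with errno 13 while the disk has free space, the cause is more likely ownership or permissions on the directory than capacity.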
I've been working with Raul on this problem. My current suspicion is that it is caused by using a Condor-G version older than 8.0.5 to submit the jobs to the ARC server. This causes a collision on the filename _condor_stderr between Condor-G and the HTCondor cluster behind the ARC server.
Can you confirm if this is the case for the jobs that are failing for you?
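To check the version question across several submit hosts, a small comparison helper may be handy. This is only a sketch: the function names are hypothetical, and it assumes the version appears as three dot-separated integers somewhere in the text you feed it (for example, a line of version output copied from the submitting Condor-G host).

```python
import re


def parse_version(text):
    """Extract the first X.Y.Z version found in `text` as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_older_than(version_text, threshold="8.0.5"):
    """Return True if the version in `version_text` is below `threshold`."""
    # Tuples compare element by element, so (7, 8, 10) < (8, 0, 5).
    return parse_version(version_text) < parse_version(threshold)


if __name__ == "__main__":
    for candidate in ["8.0.4", "8.0.5", "8.2.8", "7.8.10"]:
        print(candidate, is_older_than(candidate))
```

Comparing integer tuples rather than raw strings avoids mis-ordering versions such as 7.8.10 and 7.8.9.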
Thanks and regards,
UW-Madison HTCondor Project