
Re: [HTCondor-users] 'permission denied' brought some jobs to H status



On Jun 4, 2015, at 10:41 AM, qing <gang.qin@xxxxxxxxxxxxx> wrote:
> 
> Dear Zach:
> 
>  Thanks for the hints. This seems to be a common issue with ARC 5.0.0 and has already been tracked at https://ggus.eu/index.php?mode=ticket_info&ticket_id=113745, where the Condor team indicates that the problem might be in the ARC job wrapper.
> 
>  Cheers, Gang
> 
> On 04/06/2015 16:29, Zachary Miller wrote:
>>> Hold reason: Error from slot1@node128: STARTER at 10.141.0.128 failed to send file(s) to <10.141.255.19:57731>; SHADOW at 10.141.255.19 failed to write to file /var/spool/arc/grid/dfgMDmjkVKmnbbfC3pqhhxZmABFKDmABFKDmZnFKDmABFKDm7g3Yon/_condor_stderr.aipanda063.cern.ch_15422080.0_1433368150: (errno 13) Permission denied
>> This "Permission denied" is coming from the filesystem on the submit machine.
>> 
>> Some things to consider:
>> 
>>   Is it local disk, or some kind of network mount?
>>   Was it close to full? (such that the output files won't fit)
>>   Too many files in /var/spool/arc/grid/?
>>   Any other reason you can think of why you wouldn't be able to write a file.
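The checklist above can be run as a quick script on the submit machine. The following is a minimal sketch, not an HTCondor or ARC tool: `check_spool_dir` and `fs_type` are hypothetical helpers. The script assumes a Linux host where `/proc/mounts` lists mount points, and it should be run as the account that owns the job, because the shadow typically writes output files with the job owner's permissions rather than as root.

```python
import os
import shutil
import sys


def fs_type(path, mounts_file="/proc/mounts"):
    """Return the filesystem type of the mount containing `path` (Linux-only assumption).

    Picks the longest mount point that is a prefix of `path`, so a network
    mount such as nfs shows up even when nested under a local root filesystem.
    Mount points containing escaped characters (e.g. spaces as \\040) are not decoded.
    """
    path = os.path.realpath(path)
    best, best_type = "", None
    try:
        with open(mounts_file) as f:
            for line in f:
                fields = line.split()
                if len(fields) < 3:
                    continue
                mnt, ftype = fields[1], fields[2]
                under = path == mnt or path.startswith(mnt.rstrip("/") + "/")
                if under and len(mnt) > len(best):
                    best, best_type = mnt, ftype
    except OSError:
        return None  # e.g. not Linux, or /proc not mounted
    return best_type


def check_spool_dir(path):
    """Collect the conditions from the checklist: mount type, free space, entry count, write access."""
    report = {"exists": os.path.isdir(path)}
    if not report["exists"]:
        return report
    # Can the current (real) user create entries in this directory?
    report["writable"] = os.access(path, os.W_OK | os.X_OK)
    usage = shutil.disk_usage(path)
    report["free_bytes"] = usage.free
    report["percent_used"] = round(100.0 * usage.used / usage.total, 1) if usage.total else None
    # Count directory entries; a very large count can cause trouble on some filesystems.
    with os.scandir(path) as it:
        report["entries"] = sum(1 for _ in it)
    report["fstype"] = fs_type(path)
    return report


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/spool/arc/grid"
    for key, value in check_spool_dir(target).items():
        print(f"{key}: {value}")
```

A `fstype` of `nfs` or similar answers the first question, `percent_used` the second, and `entries` the third. If `writable` is `False` for the job owner, the errno 13 is explained directly by permissions.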

Hi.
I've been working with Raul on this problem. My current suspicion is that it is caused by using a Condor-G version older than 8.0.5 to submit the jobs to the ARC server. This causes a collision on the filename _condor_stderr between Condor-G and the HTCondor cluster behind the ARC server.
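To check whether a submit host falls into the suspected range, compare the output of `condor_version` on that host against 8.0.5. The sketch below assumes the usual first-line format `$CondorVersion: X.Y.Z <date> BuildID: ... $`. The helper names `parse_condor_version` and `predates_fix` and the sample strings are illustrative only.

```python
import re


def parse_condor_version(text):
    """Extract (major, minor, patch) from `condor_version`-style output (assumed format)."""
    m = re.search(r"\$CondorVersion:\s*(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError("no '$CondorVersion: X.Y.Z' string found")
    return tuple(int(g) for g in m.groups())


def predates_fix(text, fixed=(8, 0, 5)):
    """True if the reported version is older than `fixed`; tuples compare element by element."""
    return parse_condor_version(text) < fixed


if __name__ == "__main__":
    # Illustrative sample line, not real output from any particular host.
    sample = "$CondorVersion: 8.0.4 Oct 19 2013 BuildID: 000000 $"
    print(parse_condor_version(sample), predates_fix(sample))
```

Comparing integer tuples avoids the string-comparison trap where "8.0.10" would sort before "8.0.5".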

Can you confirm whether this is the case for the jobs that are failing for you?

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project