Re: [Condor-users] disk space on file server fills up and condor drops the complete output



Hi Ian,
Any chance you could point me towards some info about job flow tests? I
think I may be looking in the wrong place!

Thank you for your help,
Rob



-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: 19 September 2008 15:33
To: Condor-Users Mail List
Subject: Re: [Condor-users] disk space on file server fills up and
condor drops the complete output

> While the job is running, the network drive fills up. The job finishes
> and condor tries to transfer the results back to this network
> resource.
> This fails due to lack of space, but during this attempted copy it
> still deletes the original files from the condor server.
>
> Currently I take backups of /condor/execute fairly regularly
> throughout the day. However, if this problem occurs at the beginning of
> the weekend we can lose two days of running time.
>
> Has anyone seen this issue before? Do you know of a workaround or fix 
> for it?

Yup. See it all the time. As part of your job flow, test the copy back,
and if it fails with an out-of-space error, put the job to sleep instead
of ending it. Wake up periodically, test again, repeat. You can even
have the job send email if it ends up in this state, stuck on a machine,
so the user can grab data from the remote machine's drive and opt to
just kill the job forcefully with condor_rm.
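
A minimal sketch of that sleep-and-retry loop, in Python. It assumes the
job's own wrapper script does the copy back to the file server itself.
The function name, the retry interval and the on_stuck hook (where an
email could be sent) are all hypothetical and not part of Condor.

```python
import errno
import shutil
import time
from pathlib import Path

# Errors that mean "no room on the target"; EDQUOT may be missing on some platforms.
OUT_OF_SPACE = {errno.ENOSPC, getattr(errno, "EDQUOT", errno.ENOSPC)}


def copy_back_with_retry(src, dest_dir, retry_seconds=600, on_stuck=None,
                         copy=shutil.copy2, sleep=time.sleep):
    """Copy src into dest_dir; on an out-of-space error, sleep and retry.

    Returns the number of attempts it took. The source file is never
    deleted here, so the data stays on the execute machine until the
    copy has succeeded.
    """
    src = Path(src)
    target = Path(dest_dir) / src.name
    attempts = 0
    notified = False
    while True:
        attempts += 1
        try:
            copy(src, target)
            return attempts
        except OSError as exc:
            if exc.errno not in OUT_OF_SPACE:
                raise  # some other failure: do not hide it
            # Drop any partial file so it does not waste what space is left.
            target.unlink(missing_ok=True)
            if on_stuck is not None and not notified:
                on_stuck(src, exc)  # hypothetical hook, e.g. send one email
                notified = True
            sleep(retry_seconds)
```

The wrapper would call copy_back_with_retry() once per output file after
the real work finishes, and exit only after every copy has returned.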

We monitor our free NAS space with Nagios and admins with pagers get
emails on NAS events (like less than 10% free space left) and can do
things like add temporary space to get us through a weekend.
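
As an illustration only, a free-space check along those lines could look
like the Python sketch below. It follows the usual Nagios plugin exit
codes (0 OK, 1 WARNING, 2 CRITICAL). The 10% warning level comes from
the example above; the 5% critical level and the function names are
assumptions, not the setup described here.

```python
import shutil

OK, WARNING, CRITICAL = 0, 1, 2


def classify_free_space(total_bytes, free_bytes, warn_pct=10.0, crit_pct=5.0):
    """Return (status, free_pct) for a volume of the given size."""
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct < crit_pct:
        return CRITICAL, free_pct
    if free_pct < warn_pct:
        return WARNING, free_pct
    return OK, free_pct


def check_path(path, warn_pct=10.0, crit_pct=5.0):
    """Check the filesystem that holds path (e.g. the NAS mount point)."""
    usage = shutil.disk_usage(path)
    return classify_free_space(usage.total, usage.free, warn_pct, crit_pct)
```

A small script around check_path() would print a one-line status and
exit with the returned code so the monitoring system can alert on it.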

- Ian
