
Re: [Condor-users] Cycle stealing (WAS: Attributes disappearing from input piped to the job fetch hook)



-----Original Message-----
> From: Ian Chesal
> Sent: 14 December 2009 22:28
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] Cycle stealing (WAS: Attributes disappearing from input piped to the job fetch hook)
> 
> Well, you know I do. :) At least up until this year I did. My new system, based around Hooks, is not working on our desktops yet. That's an early 
> 2010 project for me. And it likely won't use Condor to do the on/off decisions. I'll probably just have my hook script sleep if the time is  
> during the day, not on the weekend, kind of thing. Way easier than managing that crazy START expression. :)

I hope you do a talk at a Condor event on the job hooks - I might even try to come!
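
For what it's worth, the time-of-day check Ian describes could be as small as the sketch below. This is only an illustration, not his actual hook: the function name and the 08:00-18:00 window are made up, and how the result gets wired into a real fetch hook script is left out entirely.

```python
from datetime import datetime, time

# Hypothetical office hours; the thread does not give actual times.
WORK_START = time(8, 0)
WORK_END = time(18, 0)


def desktop_is_busy(now: datetime) -> bool:
    """Return True during weekday working hours, when the hook should hold off."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    in_hours = WORK_START <= now.time() < WORK_END
    return is_weekday and in_hours
```

A hook script would call desktop_is_busy(datetime.now()) and sleep (or simply return no work) whenever it comes back True.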

> Hassles? Aside from a really complicated Condor config file what'd you run in to? 

Weird permissions issues, jobs hanging indefinitely, or the machine becoming a 'black hole' and swallowing loads of jobs uselessly (the latter is the biggest problem).

> Our desktops are all pretty uniform in our Engineering department here. 

We had total control over ours too, yet still had problems. On top of that, we had to avoid sending jobs to machines which were being used as submit machines and the like. In the end people didn't use it, even when it was set to only run jobs from themselves on their own machines (as a way of getting them *some* throughput if the farm was busy).

> We treat the desktops in the pool with a pretty laissez-faire attitude. If the desktop machines are running jobs okay: great. 
> If not: take them out of the pool, don't even try to figure out what went wrong. Low effort. And it works pretty well.

We pretty much did that - we ended up with no machines left!

To be honest, though, the major reason not to do it these days is that the desktop machines are (inherently) not in the same data centre as, or fibre-connected to, the databases, network file systems, collector infrastructure etc. This means throughput on those machines becomes painfully slow.

Most of my work now is on trying to either reduce the IO or reduce the latency aspect of it via parallelization.

I'm far more excited by Fusion IO cards than I am by a new CPU these days.

Matt
