
Re: [HTCondor-users] best way to use cached data



On 12/11/2012 03:23 PM, Ian Chesal wrote:
> 
> 
> On Tuesday, 11 December, 2012 at 12:25 PM, Dimitri Maziuk wrote:
> 
>> It seems we're using slightly different definitions of "simple". Mine is
>>
>> 'Requirements = `[ -f /tmp/cache/subsetXYZ123 ]` == 0'

> Indeed. My simple is based in reality and actually works. :)


> ... And you probably wouldn't want to have to eval [ -f /some/file ]
> for all the machines in your pool just to find a match. It would be
> very, very slow for anything but a small pool.

Well, until you find one, so "on all machines" is the upper bound.

$ time (curl -I http://www/some/where/some/file >/dev/null 2>&1)

real    0m0.006s

Shouldn't take much longer than 6 seconds per 1K nodes.
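
To spell out that arithmetic and the "until you find one" part, here is a
rough sketch. The helper names (worst_case_seconds, first_match, http_probe)
are made up for illustration, and http_probe assumes every node serves the
cached file over HTTP at the same path, which is not a given.

```shell
#!/bin/sh
# Hypothetical helper: worst-case wall time, in whole seconds, to probe
# every node once, one after another.
#   $1 = number of nodes, $2 = milliseconds per probe
worst_case_seconds() {
    echo $(( $1 * $2 / 1000 ))
}

# Hypothetical helper: print the first node for which the probe succeeds
# and stop there, so "all nodes" is only the upper bound.
#   $1 = probe command (called with the node name as its only argument)
#   remaining arguments = node names
first_match() {
    probe=$1; shift
    for node in "$@"; do
        if "$probe" "$node"; then
            echo "$node"
            return 0
        fi
    done
    return 1
}

# Example probe modelled on the curl HEAD request above.
# ASSUMPTION: each node answers HTTP for this path.
http_probe() {
    curl -sfI "http://$1/some/where/some/file" >/dev/null 2>&1
}

worst_case_seconds 1000 6    # prints 6
```

Something like "first_match http_probe node1 node2 node3" would then print
the first node that has the file and return non-zero if none does.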

...
> This would allow anyone on the local machine to administer Condor.
> If you wanted to limit it to a sub-set of users you can make that work too.

Look, *I am not on a local machine* at CHTC. I can flock 10,000 BLAST
jobs to CHTC, but it comes with 28GB of a search database. I can
transfer that pre-exec, but
 - I need to make sure I'm doing upload/machine, not upload/core. The
"simple and actually works" way to do it is with whole-machine slots,
last I looked, except the job I'm trying to run is not whole-machine.
(And that's not even accounting for "upload/switch backplane or you
saturate the whole subnet" and all that.)
 - How does a given CHTC node know to keep the data until all 10K jobs
are done, and who gets to clean up afterwards?
 - Did I mention I (probably) don't have a login at CHTC?
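
For the first point, the usual workaround inside a job wrapper is a
per-machine lock, roughly as sketched below. This is only an illustration:
fetch_once and the marker path are made-up names, it assumes all slots on a
machine share a local directory, and it does nothing about the cleanup
question or about a job that dies while holding the lock.

```shell
#!/bin/sh
# Hypothetical helper: run a fetch command at most once per machine.
#   $1 = marker file that records "data is already here"
#   remaining arguments = the fetch command and its arguments
fetch_once() {
    marker=$1; shift
    lock="$marker.lock"
    # mkdir is atomic on a local filesystem, so only one job gets the lock;
    # the others wait here until it is released.
    until mkdir "$lock" 2>/dev/null; do
        sleep 1
    done
    rc=0
    # Only the first job to get the lock finds the marker missing and fetches.
    if [ ! -e "$marker" ]; then
        if "$@"; then
            : > "$marker"
        else
            rc=1
        fi
    fi
    rmdir "$lock"
    return $rc
}
```

A job would call it as, say, "fetch_once /tmp/cache/blastdb.done
some_transfer_command ..." before starting the real work; both the path and
the transfer command are placeholders.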

That's why I started with "I'd love to see a node-level data
placement mechanism in condor, or at least the ability to evaluate..."
-- because what it has is only simple on your own cluster of
relatively few nodes. In which case here's an even simpler simple that
works for real here:

for i in `condor_status -f '%s\n' machine | sort | uniq` ; do
    rsync mydatabasedir rsync://$i/databasedir
done && condor_submit_dag mydag || mail -s 'update failed' root
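
One caveat with the one-liner: the exit status of a for loop is that of its
last command, so only the last rsync decides whether the DAG gets submitted.
A slightly longer sketch that remembers any failure could look like this.
sync_all and push_db are illustrative names, the rsync module name
"databasedir" is assumed, and -a is added on the assumption that the whole
directory tree should be copied.

```shell
#!/bin/sh
# Hypothetical helper: run a sync command against every host and return
# non-zero if any one of them failed.
#   $1 = sync command (called with the host name as its only argument)
#   remaining arguments = host names
sync_all() {
    sync_cmd=$1; shift
    failed=0
    for host in "$@"; do
        if ! "$sync_cmd" "$host"; then
            echo "sync failed: $host" >&2
            failed=1
        fi
    done
    return $failed
}

# ASSUMPTION: every node runs an rsync daemon exporting a module
# called "databasedir".
push_db() {
    rsync -a mydatabasedir "rsync://$1/databasedir"
}

# Intended use, left commented out because it needs a live pool:
# sync_all push_db `condor_status -f '%s\n' machine | sort | uniq` \
#     && condor_submit_dag mydag \
#     || mail -s 'update failed' root </dev/null
```

Hosts that failed are listed on stderr, so the mail step could be extended to
say which ones need attention.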


-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
