
Re: [HTCondor-users] best way to use cached data



On 12/10/2012 06:32 PM, Ian Chesal wrote:
> On Monday, 10 December, 2012 at 12:59 PM, Dimitri Maziuk wrote:
>> On 12/09/2012 11:36 PM, John Wong wrote:
>>
>> That is a different story I think. I'd love to see a node-level data
>> placement mechanism in condor, or at least the ability to evaluate
>> `[ -f /var/tmp/mydatabase ]` at job submission time, but I don't believe
>> you can.
>>
>>
> 
> Perhaps I'm misunderstanding what you're after here but why don't you have this now?
> 
> Job A runs on Machine A and brings along a subset of your massive
> dataset into some place like /tmp/cache. Before the job exits it leaves
> a small bit of Condor configuration in the ~condor/config directory,
> let's call it cache_contents.config, and the file simply says:
> 
> MyCacheContents = "subsetXYZ123"
> STARTD_ATTRS = $(STARTD_ATTRS), MyCacheContents
> 
> And it advertises the cache contents to the world by running:
> 
> condor_reconfig -full
> 
> before it finally exits.
> 
> Now the ClassAd for the machine contains the attribute:
> 
> MyCacheContents = "subsetXYZ123"
> 
> And jobs can steer based on this string by putting:
> 
> rank = (MyCacheContents =!= Undefined && MyCacheContents == "subsetXYZ123") * 1000
> 
> In their submission files. If the machine has the subset of the data
> cached already, the job will rank it higher than any other machine and
> prefer to run there first.
> 
> Adjust to suit your tastes for preemption and what not.
> 
> Simple but effective.

It seems we're using slightly different definitions of "simple". Mine is

'Requirements = `[ -f /tmp/cache/subsetXYZ123 ]` == 0'

(Plus you glossed over the part where condor somehow knows that
"my_cache_contents.config" today and "his_cache_contents.config" tomorrow
are all part of condor_config. And snipped the bit where I said "does not
need privileges to" mess with system daemons -- but those are minor
details ;)
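
For reference, the quoted recipe could be sketched roughly as below. This
is only an illustration under assumptions, not a tested deployment: the
helper names (cache_config_fragment, write_fragment, rank_expression,
advertise) are made up here, the config directory is assumed to be one
that condor already reads (for example via LOCAL_CONFIG_DIR) and that the
job is allowed to write to, and the rank expression is parenthesised so
the comparison is evaluated before the multiplication.

```python
import os
import subprocess


def cache_config_fragment(subset_id: str) -> str:
    """Return the config text that advertises the cached subset."""
    return (
        f'MyCacheContents = "{subset_id}"\n'
        "STARTD_ATTRS = $(STARTD_ATTRS), MyCacheContents\n"
    )


def write_fragment(config_dir: str, subset_id: str,
                   name: str = "cache_contents.config") -> str:
    """Write the fragment into config_dir and return its path.

    config_dir is an assumption: it must be a directory the local
    condor daemons actually read, and the caller must be able to
    write to it.
    """
    path = os.path.join(config_dir, name)
    with open(path, "w") as handle:
        handle.write(cache_config_fragment(subset_id))
    return path


def rank_expression(subset_id: str, weight: int = 1000) -> str:
    """Build the submit-file rank line preferring machines with the subset."""
    return (
        f"rank = (MyCacheContents =!= Undefined && "
        f'MyCacheContents == "{subset_id}") * {weight}'
    )


def advertise() -> None:
    """Ask the local daemons to re-read their configuration.

    Needs condor_reconfig on PATH and sufficient privileges, which is
    exactly the part that may not be available to an ordinary job.
    """
    subprocess.run(["condor_reconfig", "-full"], check=True)


if __name__ == "__main__":
    # Print what would be written and submitted; nothing is reconfigured.
    print(cache_config_fragment("subsetXYZ123"), end="")
    print(rank_expression("subsetXYZ123"))
```

Note that advertise() is never called in the demo above; whether a job
may run it at all depends on how the pool is set up.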

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
