
Re: [HTCondor-users] best way to use cached data



On Monday, 10 December, 2012 at 12:59 PM, Dimitri Maziuk wrote:
On 12/09/2012 11:36 PM, John Wong wrote:

That is a different story I think. I'd love to see a node-level data
placement mechanism in condor, or at least the ability to evaluate ` [
-f /var/tmp/mydatabase ] ` at job submission time, but I don't believe
you can.
Perhaps I'm misunderstanding what you're after here but why don't you have this now?

Job A runs on Machine A and brings along a subset of your massive dataset into some place like /tmp/cache. Before the job exits it leaves a small bit of Condor configuration in the ~condor/config directory, let's call it cache_contents.config, and the file simply says:

MyCacheContents = "subsetXYZ123"
STARTD_ATTRS = $(STARTD_ATTRS), MyCacheContents

And it advertises the cache contents to the world by running:

condor_reconfig -full

before it finally exits.
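A rough shell sketch of that end-of-job step, assuming the subset name "subsetXYZ123" from above; the config directory path is a placeholder, so point CONFIG_DIR at wherever your startd reads local config (e.g. ~condor/config):

```shell
#!/bin/sh
# Where the startd picks up local config; on a real node this would be
# something like ~condor/config. Here we default to a temp dir so the
# sketch is safe to run anywhere.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"
mkdir -p "$CONFIG_DIR"

# Advertise which slice of the dataset this node now holds in /tmp/cache.
# The quoted heredoc keeps $(STARTD_ATTRS) literal for Condor to expand.
cat > "$CONFIG_DIR/cache_contents.config" <<'EOF'
MyCacheContents = "subsetXYZ123"
STARTD_ATTRS = $(STARTD_ATTRS), MyCacheContents
EOF

# Push the new attribute into the machine ClassAd, if Condor is present.
if command -v condor_reconfig >/dev/null 2>&1; then
    condor_reconfig -full
fi

echo "wrote $CONFIG_DIR/cache_contents.config"
```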

Now the ClassAd for the machine contains the attribute:

MyCacheContents = "subsetXYZ123"

And jobs can steer based on this string by putting:

rank = (MyCacheContents =!= UNDEFINED && MyCacheContents == "subsetXYZ123") * 1000

in their submit files. If the machine already has that subset of the data cached, the job will rank it higher than any other machine and prefer to run there first.

Adjust to suit your tastes for preemption and what not.
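Put together, a minimal submit description using such a rank expression might look like the sketch below; the executable and arguments are placeholders, not anything from a real setup:

```
universe   = vanilla
executable = analyze.sh
arguments  = subsetXYZ123

# Prefer machines that already cached subsetXYZ123. The =!= UNDEFINED
# guard keeps the expression well-defined on machines that never set
# the attribute at all.
rank = (MyCacheContents =!= UNDEFINED && MyCacheContents == "subsetXYZ123") * 1000

queue
```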

Simple but effective. If you want fuzzier logic than plain string matching for steering jobs to machines, you can encode additional information in the identifying string for the cache contents.

Regards,
- Ian

-- 
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools
888.292.5320

http://www.cyclecomputing.com
http://www.cyclecloud.com
http://twitter.com/cyclecomputing