Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[HTCondor-users] Reserving RAM dynamically

Date: Mon, 16 Nov 2015 12:28:42 +0000
From: Brian Candler <b.candler@xxxxxxxxx>
Subject: [HTCondor-users] Reserving RAM dynamically

Outline question: is there a way to dynamically reserve some resource (in particular RAM) without actually running a job?

Here's the actual scenario. We are using partitionable slots. We have a bunch of jobs which run, where all of them open a very large mmap() read-only database. This uses X GB of memory, but due to sharing, only one chunk of X GB is used regardless of how many jobs have that database open at once.
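
As a rough illustration of the kind of mapping described above, here is a minimal Python sketch of opening a file as a read-only memory map. The helper name open_shared_db and the path argument are hypothetical and are not taken from the actual jobs. The point is only that every process maps the same file read-only, so the kernel can back all of the mappings with a single set of page-cache pages.

```python
import mmap

def open_shared_db(path):
    """Map a file read-only (hypothetical helper, for illustration only)."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

Each job that calls this on the same file gets its own mapping object, but the resident pages behind those mappings are shared, which is why per-job memory accounting counts X too many times.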


I'm trying to work out how best to handle this in the condor world.

If we add X to the request_memory of each job, then we run only a fraction of the number of jobs which ought to be able to run concurrently.

If we don't take account of X at all, and each job just declares its "other" memory usage, then we end up with out-of-memory conditions when many instances are running.

If we guess that (say) 20 instances of a job will run at once, then we could add X/20 to the declared memory usage of one job. However this doesn't actually work: given a mixture of different jobs, there might actually be only 1 instance of this job plus 19 instances of other jobs which don't use the database. Again, we run out of memory.
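
The three accounting options above can be compared with a small calculation. This is only a sketch: the function names are made up for illustration, and the machine size (64000 MB) is an assumed figure, as are the per-job and shared sizes used in the example.

```python
def max_concurrent_jobs(total_mb, per_job_mb, shared_mb, strategy, assumed_instances=20):
    """Jobs admitted on one machine under a given way of declaring memory."""
    if strategy == "add_full":      # declare per-job memory plus all of X
        request = per_job_mb + shared_mb
    elif strategy == "ignore":      # declare per-job memory only
        request = per_job_mb
    elif strategy == "amortize":    # declare per-job memory plus X / assumed_instances
        request = per_job_mb + shared_mb / assumed_instances
    else:
        raise ValueError(strategy)
    return int(total_mb // request)

def actual_usage_mb(n_jobs, per_job_mb, shared_mb):
    """Real usage: the shared mapping is counted once if any job has it open."""
    return n_jobs * per_job_mb + (shared_mb if n_jobs > 0 else 0)

# Assumed figures: 64000 MB machine, 1000 MB private memory per job, X = 8000 MB.
print(max_concurrent_jobs(64000, 1000, 8000, "add_full"))   # 7
print(max_concurrent_jobs(64000, 1000, 8000, "ignore"))     # 64
print(max_concurrent_jobs(64000, 1000, 8000, "amortize"))   # 45
print(actual_usage_mb(64, 1000, 8000))                      # 72000, more than 64000
```

With these assumed numbers the machine could really hold (64000 - 8000) / 1000 = 56 database jobs. Adding X to every job admits only 7, ignoring X admits 64 and overcommits by 8000 MB, and the X/20 guess still overcommits whenever fewer than 20 database jobs share the machine with other work.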

The current approach we are using is to statically subtract X from the total amount of memory available on the server:

MEMORY=($(DETECTED_MEMORY)-xxxxx)

However this is less than ideal because:

1. When we are running jobs which don't use the database, fewer jobs are able to run than otherwise could.
2. It won't work in future when we start using multiple databases concurrently, e.g. some jobs open shared database X and some open shared database Y.
3. The size of databases X and Y will grow over time, and we'd rather not reconfigure and restart condor.


So I am wondering if it is possible to do something like:

- before we run any jobs which open X, we would reduce the available memory on that node by X

- and once we no longer have any jobs which use X, we release it

One way to do that would be to run a dummy job which declares that it uses X amount of memory but does nothing. We could also announce a machine ClassAd attribute saying that database X is available. This would have to be an infinite-running job, and we would have to kill it when we want to remove the memory reservation for X. This sounds messy.

I wonder if there is a more direct way to make a claim on a resource, without actually having to run a job?

I suppose our ideal solution would be to have direct support for shared RAM resources. For example, a job might declare:


request_memory = 1000
request_shared = FOO:8000

If there is no other job which is currently using FOO, then the available memory pool would be reduced by 8000 before this job starts (perhaps by creating a dummy partitioned slot taking 8000); this clearly has an impact on matchmaking. And when all jobs using FOO terminate, the dummy partitioned slot would be dropped.
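
The bookkeeping this would need can be sketched as a reference-counted reservation. The Python below is a toy model only, not HTCondor code: the class and method names are invented for illustration, and it assumes the size of a shared resource does not change while it is reserved.

```python
class SharedReservationPool:
    """Toy model of one machine's memory with reference-counted shared chunks."""

    def __init__(self, total_mb):
        self.free_mb = total_mb
        self.users = {}  # shared resource name -> number of running jobs using it

    def start_job(self, request_memory, shared=None):
        """Try to admit a job; shared is an optional (name, size_mb) pair."""
        need = request_memory
        if shared is not None:
            name, size = shared
            if self.users.get(name, 0) == 0:
                need += size  # first user also pays for the shared chunk
        if need > self.free_mb:
            return False
        self.free_mb -= need
        if shared is not None:
            self.users[shared[0]] = self.users.get(shared[0], 0) + 1
        return True

    def finish_job(self, request_memory, shared=None):
        """Release a job's memory, and the shared chunk if it was the last user."""
        self.free_mb += request_memory
        if shared is not None:
            name, size = shared
            self.users[name] -= 1
            if self.users[name] == 0:
                self.free_mb += size
                del self.users[name]
```

For example, on a 10000 MB machine the first job with request_memory 1000 and shared ("FOO", 8000) consumes 9000, a second such job consumes only 1000, and once both finish the full 10000 is free again.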


Any clues for how to deal with this?

Thanks,

Brian Candler.

Follow-Ups:
- Re: [HTCondor-users] Reserving RAM dynamically
  - From: Rich Pieri
