
[Condor-users] Unused memory in classAd?



There was a message on this subject in 2004, but I'd like to know if
anything has changed since then. It seems like an important issue. 

 

Is there any way to determine the amount of unused memory on a machine,
and include it in the machine's ClassAd, so I can refer to it in a job's
requirements? (Is this something Hawkeye can do?) I have a job that
crashes under Windows XP if it starts on a machine with too little
memory available. This makes the job sensitive to other jobs running on
dual processor machines, and to programs left open by users. 
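To make the request concrete, here is a rough sketch of the kind of script I have in mind: it measures free physical memory and prints it as a single "Name = Value" line. The attribute name FreeMemoryMB is made up, and whether Hawkeye (or anything else) can run such a script and merge its output into the machine ClassAd is exactly what I am asking, so treat that part as an assumption.

```python
import ctypes
import sys


def parse_meminfo_free_mb(meminfo_text):
    """Return the MemFree value from /proc/meminfo-style text, in MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemFree:"):
            # Line format: "MemFree:   1331200 kB"
            return int(line.split()[1]) // 1024
    raise ValueError("no MemFree line found")


def windows_free_mb():
    """Ask Windows for available physical memory, in MB."""

    class MEMORYSTATUSEX(ctypes.Structure):
        _fields_ = [
            ("dwLength", ctypes.c_ulong),
            ("dwMemoryLoad", ctypes.c_ulong),
            ("ullTotalPhys", ctypes.c_ulonglong),
            ("ullAvailPhys", ctypes.c_ulonglong),
            ("ullTotalPageFile", ctypes.c_ulonglong),
            ("ullAvailPageFile", ctypes.c_ulonglong),
            ("ullTotalVirtual", ctypes.c_ulonglong),
            ("ullAvailVirtual", ctypes.c_ulonglong),
            ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
        ]

    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise OSError("GlobalMemoryStatusEx failed")
    return status.ullAvailPhys // (1024 * 1024)


def free_memory_mb():
    """Free physical memory in MB on Windows, or on Linux via /proc/meminfo."""
    if sys.platform == "win32":
        return windows_free_mb()
    with open("/proc/meminfo") as f:
        return parse_meminfo_free_mb(f.read())


def classad_line(name, value):
    """Format one attribute as a 'Name = Value' line."""
    return "%s = %d" % (name, value)


if __name__ == "__main__":
    try:
        # FreeMemoryMB is a hypothetical attribute name.
        print(classad_line("FreeMemoryMB", free_memory_mb()))
    except (OSError, ValueError):
        pass  # publish nothing rather than a wrong number
```

If an attribute like this did reach the machine ClassAd, the job could presumably ask for something like FreeMemoryMB >= 700 in its requirements. The value would only be a snapshot from the last time the script ran, so it could be stale by the time a job is matched.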

 

The job allocates memory when it starts, then runs for several hours.
When jobs start to crash on a machine, the whole cluster falls into the
'black hole'. 

 

For example, a single job will start on vm1 of a 2GB machine, and use
about 700MB. If nothing else is running on the machine, a second job
will start on vm2. However, if the user has left too many programs open,
the second job (and all subsequent jobs) will not have enough memory, so
will start and then crash. Job that start on a single processor machine
with 1GB will also crash if the user has left something open. 
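The arithmetic behind that example, as a small sketch. The 700MB per job is from our runs; the amounts left open by the user (800MB and 400MB) are made-up figures for illustration, and the calculation ignores memory used by Windows itself.

```python
def jobs_that_fit(total_mb, used_by_others_mb, job_mb=700):
    """How many jobs of job_mb fit in the memory not used by other programs."""
    free_mb = max(total_mb - used_by_others_mb, 0)
    return free_mb // job_mb


# 2GB machine, nothing else running: both vm1 and vm2 can take a job.
print(jobs_that_fit(2048, 0))    # prints 2

# 2GB machine, user left 800MB of programs open (hypothetical figure):
# only one job fits, so the job started on vm2 crashes.
print(jobs_that_fit(2048, 800))  # prints 1

# 1GB machine, user left 400MB open (hypothetical figure): no job fits.
print(jobs_that_fit(1024, 400))  # prints 0
```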

 

We have a small pool (about 20 machines) and want to make the best use
of them that we can. There are workarounds but they are less efficient.
For example, we could use only virtual processor 2 on all machines.
We could also try something like this to stop the black hole:
'on_exit_remove = (CurrentTime - JobStartDate) > (10 * 60)'. But might it
result in a logfile blowout from resubmitting the same job repeatedly?
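To spell out what I expect that expression to do, here is a sketch in Python rather than ClassAd syntax. The function and parameter names are mine, and it assumes both times are in seconds: a job that exits after less than 10 minutes evaluates to false, so it stays in the queue and is run again, while a job that exits after several hours evaluates to true and leaves the queue.

```python
def on_exit_remove(current_time, job_start_date, min_runtime_s=10 * 60):
    """Mirror of (CurrentTime - JobStartDate) > (10 * 60), times in seconds."""
    return (current_time - job_start_date) > min_runtime_s


start = 1000  # arbitrary start time, in seconds

# Job crashed 30 seconds after starting: not removed, so it is rerun.
print(on_exit_remove(start + 30, start))        # prints False

# Job exited after 3 hours: removed from the queue.
print(on_exit_remove(start + 3 * 3600, start))  # prints True
```

The first case is the one that worries me: if the job keeps landing on a machine that is short of memory, nothing in the expression limits how many times it is rerun.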

 

Thanks,

Simon

 

--------------------------------------------------------------------

Simon Hoyle

Senior Fisheries Scientist

Stock Assessment and Modelling Section, Oceanic Fisheries Programme

Secretariat of the Pacific Community

BP D5, 98848 Noumea CEDEX, New Caledonia

(Direct): +687 266 776, (office) +687 262 000 xt 455, (Fax) +687 263 818

Web: www.spc.int

 
