Re: [Condor-users] 'parallel' universe job submission crashes SCHEDD
- Date: Wed, 29 Dec 2010 13:51:48 -0500
- From: Michael Hanke <michael.hanke@xxxxxxxxx>
- Subject: Re: [Condor-users] 'parallel' universe job submission crashes SCHEDD
On Wed, Dec 29, 2010 at 10:22:42AM -0600, David J. Herzfeld wrote:
> There were a couple of suggestions for running these two models
> simultaneously in Condor in a thread I started back in August (some
> require quite a bit more tinkering than the others). See https://lists.cs.wisc.edu/archive/condor-users/2010-August/msg00229.shtml
Thanks for the pointer!
> Right now, I am waiting in joyful anticipation for the closing of
> ticket #986 - see the contents here
> In our case, 'RequestCpus' is the important aspect of parallel jobs
> - users want to be able to specify RequestCpus=8, num_machines=2 to
> receive 8 processors per node on two nodes.
That sounds useful indeed. I hope the patch gets applied soon.
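For reference, a submit description along the lines David describes might look like this (a sketch: `machine_count` and `request_cpus` are the submit-file keywords, the wrapper name is made up, and honoring a per-node cpu request in the parallel universe is exactly what ticket #986 is about, so this does not work on a stock installation yet):

```
# Hypothetical submit file: 2 nodes, 8 cpus per node
universe      = parallel
executable    = mp_wrapper.sh
machine_count = 2
request_cpus  = 8
queue
```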
> I assume from your problem statement that the memory required per
> process for either the parallel or vanilla jobs is larger than the
> default memory value of 8GB assigned per slot in the
> non-partitionable configuration (64GB total/8 processors per
> machine). Is this correct?
Yes, sometimes we need a multiple of that -- even though it is only a
single-cpu job. But at the same time we also have multi-threaded
applications that are relatively gentle in terms of memory consumption.
Having 8GB statically assigned to each slot would be suboptimal in both
cases.
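For comparison, a partitionable-slot setup avoids the static 8GB-per-slot split: one slot owns the whole machine and carves off dynamic slots sized to each job's request. A rough configuration sketch (assuming the 64GB/8-core machine mentioned above; the exact knob names are from the partitionable-slot feature, but verify against your Condor version):

```
# condor_config sketch: a single partitionable slot owning all resources
NUM_SLOTS        = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1      = cpus=100%, memory=100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
```

A single-cpu job needing, say, 24GB would then put `request_cpus = 1` and `request_memory = 24576` in its submit file, while a memory-light multi-threaded job could request many cpus and little memory from the same machine.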