Re: [Condor-users] 'parallel' universe job submission crashes SCHEDD
- Date: Wed, 29 Dec 2010 13:51:48 -0500
- From: Michael Hanke <michael.hanke@xxxxxxxxx>
- Subject: Re: [Condor-users] 'parallel' universe job submission crashes SCHEDD
On Wed, Dec 29, 2010 at 10:22:42AM -0600, David J. Herzfeld wrote:
> There were a couple of suggestions for running these two models
> simultaneously in Condor in a thread I started back in August (some
> require quite a bit more tinkering than the others). See https://lists.cs.wisc.edu/archive/condor-users/2010-August/msg00229.shtml
Thanks for the pointer!
> Right now, I am waiting in joyful anticipation for the closing of
> ticket #986 - see the contents here
> In our case, 'RequestCpus' is the important aspect of parallel jobs
> - users want to be able to specify RequestCpus=8, num_machines=2 to
> receive 8 processors per node on two nodes.
That sounds useful indeed. I hope the patch gets applied soon.
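For reference, a submit description along the lines David describes might look like this (a sketch: `machine_count` and `request_cpus` are the submit-file keywords, the wrapper name is made up, and honoring a per-node cpu request in the parallel universe is exactly what ticket #986 is about, so this does not work on a stock installation yet):

```
# Hypothetical submit file: 2 nodes, 8 cpus per node
universe      = parallel
executable    = mp_wrapper.sh
machine_count = 2
request_cpus  = 8
queue
```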
> I assume from your problem statement that the memory required per
> process for either the parallel or vanilla jobs is larger than the
> default memory value of 8GB assigned per slot in the
> non-partitionable configuration (64GB total/8 processors per
> machine). Is this correct?
Yes, sometimes we need a multiple of that -- even though it is only a
single-cpu job. But at the same time we also have multi-threaded
applications that are relatively gentle in terms of memory consumption.
Having 8GB statically assigned to each slot would be suboptimal in both
cases.
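For comparison, a partitionable-slot setup avoids the static 8GB-per-slot split: one slot owns the whole machine and carves off dynamic slots sized to each job's request. A rough configuration sketch (assuming the 64GB/8-core machine mentioned above; the exact knob names are from the partitionable-slot feature, but verify against your Condor version):

```
# condor_config sketch: a single partitionable slot owning all resources
NUM_SLOTS        = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1      = cpus=100%, memory=100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
```

A single-cpu job needing, say, 24GB would then put `request_cpus = 1` and `request_memory = 24576` in its submit file, while a memory-light multi-threaded job could request many cpus and little memory from the same machine.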