
[Condor-users] Job scheduling



Hi all,

I run a 20-node cluster (160 CPUs, 2 GB RAM per CPU) and am having an issue with the way Condor distributes jobs across the cluster.

A user is launching simulations that grow to over 6 GB of memory, which Condor reports as 15 GB (I assume this is the image size, i.e. memory plus swap). If three of these jobs run on one node, at a certain point the node becomes completely unresponsive: Ganglia shows it as down and ssh hangs. A couple of hours later the condor_startd crashes and restarts, and the node becomes responsive again. I assume this is due to the memory being saturated.

While the jobs are being run outside operating parameters (6 GB >> 2 GB per CPU), they still have to be run, and they run fine if there is only one per node. The problem is that all of the jobs are being packed together onto a single node (compute-1-0 or compute-2-0). Is this an intended function of Condor, or is there a way I can configure Condor to scatter the jobs across the cluster whenever possible?
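For reference, this is roughly what I was imagining on the config side -- a sketch only, I have not tested it and am not sure these are the right knobs for our Condor version:

```
## condor_config on the central manager -- untested sketch.
## Idea: have the negotiator prefer machines that are not already
## running a job, so the pool fills breadth-first instead of
## stacking several jobs onto compute-1-0 / compute-2-0.
NEGOTIATOR_POST_JOB_RANK = (RemoteOwner =?= UNDEFINED) * (KFlops - SlotID)
```

I could presumably also have the user declare the real memory footprint in the submit file (something like request_memory, if our version supports it) so the matchmaker stops placing three 6 GB jobs on a 2 GB/CPU node, but I would rather know the intended way to do this.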

-Patrick