
Re: [Condor-users] Startd having problems starting because it "ran out of resources" but the machine is idle





On Thursday, July 7, 2011 at 3:55 PM, Gregory Skelton wrote:

Hi,

I'm having a problem with a machine that is idle and has plenty of
disk space, yet the startd reports that it cannot allocate a slot
because it "ran out of resources."
Does anyone know where Condor gets its available-resource information?


Thanks in advance,
Best,
Greg

The StartLog entry below:

07/07 13:35:16 ******************************************************
07/07 13:35:16 ** condor_startd (CONDOR_STARTD) STARTING UP
07/07 13:35:16 ** /opt/condor/sbin/condor_startd
07/07 13:35:16 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
07/07 13:35:16 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
07/07 13:35:16 ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
07/07 13:35:16 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
07/07 13:35:16 ** PID = 18997
07/07 13:35:16 ** Log last touched 7/7 13:00:59
07/07 13:35:16 ******************************************************
07/07 13:35:16 Using config source: /etc/condor/condor_config
07/07 13:35:16 Using local config sources:
07/07 13:35:16 /opt/condor/home/condor_config.local
07/07 13:35:16 DaemonCore: Command Socket at <192.168.5.221:47630>
07/07 13:35:16 VM-gahp server reported an internal error
07/07 13:35:16 VM universe will be tested to check if it is available
07/07 13:35:16 History file rotation is enabled.
07/07 13:35:16 Maximum history file size is: 1000000000 bytes
07/07 13:35:16 Number of rotated history files is: 100
07/07 13:35:16 ERROR: Can't allocate 1st slot of type 2
Requesting: Cpus: 1, Memory: 1916, Swap: auto, Disk: auto
Available: Cpus: 1, Memory: 1915, Swap: 100.00%, Disk: 100.00%
07/07 13:35:16 ERROR "Ran out of system resources" at line 614 in file ResMgr.cpp
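
As a reading aid for the two "Requesting" / "Available" lines in the log above, here is a minimal Python sketch that compares them field by field. The function names are made up for illustration, and the script only compares the numbers printed in the log; it says nothing about how condor_startd computes them. Values such as "auto" or "100.00%" are not absolute quantities, so the sketch skips them.

```python
# Two lines copied verbatim from the StartLog excerpt.
REQUESTING = "Requesting: Cpus: 1, Memory: 1916, Swap: auto, Disk: auto"
AVAILABLE = "Available: Cpus: 1, Memory: 1915, Swap: 100.00%, Disk: 100.00%"


def parse_resources(line):
    """Turn 'Label: Cpus: 1, Memory: 1916, ...' into {'Cpus': '1', ...}."""
    _, _, body = line.partition(": ")  # drop the leading label
    return dict(item.split(": ") for item in body.split(", "))


def shortfalls(requested, available):
    """Return the resource names whose numeric request exceeds what is available."""
    short = []
    for name, want in requested.items():
        have = available.get(name)
        try:
            if float(want) > float(have):
                short.append(name)
        except (TypeError, ValueError):
            # 'auto', percentages, or a missing field: not directly comparable.
            continue
    return short


if __name__ == "__main__":
    print(shortfalls(parse_resources(REQUESTING), parse_resources(AVAILABLE)))
```

For this log excerpt the only numerically comparable field that falls short is Memory (1916 requested against 1915 available), so the memory figure in the slot type definition may be worth checking as well.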

This error indicates that the slot types you've defined partition the machine's resources in such a way that they add up to more than 100% of what is available.

What slot types have you defined? And how many of each type are you trying to create on this machine? It appears that, for your slot type 2 definition, you shouldn't let Condor auto-set the swap and disk space for the slot. Instead, set each of these to 1/N, where N is the number of type 2 slots you're trying to instantiate (assuming type 2 is the only slot type you're instantiating on the machine).
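
To make the 1/N suggestion concrete, here is a small Python sketch that prints a candidate pair of configuration lines for N equal slots. The slot count of 4 is purely hypothetical, since the thread does not say how many slots are wanted; cpus=1 is taken from the "Requesting" line in the log. The helper name is made up, and the fraction syntax for swap and disk is an assumption to check against the manual for your Condor version before using it.

```python
from fractions import Fraction


def slot_type_lines(slot_type, num_slots, cpus=1):
    """Build config lines giving each of num_slots slots a 1/N share of swap and disk."""
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    share = Fraction(1, num_slots)
    # The shares of all slots together must not exceed the whole machine.
    if share * num_slots > 1:
        raise ValueError("slot shares add up to more than 100%")
    frac = f"{share.numerator}/{share.denominator}"
    return [
        f"SLOT_TYPE_{slot_type} = cpus={cpus}, swap={frac}, disk={frac}",
        f"NUM_SLOTS_TYPE_{slot_type} = {num_slots}",
    ]


if __name__ == "__main__":
    # Hypothetical example: four type 2 slots and no other slot types.
    for line in slot_type_lines(2, 4):
        print(line)
```

If other slot types are also defined on the machine, their shares have to be counted in the same total, so a plain 1/N split no longer applies.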

Regards,
- Ian

---
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools