
[Condor-users] Problem setting up more slots than cpus



While attempting to set up a machine to report more slots than it has
CPUs, I can't seem to get it to report any slots on the machine at all.

`condor_version` gives me:
<----
$CondorVersion: 7.4.4 Oct 13 2010 BuildID: 279383 $
$CondorPlatform: I386-LINUX_RHEL3 $
---->

With D_FULLDEBUG turned on for this machine, I get the following output
in the SchedLog and StartLog files:
<----
03/08 17:22:58 Getting monitoring info for pid 6410
03/08 17:22:58 DaemonCore: in SendAliveToParent()
03/08 17:23:16 condor_read(): timeout reading 5 bytes from <172.16.75.110:33505>.
03/08 17:23:16 IO: Failed to read packet header
03/08 17:23:16 Failed to read ClassAd size.
03/08 17:23:16 SECMAN: no classad from server, failing
03/08 17:23:16 ERROR: SECMAN:2004:Failed to create security session to <172.16.75.110:33505> with TCP.
|SECMAN:2007:Failed to end classad message.
03/08 17:23:16 DaemonCore: startCommand() to <172.16.75.110:33505> failed. SendAliveToParent() failed.
03/08 17:23:16 Failed to send alive to <172.16.75.110:33505>, will try again...
---->

What's odd is that the CollectorLog on $(CONDOR_HOST) shows it receiving
an INVALIDATE_STARTD_ADS command for both of the slots I'm trying to set up.

The local config for this machine is as follows:
<----
# Where are the binaries
RELEASE_DIR = /opt/condor-7.4.4

# How do we send mail?
MAIL = /bin/mail

# What devices do we care about? (none, but I'm not sure it works if we don't define this)
CONSOLE_DEVICES = mouse, console

# What daemons should we start?
DAEMON_LIST = MASTER, STARTD, SCHEDD

# Where is our execute directory?
LOCAL_DIR = /opt/condor-7.4.4/local

# TODO Define the default user to act as

# Define more cpus
NUM_CPUS = 2

# Define more slots
#NUM_SLOTS = 2

# Define types of slots
SLOT_TYPE_1 = cpus=1, ram=%50, swap=1/2, disk=1/2
SLOT_TYPE_2 = cpus=1, ram=%50, swap=1/2, disk=1/2

NUM_SLOTS_TYPE_1 = 1
NUM_SLOTS_TYPE_2 = 1

# Debugging
ALL_DEBUG = D_FULLDEBUG
---->
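As a sanity check on the arithmetic: each of the two slot types requests cpus=1, and NUM_SLOTS_TYPE_1 and NUM_SLOTS_TYPE_2 are both 1, so the layout as written asks for two slots that together account for the two CPUs in NUM_CPUS = 2. The short Python sketch below tallies that from the slot-related config lines. It is a hypothetical helper, not anything shipped with Condor; parse_config and slot_layout are illustrative names, and it only understands plain KEY = value lines and # comments, with no macro expansion and no Condor defaults.

```python
# Hypothetical sanity check, NOT part of Condor: tally the slots and CPUs that
# the SLOT_TYPE_<N> / NUM_SLOTS_TYPE_<N> lines in the local config ask for.
# Assumes only plain "KEY = value" lines and "#" comments; Condor's own parser,
# macro expansion, and per-slot defaults are deliberately not modelled.

# The slot-related lines copied from the local config above.
CONFIG = """\
NUM_CPUS = 2
#NUM_SLOTS = 2
SLOT_TYPE_1 = cpus=1, ram=%50, swap=1/2, disk=1/2
SLOT_TYPE_2 = cpus=1, ram=%50, swap=1/2, disk=1/2
NUM_SLOTS_TYPE_1 = 1
NUM_SLOTS_TYPE_2 = 1
"""


def parse_config(text: str) -> dict[str, str]:
    """Collect KEY = value pairs, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            # This sketch rejects any non-comment line without '=', such as a
            # comment that was wrapped onto a second line.
            raise ValueError(f"not a KEY = value line: {raw!r}")
        settings[key.strip().upper()] = value.strip()
    return settings


def slot_layout(settings: dict[str, str]) -> tuple[int, int]:
    """Return (total slots, total CPUs requested) across all SLOT_TYPE_<N>."""
    prefix = "SLOT_TYPE_"
    total_slots = total_cpus = 0
    for key, value in settings.items():
        if not key.startswith(prefix):
            continue
        n = key[len(prefix):]
        # Types without NUM_SLOTS_TYPE_<N> are counted as zero in this sketch.
        count = int(settings.get(f"NUM_SLOTS_TYPE_{n}", "0"))
        # "cpus=1, ram=%50, ..." -> {"cpus": "1", "ram": "%50", ...}
        attrs = dict(item.strip().split("=", 1) for item in value.split(","))
        # This sketch requires an explicit cpus= entry (KeyError otherwise).
        cpus = int(attrs["cpus"])
        total_slots += count
        total_cpus += count * cpus
    return total_slots, total_cpus


if __name__ == "__main__":
    settings = parse_config(CONFIG)
    slots, cpus = slot_layout(settings)
    # prints: 2 slots requesting 2 CPUs; NUM_CPUS = 2
    print(f"{slots} slots requesting {cpus} CPUs; NUM_CPUS = {settings['NUM_CPUS']}")
```

Here the tally comes out even (two slots, two CPUs), so the slot arithmetic itself is consistent with NUM_CPUS.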

This problem doesn't happen when I'm not trying to lie to Condor about
how many CPUs I have. Is Condor trying to teach me to stop lying, or am
I missing something?

Thank you,
Evan Niessen-Derry