
Re: [Condor-users] Submitted jobs remaining idle




Hi Philip,

/opt/condor/etc/condor_config.local (frontend)
-----------------------------------------------
#
#  Condor local configuration file for frontend node.
#
COLLECTOR_NAME = Collector at mseas
CONDOR_ADMIN = condor@xxxxxxxxxxxxx
CONDOR_DEVELOPERS = NONE
CONDOR_DEVELOPERS_COLLECTOR = NONE
CONDOR_HOST = mseas.mit.edu
CONDOR_IDS = 407.407
DAEMON_LIST = MASTER, SCHEDD, COLLECTOR, NEGOTIATOR
EMAIL_DOMAIN = $(FULL_HOSTNAME)
FILESYSTEM_DOMAIN = mit.edu
HOSTALLOW_WRITE = mseas.mit.edu
JAVA = /usr/java/jdk1.5.0_10/bin/java
LOCAL_DIR = /var/opt/condor
LOCK = /tmp/condor-lock.$(HOSTNAME)
MAIL = /bin/mail
NEGOTIATOR_INTERVAL = 120
NETWORK_INTERFACE = 18.38.0.138
RELEASE_DIR = /opt/condor
UID_DOMAIN = mit.edu
-----------------------------------------------
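For reference, each line in these files has the form NAME = value, and $(NAME) substitutes another macro's value, as in EMAIL_DOMAIN = $(FULL_HOSTNAME) and LOCK = /tmp/condor-lock.$(HOSTNAME) above. Below is a minimal Python sketch of reading and expanding such a file. It assumes a simplified syntax with no continuation lines and no $ENV() or other special forms. Condor normally predefines macros such as HOSTNAME itself, so here it is set by hand to a hypothetical value. The names parse_config and expand are illustrative only. On a live host, condor_config_val reports the values Condor actually uses.

```python
import re

# Matches $(NAME) references. Real Condor syntax has more forms than this.
MACRO_REF = re.compile(r"\$\((\w+)\)")


def parse_config(text):
    """Parse 'NAME = value' lines, skipping blank lines and # comments.

    Names are upper-cased because Condor treats them case-insensitively.
    """
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        config[name.strip().upper()] = value.strip()
    return config


def expand(value, config, depth=0):
    """Recursively substitute $(NAME); names missing from config become ''."""
    if depth > 20:
        raise ValueError("macro expansion too deep (possible cycle)")
    return MACRO_REF.sub(
        lambda m: expand(config.get(m.group(1).upper(), ""), config, depth + 1),
        value,
    )


if __name__ == "__main__":
    sample = """
    # HOSTNAME is normally predefined by Condor; hypothetical value here.
    HOSTNAME = mseas
    LOCK = /tmp/condor-lock.$(HOSTNAME)
    """
    cfg = parse_config(sample)
    print(expand(cfg["LOCK"], cfg))  # /tmp/condor-lock.mseas
```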

rocks list host interface mseas
-----------------------------------------------
SUBNET  IFACE MAC               IP          NETMASK     GATEWAY   MODULE NAME
private bond0 00:50:45:5f:4e:46 10.1.1.1    255.0.0.0   --------- tg3    mseas
public  eth1  00:50:45:5f:4e:47 18.38.0.138 255.255.0.0 18.38.0.1 tg3    mseas.mit.edu
-----------------------------------------------

rocks list network
-----------------------------------------------
NETWORK  SUBNET    NETMASK
private: 10.0.0.0  255.0.0.0
public:  18.38.0.0 255.255.0.0
-----------------------------------------------
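One quick consistency check on this information is whether each NETWORK_INTERFACE address in the two config files lies inside one of the subnets listed here. The sketch below uses Python's standard ipaddress module. The frontend's 18.38.0.138 and the compute node's 10.255.255.202 are taken from the configs in this message. The name subnet_for is a hypothetical helper. The check only tests subnet membership; it says nothing about routing or firewalling between the networks.

```python
import ipaddress

# Subnets as reported by "rocks list network" (address/netmask form).
NETWORKS = {
    "private": ipaddress.ip_network("10.0.0.0/255.0.0.0"),
    "public": ipaddress.ip_network("18.38.0.0/255.255.0.0"),
}


def subnet_for(address):
    """Return the name of the listed subnet containing address, or None."""
    ip = ipaddress.ip_address(address)
    for name, net in NETWORKS.items():
        if ip in net:
            return name
    return None


if __name__ == "__main__":
    # NETWORK_INTERFACE values from the frontend and compute-node configs.
    for host, addr in [("frontend", "18.38.0.138"), ("compute node", "10.255.255.202")]:
        print(host, addr, "->", subnet_for(addr))
```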

rocks list var | grep Condor
-----------------------------------------------
Condor    Master                  mseas.mit.edu
-----------------------------------------------

/opt/condor/etc/condor_config.local (compute node)
-----------------------------------------------
#
#  Condor local configuration file for compute node.
#
CONDOR_ADMIN = condor@xxxxxxxxxxxxx
CONDOR_DEVELOPERS = NONE
CONDOR_DEVELOPERS_COLLECTOR = NONE
CONDOR_HOST = mseas.mit.edu
CONDOR_IDS = 407.407
DAEMON_LIST = MASTER, SCHEDD, STARTD
EMAIL_DOMAIN = $(FULL_HOSTNAME)
FILESYSTEM_DOMAIN = mit.edu
HOSTALLOW_WRITE = mseas.mit.edu, *.local
JAVA = /usr/java/jdk1.5.0_10/bin/java
LOCAL_DIR = /var/opt/condor
LOCK = /tmp/condor-lock.$(HOSTNAME)
MAIL = /bin/mail
NEGOTIATOR_INTERVAL = 120
NETWORK_INTERFACE = 10.255.255.202
RELEASE_DIR = /opt/condor
UID_DOMAIN = mit.edu
# First set JAVA_MAXHEAP_ARGUMENT to null, to disable the default of max RAM
JAVA_MAXHEAP_ARGUMENT =
# Now set the argument with the Sun-specific maximum allowable value
JAVA_EXTRA_ARGUMENTS = -Xmx3964m
-----------------------------------------------
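Since Rocks builds both files from the same underlying information, comparing them setting by setting is an easy way to see where the frontend and compute node diverge. Below is a minimal Python sketch. It uses the same simplified NAME = value assumption: comments and blank lines are ignored, and continuation lines are not handled. The names parse and diff_configs are hypothetical. Only a few lines of each file are embedded, to keep the sketch short.

```python
def parse(text):
    """Parse 'NAME = value' lines into a dict, ignoring blanks and # comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            name, value = line.split("=", 1)
            result[name.strip()] = value.strip()
    return result


def diff_configs(a, b):
    """Return {name: (value_in_a, value_in_b)} for every name whose values differ.

    A name missing from one side is reported with None on that side.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Excerpts of the two condor_config.local files quoted above.
FRONTEND = """
COLLECTOR_NAME = Collector at mseas
DAEMON_LIST = MASTER, SCHEDD, COLLECTOR, NEGOTIATOR
HOSTALLOW_WRITE = mseas.mit.edu
NETWORK_INTERFACE = 18.38.0.138
UID_DOMAIN = mit.edu
"""

COMPUTE = """
DAEMON_LIST = MASTER, SCHEDD, STARTD
HOSTALLOW_WRITE = mseas.mit.edu, *.local
NETWORK_INTERFACE = 10.255.255.202
UID_DOMAIN = mit.edu
"""

if __name__ == "__main__":
    for name, (front, node) in diff_configs(parse(FRONTEND), parse(COMPUTE)).items():
        print(f"{name}:\n  frontend: {front}\n  compute:  {node}")
```

Run on the complete files, it would also report JAVA_MAXHEAP_ARGUMENT and JAVA_EXTRA_ARGUMENTS, which appear only on the compute node.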

Hi Patrick,
What are the contents of /opt/condor/etc/condor_config.local on your
frontend?
Please also send the output of:
# rocks list host interface <shortname of your frontend>
# rocks list network
# rocks list var | grep Condor

And from one of your nodes, the contents of
/opt/condor/etc/condor_config.local.

I'm just trying to gather the information that Rocks uses to create your
condor_config.local files, along with the contents of those files on
both the frontend (Condor collector) and the nodes.

Thanks,
Phil

On Tue, Aug 5, 2008 at 9:32 AM, Patrick Haley <phaley@xxxxxxx> wrote:


Hi,

I'm running Condor 6.8.5 under Rocks 4.3 (a CentOS-based Linux
distribution).  Last Thursday, jobs submitted to Condor stopped
running and now just show up as idle (before this, Condor had
been running fine for about a year).  It almost looks as if the
Condor daemons on the front-end machine are no longer
communicating with the daemons on the compute nodes,
although I can still ping and ssh into the compute nodes.

I've tried condor_restart on the front-end and all the
compute nodes with no change (also "condor_restart -all" on
the front-end).  I'm at a bit of a loss on how to proceed.

The output from condor_status is blank.

The output from "condor_q -better" is blank on the compute nodes
I've tested, but on the front-end the output for every job looks like this:
---
742.006:  Run analysis summary.  Of 0 machines,
     0 are rejected by your job's requirements
     0 reject your job because of their own requirements
     0 match but are serving users with a better priority in the pool
     0 match but reject the job for unknown reasons
     0 match but will not currently preempt their existing job
     0 are available to run your job

WARNING:  Be advised:
  No resources matched request's constraints

WARNING:  Be advised:   Request 742.6 did not match any resource's constraints

---

The output of "condor_q -ana" on the front-end looks like this
(again, the same for all jobs):
---
742.006:  Run analysis summary.  Of 0 machines,
     0 are rejected by your job's requirements
     0 reject your job because of their own requirements
     0 match but are serving users with a better priority in the pool
     0 match but reject the job for unknown reasons
     0 match but will not currently preempt their existing job
     0 are available to run your job

WARNING:  Be advised:
  No resources matched request's constraints
  Check the Requirements expression below:

Requirements = ((machine != "nas-0-0.local") && (machine != "nas-0-1.local") &&
(machine != "nas-0-2.local") && (machine != "pvfs2-io-0-0.local") &&
(machine != "mseas.local")) && (Arch == "X86_64") && (OpSys == "LINUX") &&
(Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) &&
(TARGET.FileSystemDomain == MY.FileSystemDomain)


WARNING:  Be advised:   Request 742.6 did not match any resource's constraints

---

12 jobs; 12 idle, 0 running, 0 held
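To read the Requirements expression above, it can help to restate it as an ordinary predicate over the job ad (MY) and a machine ad (TARGET). The Python below is a hand translation for illustration only, not Condor's ClassAd evaluator; it ignores ClassAd details such as UNDEFINED handling. The ads in it are hypothetical, including the host name compute-0-0.local. Units follow Condor's conventions: Memory is in megabytes while ImageSize is in kilobytes, hence the factor of 1024. The analysis reports "Of 0 machines", so there appear to be no machine ads for this predicate to be tested against at all, which matches the blank condor_status output.

```python
# Hosts excluded by name in the job's Requirements expression.
EXCLUDED_MACHINES = {
    "nas-0-0.local",
    "nas-0-1.local",
    "nas-0-2.local",
    "pvfs2-io-0-0.local",
    "mseas.local",
}


def requirements(job, machine):
    """Hand translation of the job's Requirements; job = MY, machine = TARGET.

    Unscoped names such as Arch and Memory are treated as machine attributes.
    """
    return (
        machine["Machine"] not in EXCLUDED_MACHINES
        and machine["Arch"] == "X86_64"
        and machine["OpSys"] == "LINUX"
        # Disk and DiskUsage are both in kilobytes.
        and machine["Disk"] >= job["DiskUsage"]
        # Memory is in megabytes, ImageSize in kilobytes.
        and machine["Memory"] * 1024 >= job["ImageSize"]
        and machine["FileSystemDomain"] == job["FileSystemDomain"]
    )


if __name__ == "__main__":
    # Hypothetical ads; real ones could be inspected with condor_q -l and condor_status -l.
    job = {"DiskUsage": 10_000, "ImageSize": 500_000, "FileSystemDomain": "mit.edu"}
    node = {
        "Machine": "compute-0-0.local",
        "Arch": "X86_64",
        "OpSys": "LINUX",
        "Disk": 50_000_000,
        "Memory": 4096,
        "FileSystemDomain": "mit.edu",
    }
    print(requirements(job, node))  # True
    print(requirements(job, {**node, "Machine": "mseas.local"}))  # False
    machines = []  # no machine ads reach the collector
    print(sum(requirements(job, m) for m in machines))  # 0
```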

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley@xxxxxxx
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-222B                   http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA  02139-4301
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/




--
Philip Papadopoulos, PhD
University of California, San Diego
858-822-3628



