
Re: [Condor-users] Submitted jobs remaining idle



All of those look OK to me -- Standard Rocks config.
Perhaps full file systems somewhere?
Or out-of-sync clocks.
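
A quick way to check both from the frontend (a sketch; the compute-node
hostname below is a placeholder):

```shell
#!/bin/sh
# Flag any filesystem over 95% full; a full /tmp or /var/opt/condor
# quietly breaks Condor's lock files and logs.
df -P | awk 'NR > 1 && $5 + 0 > 95 { print "nearly full:", $6, "(" $5 ")" }'

# Print the local epoch time; run the same command on a compute node
# (e.g. "ssh compute-0-0 date +%s" -- the hostname is a placeholder) and
# compare.  A skew of more than a few minutes can break communication
# between the daemons.
date +%s
```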

-P


On Tue, Aug 5, 2008 at 9:58 AM, Patrick Haley <phaley@xxxxxxx> wrote:

Hi Philip,

/opt/condor/etc/condor_config.local (frontend)
-----------------------------------------------
#
#  Condor local configuration file for frontend node.
#
COLLECTOR_NAME = Collector at mseas
CONDOR_ADMIN = condor@xxxxxxxxxxxxx
CONDOR_DEVELOPERS = NONE
CONDOR_DEVELOPERS_COLLECTOR = NONE
CONDOR_HOST = mseas.mit.edu
CONDOR_IDS = 407.407
DAEMON_LIST = MASTER, SCHEDD, COLLECTOR, NEGOTIATOR
EMAIL_DOMAIN = $(FULL_HOSTNAME)
FILESYSTEM_DOMAIN = mit.edu
HOSTALLOW_WRITE = mseas.mit.edu
JAVA = /usr/java/jdk1.5.0_10/bin/java
LOCAL_DIR = /var/opt/condor
LOCK = /tmp/condor-lock.$(HOSTNAME)
MAIL = /bin/mail
NEGOTIATOR_INTERVAL = 120
NETWORK_INTERFACE = 18.38.0.138
RELEASE_DIR = /opt/condor
UID_DOMAIN = mit.edu
-----------------------------------------------
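
After a restart it may also be worth confirming the daemons actually picked
these values up, and whether the collector holds any machine ads at all
(a sketch; the commands are guarded so it is safe to paste anywhere):

```shell
#!/bin/sh
# Show what the running configuration actually resolves to; the values
# should match the file above.
if command -v condor_config_val >/dev/null 2>&1; then
    condor_config_val CONDOR_HOST
    condor_config_val NETWORK_INTERFACE
    condor_config_val HOSTALLOW_WRITE
else
    echo "condor_config_val not in PATH"
fi

# Ask the collector for every ad it holds; if this prints nothing, the
# startds are not reporting in at all.
if command -v condor_status >/dev/null 2>&1; then
    condor_status -any
else
    echo "condor_status not in PATH"
fi
```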

rocks list host interface mseas
-----------------------------------------------
SUBNET  IFACE MAC               IP          NETMASK     GATEWAY   MODULE NAME
private bond0 00:50:45:5f:4e:46 10.1.1.1    255.0.0.0   --------- tg3    mseas
public  eth1  00:50:45:5f:4e:47 18.38.0.138 255.255.0.0 18.38.0.1 tg3    mseas.mit.edu
-----------------------------------------------

rocks list network
-----------------------------------------------
NETWORK  SUBNET    NETMASK
private: 10.0.0.0  255.0.0.0
public:  18.38.0.0 255.255.0.0
-----------------------------------------------

rocks list var | grep Condor
-----------------------------------------------
Condor    Master                  mseas.mit.edu
-----------------------------------------------

/opt/condor/etc/condor_config.local (compute node)
-----------------------------------------------
#
#  Condor local configuration file for compute node.
#
CONDOR_ADMIN = condor@xxxxxxxxxxxxx
CONDOR_DEVELOPERS = NONE
CONDOR_DEVELOPERS_COLLECTOR = NONE
CONDOR_HOST = mseas.mit.edu
CONDOR_IDS = 407.407
DAEMON_LIST = MASTER, SCHEDD, STARTD
EMAIL_DOMAIN = $(FULL_HOSTNAME)
FILESYSTEM_DOMAIN = mit.edu
HOSTALLOW_WRITE = mseas.mit.edu, *.local
JAVA = /usr/java/jdk1.5.0_10/bin/java
LOCAL_DIR = /var/opt/condor
LOCK = /tmp/condor-lock.$(HOSTNAME)
MAIL = /bin/mail
NEGOTIATOR_INTERVAL = 120
NETWORK_INTERFACE = 10.255.255.202
RELEASE_DIR = /opt/condor
UID_DOMAIN = mit.edu
# First set JAVA_MAXHEAP_ARGUMENT to null, to disable the default of max RAM
JAVA_MAXHEAP_ARGUMENT =
# Now set the argument with the Sun-specific maximum allowable value
JAVA_EXTRA_ARGUMENTS = -Xmx3964m
-----------------------------------------------
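
On a node whose startd seems invisible, the daemon logs usually say why
(a sketch; the log directory follows from LOCAL_DIR in the config above,
so adjust the path if your layout differs):

```shell
#!/bin/sh
# LOCAL_DIR is /var/opt/condor in the config above, so logs live here.
LOG_DIR=/var/opt/condor/log

# Show the tail of each daemon log, if present.
for f in MasterLog StartLog; do
    if [ -f "$LOG_DIR/$f" ]; then
        echo "== $LOG_DIR/$f =="
        tail -n 20 "$LOG_DIR/$f"
    fi
done

# Pull out obvious failure lines (permission denials, connect failures).
grep -hE "ERROR|DENIED|Failed to connect" \
    "$LOG_DIR"/MasterLog "$LOG_DIR"/StartLog 2>/dev/null | tail -n 10 || true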

> Hi Patrick,
> What are the contents of /opt/condor/etc/condor_config.local on your
> frontend?
> Please also send the output of:
> # rocks list host interface <shortname of your frontend>
> # rocks list network
> # rocks list var | grep Condor
>
> And from one of your nodes:
> the contents of /opt/condor/etc/condor_config.local
>
> Just trying to gather the information that Rocks uses to create your
> condor_config.local files, and the contents of those files on both the
> frontend (Condor collector) and the nodes.
>
> Thanks,
> Phil
>
> On Tue, Aug 5, 2008 at 9:32 AM, Patrick Haley <phaley@xxxxxxx> wrote:
>
>>
>> Hi,
>>
>> I'm running Condor 6.8.5 under Rocks 4.3 (a CentOS-based
>> Linux distribution).  Last Thursday, jobs submitted to Condor stopped
>> running and just show up as idle (prior to this, Condor had
>> been running fine for about a year).  It almost looks like the
>> Condor daemons on the front-end machine are no longer
>> communicating with the daemons on the compute nodes.
>> (Although I can still ping and ssh into the compute nodes.)
>>
>> I've tried condor_restart on the front-end and all the
>> compute nodes with no change (also "condor_restart -all" on
>> the front-end).  I'm at a bit of a loss on how to proceed.
>>
>> The output from condor_status is blank
>>
>> The output from "condor_q -better" is blank on the compute nodes
>> I've tested, but on the front-end the output for all jobs looks like
>>
>> ---
>> 742.006:  Run analysis summary.  Of 0 machines,
>>      0 are rejected by your job's requirements
>>      0 reject your job because of their own requirements
>>      0 match but are serving users with a better priority in the pool
>>      0 match but reject the job for unknown reasons
>>      0 match but will not currently preempt their existing job
>>      0 are available to run your job
>>
>> WARNING:  Be advised:
>>   No resources matched request's constraints
>>
>> WARNING:  Be advised:   Request 742.6 did not match any resource's
>> constraints
>>
>> ---
>>
>> The output for "condor_q -ana" on the front-end looks like
>> (again same for all jobs)
>> ---
>> 742.006:  Run analysis summary.  Of 0 machines,
>>      0 are rejected by your job's requirements
>>      0 reject your job because of their own requirements
>>      0 match but are serving users with a better priority in the pool
>>      0 match but reject the job for unknown reasons
>>      0 match but will not currently preempt their existing job
>>      0 are available to run your job
>>
>> WARNING:  Be advised:
>>   No resources matched request's constraints
>>   Check the Requirements expression below:
>>
>> Requirements = ((machine != "nas-0-0.local") && (machine != "nas-0-1.local") &&
>>     (machine != "nas-0-2.local") && (machine != "pvfs2-io-0-0.local") &&
>>     (machine != "mseas.local")) && (Arch == "X86_64") && (OpSys == "LINUX") &&
>>     (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) &&
>>     (TARGET.FileSystemDomain == MY.FileSystemDomain)
>>
>>
>> WARNING:  Be advised:   Request 742.6 did not match any resource's
>> constraints
>>
>> ---
>>
>> 12 jobs; 12 idle, 0 running, 0 held
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley                          Email:  phaley@xxxxxxx
>> Center for Ocean Engineering       Phone:  (617) 253-6824
>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>> MIT, Room 5-222B                   http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA  02139-4301
>> _______________________________________________
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/condor-users/
>>
>
>
>
> --
> Philip Papadopoulos, PhD
> University of California, San Diego
> 858-822-3628
>





