
Re: [Condor-users] Flocking - jobs matched but not started



On Fri, 2005-08-19 at 07:33 -0500, Zachary Miller wrote:
> > This is a test setup involving just 3 systems so I can control what is
> > happening a fair bit. I have modified the execution client to use
> > FULLDEBUG for the STARTD, and set SEC_DEBUG_PRINT_KEYS to true (the log
> > file kept mentioning it defaulting to false so I thought I'd change it
> > to see what else it showed). The client startlog shows:
> 
> actually, what would be more useful is to turn on D_SECURITY and D_FULLDEBUG,
> or simply D_ALL.
> 
It seems I am also getting the following on the local server:

In the MasterLog:

============================================
8/19 16:13:36 The SCHEDD (pid 18970) died due to signal 11
8/19 16:13:36 Sending obituary for "/opt/condor-6.6.10/sbin/condor_schedd"
8/19 16:13:36 restarting /opt/condor-6.6.10/sbin/condor_schedd in 10 seconds
8/19 16:13:46 Started DaemonCore process "/opt/condor-6.6.10/sbin/condor_schedd", pid and pgroup = 19066
8/19 16:17:39 The SCHEDD (pid 19066) died due to signal 11
8/19 16:17:39 Sending obituary for "/opt/condor-6.6.10/sbin/condor_schedd"
8/19 16:17:39 restarting /opt/condor-6.6.10/sbin/condor_schedd in 11 seconds
8/19 16:17:50 Started DaemonCore process "/opt/condor-6.6.10/sbin/condor_schedd", pid and pgroup = 19125
============================================
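To pull the crash events out of a longer MasterLog, a minimal Python sketch could match the "died due to signal" entries shown above and measure the time between them. It assumes email-wrapped lines have been rejoined and that every entry starts with the "M/D HH:MM:SS" prefix seen here. The function names are hypothetical, and because MasterLog does not record the year, 2005 is assumed from the date of this message.

```python
import re
from datetime import datetime

# Matches MasterLog entries of the form shown above, e.g.
# "8/19 16:13:36 The SCHEDD (pid 18970) died due to signal 11"
DEATH_RE = re.compile(
    r"^(\d+/\d+ \d+:\d+:\d+) The (\w+) \(pid (\d+)\) died due to signal (\d+)"
)


def find_daemon_deaths(lines):
    """Return (timestamp, daemon, pid, signal) for each 'died due to signal' entry."""
    deaths = []
    for line in lines:
        match = DEATH_RE.match(line.strip())
        if match:
            stamp, daemon, pid, sig = match.groups()
            deaths.append((stamp, daemon, int(pid), int(sig)))
    return deaths


def seconds_between(stamps, year=2005):
    """Seconds between consecutive 'M/D HH:MM:SS' stamps.

    The log omits the year, so one is supplied (assumed 2005 here).
    """
    parsed = [datetime.strptime(f"{year}/{s}", "%Y/%m/%d %H:%M:%S") for s in stamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    master_log = [
        "8/19 16:13:36 The SCHEDD (pid 18970) died due to signal 11",
        '8/19 16:13:36 Sending obituary for "/opt/condor-6.6.10/sbin/condor_schedd"',
        "8/19 16:17:39 The SCHEDD (pid 19066) died due to signal 11",
    ]
    deaths = find_daemon_deaths(master_log)
    for stamp, daemon, pid, sig in deaths:
        print(f"{stamp}: {daemon} pid {pid} killed by signal {sig}")
    print("seconds between crashes:", seconds_between([d[0] for d in deaths]))
```

On Linux, signal 11 is SIGSEGV, so both entries record the schedd segfaulting, about four minutes apart in this excerpt.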


The SchedLog shows:

============================================
8/19 16:13:36 DaemonCore: Command received via TCP from host <141.163.66.135:37695>
8/19 16:13:36 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
8/19 16:13:46 ******************************************************
8/19 16:13:46 ** condor_schedd (CONDOR_SCHEDD) STARTING UP
8/19 16:13:46 ** /opt/condor-6.6.10/sbin/condor_schedd
8/19 16:13:46 ** $CondorVersion: 6.6.10 Jun 13 2005 $
8/19 16:13:46 ** $CondorPlatform: I386-LINUX_RH9 $
8/19 16:13:46 ** PID = 19066
8/19 16:13:46 ******************************************************
8/19 16:13:46 Using config file: /opt/condor-6.6.10/etc/condor_config
8/19 16:13:46 Using local config files: /opt/condor-6.6.10/local.ws-60-56/condor_config.local
8/19 16:13:46 DaemonCore: Command Socket at <141.163.60.56:44914>
8/19 16:13:46 Sent ad to central manager for john@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/19 16:17:06 DaemonCore: Command received via TCP from host <141.163.60.56:44920>
8/19 16:17:06 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
8/19 16:17:06 Negotiating for owner: john@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/19 16:17:06 Checking consistency running and runnable jobs
8/19 16:17:06 Tables are consistent
8/19 16:17:06 Out of servers - 0 jobs matched, 5 jobs idle, 1 jobs rejected
8/19 16:17:06 Increasing flock level for john@xxxxxxxxxxxxxxxxxxxxxxxxxxxx to 1.
8/19 16:17:06 Sent ad to central manager for john@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/19 16:17:39 DaemonCore: Command received via TCP from host <141.163.66.135:37719>
8/19 16:17:39 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
8/19 16:17:50 ******************************************************
8/19 16:17:50 ** condor_schedd (CONDOR_SCHEDD) STARTING UP
8/19 16:17:50 ** /opt/condor-6.6.10/sbin/condor_schedd
8/19 16:17:50 ** $CondorVersion: 6.6.10 Jun 13 2005 $
8/19 16:17:50 ** $CondorPlatform: I386-LINUX_RH9 $
8/19 16:17:50 ** PID = 19125
8/19 16:17:50 ******************************************************
8/19 16:17:50 Using config file: /opt/condor-6.6.10/etc/condor_config
8/19 16:17:50 Using local config files: /opt/condor-6.6.10/local.ws-60-56/condor_config.local
8/19 16:17:50 DaemonCore: Command Socket at <141.163.60.56:44922>
8/19 16:17:50 Sent ad to central manager for john@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/19 16:22:07 DaemonCore: Command received via TCP from host <141.163.60.56:44928>
8/19 16:22:07 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
8/19 16:22:07 Negotiating for owner: john@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/19 16:22:07 Checking consistency running and runnable jobs
8/19 16:22:07 Tables are consistent
8/19 16:22:07 Out of servers - 0 jobs matched, 5 jobs idle, 1 jobs rejected
8/19 16:22:07 Increasing flock level for john@xxxxxxxxxxxxxxxxxxxxxxxxxxxx to 1.
8/19 16:22:07 Sent ad to central manager for john@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
============================================
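Read side by side, the SchedLog excerpts suggest a pattern: each start-up banner follows a NEGOTIATE command from 141.163.66.135 (presumably the negotiator of the pool being flocked to), whereas each NEGOTIATE from 141.163.60.56 goes on to "Negotiating for owner". A short Python sketch can check that pairing over a longer log. It assumes rejoined lines and the message formats shown here; `classify_commands` and `SAMPLE_SCHED_LOG` are hypothetical names, and what it reports is a correlation in this excerpt, not a diagnosis of the crash.

```python
import re

# "Command received via TCP from host <IP:port>" -> capture the IP
CMD_RE = re.compile(r"Command received via TCP from host <([\d.]+):\d+>")
# Banner the schedd prints when it (re)starts
START_RE = re.compile(r"condor_schedd \(CONDOR_SCHEDD\) STARTING UP")

# Rejoined lines taken from the SchedLog excerpt above (subset)
SAMPLE_SCHED_LOG = [
    "8/19 16:13:36 DaemonCore: Command received via TCP from host <141.163.66.135:37695>",
    "8/19 16:13:36 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)",
    "8/19 16:13:46 ** condor_schedd (CONDOR_SCHEDD) STARTING UP",
    "8/19 16:17:06 DaemonCore: Command received via TCP from host <141.163.60.56:44920>",
    "8/19 16:17:06 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)",
    "8/19 16:17:06 Negotiating for owner: john@xxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "8/19 16:17:39 DaemonCore: Command received via TCP from host <141.163.66.135:37719>",
    "8/19 16:17:39 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)",
    "8/19 16:17:50 ** condor_schedd (CONDOR_SCHEDD) STARTING UP",
]


def classify_commands(lines):
    """Pair each incoming command's source IP with what the log shows next.

    'negotiated' -> a "Negotiating for owner" line came next
    'restart'    -> a schedd start-up banner came next (the schedd had died)
    """
    pending = None
    outcomes = []
    for line in lines:
        cmd = CMD_RE.search(line)
        if cmd:
            pending = cmd.group(1)
        elif pending and START_RE.search(line):
            outcomes.append((pending, "restart"))
            pending = None
        elif pending and "Negotiating for owner" in line:
            outcomes.append((pending, "negotiated"))
            pending = None
    return outcomes


if __name__ == "__main__":
    for host, outcome in classify_commands(SAMPLE_SCHED_LOG):
        print(f"{host:>16}  {outcome}")
```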

I'll see if I can find out anything more but, as I said, this seems to be
getting worse. I have tried setting 'CREATE_CORE_FILES', but that made no
difference (no core file was created).
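One thing that may be worth ruling out (an assumption, not something established in this thread) is a core-file size limit of 0 inherited by the daemons: with a soft RLIMIT_CORE of 0 the kernel writes no core file. A minimal, Unix-only Python check of the current process's limits is sketched below; it only means something if run from the same environment that starts condor_master, and `describe_core_limit` is a hypothetical helper.

```python
import resource


def describe_core_limit(soft, hard):
    """Summarise an RLIMIT_CORE (soft, hard) pair in plain words."""
    def fmt(limit):
        return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"

    state = "disabled" if soft == 0 else "enabled"
    return f"core dumps {state} (soft limit {fmt(soft)}, hard limit {fmt(hard)})"


if __name__ == "__main__":
    # Limits of *this* process; child processes inherit them unless changed.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    print(describe_core_limit(soft, hard))
```

If the hard limit is also 0, a non-root process cannot raise its soft limit, so the shell or init script that launches condor_master (for example `ulimit -c`) is the place to look.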


John.

-- 
---------------------------------------------------------------
John Horne, University of Plymouth, UK  Tel: +44 (0)1752 233914
E-mail: John.Horne@xxxxxxxxxxxxxx       Fax: +44 (0)1752 233839