
Re: [HTCondor-users] Fedora 19 (64 bits) with HTCondor 8.1: schedd crashes....



Hi,

Alas, the apparent 'good news' is false.
On Fedora 19 with HTCondor 8.1.0, the schedd keeps crashing: each time the master daemon restarts it, it crashes again
(this seems to have nothing to do with the missing /var/lock/condor/local/ directory).

See the SchedLog below. Every crash comes right after "Starting add_shadow_birthdate(210.0)".

Regards,
Rob.

10/24/13 12:54:03 (pid:16146) ******************************************************
10/24/13 12:54:03 (pid:16146) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
10/24/13 12:54:03 (pid:16146) ** /usr/sbin/condor_schedd
10/24/13 12:54:03 (pid:16146) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
10/24/13 12:54:03 (pid:16146) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
10/24/13 12:54:03 (pid:16146) ** $CondorVersion: 8.1.0 Jul 15 2013 BuildID: RH-8.1.0-0.2.fc19 PRE-RELEASE-UWCS $
10/24/13 12:54:03 (pid:16146) ** $CondorPlatform: X86_64-Fedora_19 $
10/24/13 12:54:03 (pid:16146) ** PID = 16146
10/24/13 12:54:03 (pid:16146) ** Log last touched 10/24 12:20:21
10/24/13 12:54:03 (pid:16146) ******************************************************
10/24/13 12:54:03 (pid:16146) Using config source: /etc/condor/condor_config
10/24/13 12:54:03 (pid:16146) Using local config sources: 
10/24/13 12:54:03 (pid:16146)    /etc/condor/config.d/00personal_condor.config
10/24/13 12:54:03 (pid:16146)    /etc/condor/config.d/01personal_condor.config
10/24/13 12:54:03 (pid:16146)    /etc/condor/config.d/99flocking.config
10/24/13 12:54:03 (pid:16146) DaemonCore: command socket at <xxx.xxx.xxx.xxx:55786>
10/24/13 12:54:03 (pid:16146) DaemonCore: private command socket at <xxx.xxx.xxx.xxx:55786>
10/24/13 12:54:03 (pid:16146) History file rotation is enabled.
10/24/13 12:54:03 (pid:16146)   Maximum history file size is: 20971520 bytes
10/24/13 12:54:03 (pid:16146)   Number of rotated history files is: 2
10/24/13 12:54:03 (pid:16146) Failed to execute /usr/sbin/condor_shadow.std, ignoring
10/24/13 12:54:04 (pid:16146) About to rotate ClassAd log /var/lib/condor/spool/job_queue.log
10/24/13 12:54:04 (pid:16146) 210.0: JobLeaseDuration remaining: EXPIRED!
10/24/13 12:54:08 (pid:16146) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
10/24/13 12:54:08 (pid:16146) TransferQueueManager upload 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
10/24/13 12:54:08 (pid:16146) TransferQueueManager download 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
10/24/13 12:54:08 (pid:16146) Sent ad to central manager for myname@xxxxxxxxxxxxxxx
10/24/13 12:54:08 (pid:16146) Sent ad to 1 collectors for myname@xxxxxxxxxxxxxxx
10/24/13 12:55:04 (pid:16146) WARNING: forward resolution of condormaster.skku.edu doesn't match xxx.xxx.xxx.xxx!
10/24/13 12:55:04 (pid:16146) Using negotiation protocol: NEGOTIATE
10/24/13 12:55:04 (pid:16146) Negotiating for owner: myname@xxxxxxxxxxxxxxx
10/24/13 12:55:04 (pid:16146) AutoCluster:config() significant attributes changed to 
10/24/13 12:55:05 (pid:16146) Checking consistency running and runnable jobs
10/24/13 12:55:05 (pid:16146) Tables are consistent
10/24/13 12:55:05 (pid:16146) Rebuilt prioritized runnable job list in 0.483s.
10/24/13 12:55:05 (pid:16146) Finished negotiating for myname in local pool: 0 matched, 1 rejected
10/24/13 12:55:05 (pid:16146) Increasing flock level for myname to 1 from 0.
10/24/13 12:55:05 (pid:16146) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
10/24/13 12:55:05 (pid:16146) TransferQueueManager upload 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
10/24/13 12:55:05 (pid:16146) TransferQueueManager download 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
10/24/13 12:55:05 (pid:16146) Sent ad to central manager for myname@xxxxxxxxxxxxxxx
10/24/13 12:55:05 (pid:16146) Sent ad to 1 collectors for myname@xxxxxxxxxxxxxxx
10/24/13 12:55:07 (pid:16146) Using negotiation protocol: NEGOTIATE
10/24/13 12:55:07 (pid:16146) Negotiating for owner: myname@xxxxxxxxxxxxxxx (flock level 1, pool condor.skku.edu)
10/24/13 12:55:07 (pid:16146) AutoCluster:config() significant attributes changed to JobUniverse,LastCheckpointPlatform,NumCkpts,RemoteGroup,SubmitterGroup,SubmitterUserPrio
10/24/13 12:55:07 (pid:16146) Starting add_shadow_birthdate(210.0)
Stack dump for process 16146 at timestamp 1382586907 (4 frames)
/lib64/libcondor_utils_8_1_0.so(dprintf_dump_stack+0x72)[0x38954e0972]
/lib64/libcondor_utils_8_1_0.so[0x389557b5f7]
/lib64/libc.so.6[0x3891435a90]
[0x7fff76a01aa0]
10/24/13 12:55:18 (pid:17005) ******************************************************
10/24/13 12:55:18 (pid:17005) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
10/24/13 12:55:18 (pid:17005) ** /usr/sbin/condor_schedd
10/24/13 12:55:18 (pid:17005) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
10/24/13 12:55:18 (pid:17005) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
10/24/13 12:55:18 (pid:17005) ** $CondorVersion: 8.1.0 Jul 15 2013 BuildID: RH-8.1.0-0.2.fc19 PRE-RELEASE-UWCS $
10/24/13 12:55:18 (pid:17005) ** $CondorPlatform: X86_64-Fedora_19 $
10/24/13 12:55:18 (pid:17005) ** PID = 17005
10/24/13 12:55:18 (pid:17005) ** Log last touched 10/24 12:55:08
10/24/13 12:55:18 (pid:17005) ******************************************************
10/24/13 12:55:18 (pid:17005) Using config source: /etc/condor/condor_config
10/24/13 12:55:18 (pid:17005) Using local config sources: 
10/24/13 12:55:18 (pid:17005)    /etc/condor/config.d/00personal_condor.config
10/24/13 12:55:18 (pid:17005)    /etc/condor/config.d/01personal_condor.config
10/24/13 12:55:18 (pid:17005)    /etc/condor/config.d/99flocking.config
10/24/13 12:55:18 (pid:17005) DaemonCore: command socket at <xxx.xxx.xxx.xxx:56785>
10/24/13 12:55:18 (pid:17005) DaemonCore: private command socket at <xxx.xxx.xxx.xxx:56785>
10/24/13 12:55:18 (pid:17005) History file rotation is enabled.
10/24/13 12:55:18 (pid:17005)   Maximum history file size is: 20971520 bytes
10/24/13 12:55:18 (pid:17005)   Number of rotated history files is: 2
10/24/13 12:55:18 (pid:17005) Failed to execute /usr/sbin/condor_shadow.std, ignoring
10/24/13 12:55:18 (pid:17005) About to rotate ClassAd log /var/lib/condor/spool/job_queue.log
10/24/13 12:55:19 (pid:17005) 210.0: JobLeaseDuration remaining: 1188
10/24/13 12:55:19 (pid:17005) Starting add_shadow_birthdate(210.0)
Stack dump for process 17005 at timestamp 1382586919 (4 frames)
/lib64/libcondor_utils_8_1_0.so(dprintf_dump_stack+0x72)[0x38954e0972]
/lib64/libcondor_utils_8_1_0.so[0x389557b5f7]
/lib64/libc.so.6[0x3891435a90]
[0x7fff4bcc51f0]
10/24/13 12:55:30 (pid:17014) ******************************************************
10/24/13 12:55:30 (pid:17014) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
10/24/13 12:55:30 (pid:17014) ** /usr/sbin/condor_schedd
10/24/13 12:55:30 (pid:17014) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
10/24/13 12:55:30 (pid:17014) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
10/24/13 12:55:30 (pid:17014) ** $CondorVersion: 8.1.0 Jul 15 2013 BuildID: RH-8.1.0-0.2.fc19 PRE-RELEASE-UWCS $
10/24/13 12:55:30 (pid:17014) ** $CondorPlatform: X86_64-Fedora_19 $
10/24/13 12:55:30 (pid:17014) ** PID = 17014
10/24/13 12:55:30 (pid:17014) ** Log last touched 10/24 12:55:19
10/24/13 12:55:30 (pid:17014) ******************************************************
10/24/13 12:55:30 (pid:17014) Using config source: /etc/condor/condor_config
10/24/13 12:55:30 (pid:17014) Using local config sources: 
10/24/13 12:55:30 (pid:17014)    /etc/condor/config.d/00personal_condor.config
10/24/13 12:55:30 (pid:17014)    /etc/condor/config.d/01personal_condor.config
10/24/13 12:55:30 (pid:17014)    /etc/condor/config.d/99flocking.config
10/24/13 12:55:30 (pid:17014) DaemonCore: command socket at <xxx.xxx.xxx.xxx:59733>
10/24/13 12:55:30 (pid:17014) DaemonCore: private command socket at <xxx.xxx.xxx.xxx:59733>
10/24/13 12:55:30 (pid:17014) History file rotation is enabled.
10/24/13 12:55:30 (pid:17014)   Maximum history file size is: 20971520 bytes
10/24/13 12:55:30 (pid:17014)   Number of rotated history files is: 2
10/24/13 12:55:30 (pid:17014) Failed to execute /usr/sbin/condor_shadow.std, ignoring
10/24/13 12:55:31 (pid:17014) About to rotate ClassAd log /var/lib/condor/spool/job_queue.log
10/24/13 12:55:31 (pid:17014) 210.0: JobLeaseDuration remaining: 1176
10/24/13 12:55:31 (pid:17014) Starting add_shadow_birthdate(210.0)
Stack dump for process 17014 at timestamp 1382586931 (4 frames)
/lib64/libcondor_utils_8_1_0.so(dprintf_dump_stack+0x72)[0x38954e0972]
/lib64/libcondor_utils_8_1_0.so[0x389557b5f7]
/lib64/libc.so.6[0x3891435a90]
[0x7ffff37c2d30]
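For a longer SchedLog, a small script can confirm whether every crash follows the same log message. What follows is a minimal Python sketch, not part of HTCondor. The function name `last_message_before_crash` and the script name are hypothetical. The sketch assumes the line format shown above ("MM/DD/YY HH:MM:SS (pid:NNNN) message" lines, plus a "Stack dump for process NNNN ..." header for each crash). For each PID that dumped a stack, it records the last message that PID logged before the dump.

```python
import re
import sys

# A SchedLog line as shown in this post: "MM/DD/YY HH:MM:SS (pid:NNNN) message"
LOG_LINE = re.compile(r"^\d\d/\d\d/\d\d \d\d:\d\d:\d\d \(pid:(\d+)\) (.*)$")
# Header line of a stack dump in the log, e.g.
# "Stack dump for process 17005 at timestamp 1382586919 (4 frames)"
STACK_DUMP = re.compile(r"^Stack dump for process (\d+) at timestamp \d+")


def last_message_before_crash(lines):
    """Return {pid: last message logged by that pid before its stack dump}.

    Hypothetical helper (assumes the log format above). Lines matching
    neither pattern, such as raw stack frames like
    "/lib64/libc.so.6[0x...]", are skipped.
    """
    last_msg = {}  # most recent message seen for each pid
    crashes = {}   # pid -> message logged just before its stack dump
    for line in lines:
        line = line.rstrip("\n")
        m = LOG_LINE.match(line)
        if m:
            last_msg[m.group(1)] = m.group(2)
            continue
        d = STACK_DUMP.match(line)
        if d:
            pid = d.group(1)
            crashes[pid] = last_msg.get(pid)
    return crashes


if __name__ == "__main__":
    # Usage (hypothetical script name): python crash_summary.py /path/to/SchedLog
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for pid, msg in last_message_before_crash(f).items():
                print(f"pid {pid}: {msg}")
```

If every PID reports the same message, the crash is deterministic at that step rather than random.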