
[HTCondor-users] Fedora 19 (64 bits) with HTCondor 8.1: schedd crashes....



Hi all,

Yesterday I upgraded my Fedora Linux PC from version 18 to 19.
This PC is configured as an HTCondor central manager.
As part of the upgrade, HTCondor was also updated from version 7.9.1 to 8.1.0.

Now the schedd crashes because the /var/lock/condor/local directory is missing.
The file /var/log/condor/SchedLog says:

10/24/13 10:49:06 (pid:25001) ******************************************************
10/24/13 10:49:06 (pid:25001) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
10/24/13 10:49:06 (pid:25001) ** /usr/sbin/condor_schedd
10/24/13 10:49:06 (pid:25001) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
10/24/13 10:49:06 (pid:25001) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
10/24/13 10:49:06 (pid:25001) ** $CondorVersion: 8.1.0 Jul 15 2013 BuildID: RH-8.1.0-0.2.fc19 PRE-RELEASE-UWCS $
10/24/13 10:49:06 (pid:25001) ** $CondorPlatform: X86_64-Fedora_19 $
10/24/13 10:49:06 (pid:25001) ** PID = 25001
10/24/13 10:49:06 (pid:25001) ** Log last touched 10/24 10:40:25
10/24/13 10:49:06 (pid:25001) ******************************************************
10/24/13 10:49:06 (pid:25001) Using config source: /etc/condor/condor_config
10/24/13 10:49:06 (pid:25001) Using local config sources: 
10/24/13 10:49:06 (pid:25001)    /etc/condor/config.d/00personal_condor.config
10/24/13 10:49:06 (pid:25001)    /etc/condor/config.d/01personal_condor.config
10/24/13 10:49:06 (pid:25001)    /etc/condor/config.d/99flocking.config
10/24/13 10:49:06 (pid:25001) DaemonCore: command socket at <xxx.xxx.xxx.xxx:40832>
10/24/13 10:49:06 (pid:25001) DaemonCore: private command socket at <xxx.xxx.xxx.xxx:40832>
10/24/13 10:49:06 (pid:25001) History file rotation is enabled.
10/24/13 10:49:06 (pid:25001)   Maximum history file size is: 20971520 bytes
10/24/13 10:49:06 (pid:25001)   Number of rotated history files is: 2
10/24/13 10:49:06 (pid:25001) Failed to execute /usr/sbin/condor_shadow.std, ignoring
10/24/13 10:49:07 (pid:25001) About to rotate ClassAd log /var/lib/condor/spool/job_queue.log
10/24/13 10:49:07 (pid:25001) 210.0: JobLeaseDuration remaining: 75
10/24/13 10:49:07 (pid:25001) directory_util::rec_touch_file: Directory /var/lock/condor/local cannot be created (Permission denied) 
10/24/13 10:49:07 (pid:25001) Starting add_shadow_birthdate(210.0)
Stack dump for process 25001 at timestamp 1382579347 (4 frames)
/lib64/libcondor_utils_8_1_0.so(dprintf_dump_stack+0x72)[0x38954e0972]
/lib64/libcondor_utils_8_1_0.so[0x389557b5f7]
/lib64/libc.so.6[0x3891435a90]
[0x7fff58831120]


If I manually create the directory /var/lock/condor/local with permissions "1777" and restart HTCondor, the schedd no longer crashes.
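
For reference, the manual workaround can be sketched as a small shell helper. The function name ensure_lock_dir is my own invention for illustration and not part of HTCondor; it takes the directory as an optional argument so it can be tried on a scratch path first.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with HTCondor): create the local lock
# directory with the sticky, world-writable mode described above.
ensure_lock_dir() {
    # Default to the path the schedd complained about in SchedLog.
    dir="${1:-/var/lock/condor/local}"
    # -p also creates missing parents and is harmless if the directory exists.
    mkdir -p "$dir" && chmod 1777 "$dir"
}
```

Run as root, calling ensure_lock_dir with no argument and then restarting HTCondor reproduces what I did by hand. Assuming /var/lock lives on a tmpfs (as I believe it does on Fedora 19), the directory will be gone again after a reboot, so this is only a stopgap.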

HTCondor version 7.9.1 seems to fall back to /tmp/condorLocks when it cannot find /var/lock/condor/local/.
Has this behaviour changed in 8.1, or is this a bug?

I found that the file /etc/tmpfiles.d/condor.conf needs an additional line like:
d /var/lock/condor/local 1777 condor condor -

That line is missing right now, but whose fault is this?
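
A minimal sketch of how the missing entry could be added without duplicating it on repeated runs. The function name add_tmpfiles_entry is hypothetical, and the sketch assumes the target file ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper: append the tmpfiles.d entry only if it is not
# already present as an exact line in the given file.
add_tmpfiles_entry() {
    conf="${1:-/etc/tmpfiles.d/condor.conf}"
    entry='d /var/lock/condor/local 1777 condor condor -'
    # -x matches whole lines, -F treats the entry as a fixed string.
    grep -qxF "$entry" "$conf" 2>/dev/null || printf '%s\n' "$entry" >> "$conf"
}
```

After adding the line as root, running "systemd-tmpfiles --create /etc/tmpfiles.d/condor.conf" should create the directory immediately, without waiting for the next boot.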

Best regards,
Rob.