I noticed some of my worker nodes never showed up in condor_status after creating them.
Running pstree on those nodes showed that the startd wasn't running. I tried to start it and got the following error:
03/10/15 08:38:05 Can't open "/var/log/condor/StartLog"
ERROR "Cannot open log file '/var/log/condor/StartLog'" at line 208 in file /slots/01/dir_21000/userdir/src/condor_utils/dprintf_setup.cpp
So I temporarily renamed the file, and now I get the following in the new StartLog:
03/10/15 08:24:38 ERROR: SharedPortEndpoint: failed to bind to /var/lock/condor/daemon_sock/25689_90ae: Permission denied
03/10/15 08:24:38 ERROR "Failed to start local listener (USE_SHARED_PORT=true)" at line 2897 in file /slots/01/dir_21000/userdir/src/condor_daemon_core.V6/daemon_core.cpp
I'm using Puppet to configure HTCondor, so the configuration should not differ between the working worker nodes and this one.
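
Both errors look like permission problems to me, so one thing I could do is compare ownership and modes of the relevant paths on a failing node and a working node. Below is a minimal Python sketch for that. The helper name describe_path is made up for illustration, the paths are simply the ones from the log lines above, and I'm assuming the script is run as root so that every path can be stat'ed.

```python
import grp
import os
import pwd
import stat

# Paths taken from the error messages above; adjust if LOG or LOCK
# point somewhere else on your nodes (assumption: they do not).
PATHS = [
    "/var/log/condor",
    "/var/log/condor/StartLog",
    "/var/lock/condor",
    "/var/lock/condor/daemon_sock",
]


def describe_path(path):
    """Return (owner, group, mode string) for path, or None if it is missing."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    # Fall back to the numeric id when no passwd/group entry exists.
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return owner, group, stat.filemode(st.st_mode)


if __name__ == "__main__":
    for path in PATHS:
        print(path, describe_path(path))
```

Running this on a good node and on a bad node and diffing the output should show whether the log or lock directories ended up with a different owner or mode on the broken nodes.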