We have upgraded from the stable condor version 8.2.3 to the development version 8.3.5 (by installing the new condor-all rpm). With version 8.2.3 we were submitting jobs from an ARC-CE to our condor test environment without any significant issues. However, we are having problems with version 8.3.5.
It seems that condor is not finding the libraries placed in /usr/lib64/condor, although that directory is defined as LIB in our condor_config file:
# cat condor_config | egrep "RELEASE|LIB" | grep -v ^#
RELEASE_DIR = /usr
BIN = $(RELEASE_DIR)/bin
LIB = $(RELEASE_DIR)/lib64/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/libexec/condor
SHARE = $(RELEASE_DIR)/share/condor
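To make explicit what these settings resolve to, here is a minimal Python sketch of the $(NAME) substitution. It is only an illustration: parse_config and expand are hypothetical helpers that handle just the simple NAME = VALUE form shown above, and they are not HTCondor's actual parser. Running condor_config_val LIB on the affected host should be the authoritative check.

```python
import re

# Settings copied from the condor_config excerpt above.
CONFIG_TEXT = """\
RELEASE_DIR = /usr
BIN = $(RELEASE_DIR)/bin
LIB = $(RELEASE_DIR)/lib64/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/libexec/condor
SHARE = $(RELEASE_DIR)/share/condor
"""

# Matches references of the form $(NAME).
MACRO = re.compile(r"\$\((\w+)\)")


def parse_config(text):
    """Parse simple NAME = VALUE lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def expand(name, settings, depth=0):
    """Recursively substitute $(NAME) references; unknown names are left as-is."""
    if depth > 20:
        raise ValueError("macro expansion too deep (possible cycle)")

    def substitute(match):
        ref = match.group(1)
        if ref in settings:
            return expand(ref, settings, depth + 1)
        return match.group(0)

    return MACRO.sub(substitute, settings[name])


settings = parse_config(CONFIG_TEXT)
print(expand("LIB", settings))  # prints /usr/lib64/condor
```

With the values above, LIB resolves to /usr/lib64/condor, so the configuration itself points at the directory where the rpm placed the libraries.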
In the ShadowLog of the schedd, we can see:
06/23/15 15:48:44 (228.0) (24598): Request to run on slot1@xxxxxxxxxxxx <192.168.101.5:41149> was ACCEPTED
06/23/15 15:48:44 (225.0) (24572): ERROR "Error from slot1@xxxxxxxxxxxx: Failed to transfer files" at line 562 in file /slots/02/dir_64384/userdir/.tmpmTpznX/BUILD/condor-8.3.5/src/condor_shadow.V6.1/pseudo_ops.cpp
06/23/15 15:48:44 (225.0) (24572): ReliSock::put_x509_delegation(): delegation failed: x509_send_delegation failed at line 1422
06/23/15 15:48:45 (225.0) (24572): DoUpload: SHADOW at 18.104.22.168 failed to send file(s) to <192.168.101.5:36996>: error sending /var/spool/arc/jobstatus/job.EOXNDm5ccRmnl6QwDoplpaQmABFKDmABFKDm3hIKDmABFKDm86o53m.proxy
06/23/15 15:48:45 (228.0) (24598): ERROR "Error from slot1@xxxxxxxxxxxx: Failed to transfer files" at line 562 in file /slots/02/dir_64384/userdir/.tmpmTpznX/BUILD/condor-8.3.5/src/condor_shadow.V6.1/pseudo_ops.cpp
These are the errors in the StarterLog of the execute machine:
06/23/15 15:53:00 (pid:9435) ReliSock::get_x509_delegation(): delegation failed: Failed to open GSI libraries: libglobus_common.so.0: cannot open shared object file: No such file or directory
06/23/15 15:53:00 (pid:9435) DoDownload: STARTER at 192.168.101.5 failed to receive file /home/execute/dir_9432/job.IvCMDm9mdRmnl6QwDoplpaQmABFKDmABFKDmIwLKDmABFKDm8ojIzm.proxy
06/23/15 15:53:00 (pid:9432) File transfer failed (status=0).
06/23/15 15:53:00 (pid:9432) ERROR "Failed to transfer files" at line 2301 in file /slots/02/dir_64384/userdir/.tmpmTpznX/BUILD/condor-8.3.5/src/condor_starter.V6.1/jic_shadow.cpp
06/23/15 15:53:00 (pid:9432) ShutdownFast all jobs.
06/23/15 15:53:01 (pid:9432) condor_read() failed: recv(fd=6) returned -1, errno = 104 Connection reset by peer, reading 21 bytes from <22.214.171.124:58180>.
06/23/15 15:53:01 (pid:9432) IO: Failed to read packet header
06/23/15 15:53:01 (pid:9432) Lost connection to shadow, waiting 1200 secs for reconnect
We have run some tests, changing the configuration files, playing with LD_LIBRARY_PATH, etc., with no luck. We found that the libraries are searched for in /usr/lib64 and not in /usr/lib64/condor, which is the directory where the rpm installed them and the directory defined as LIB in our condor_config file.
If we downgrade to version 8.2.3, without touching the configuration, everything works fine again.
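For anyone who wants to reproduce the check on an execute node, the following Python sketch lists which of the two directories actually contains the library named in the StarterLog error. It is a rough diagnostic under stated assumptions: locate_library is a hypothetical helper, and the directory list simply mirrors the two paths mentioned in this message. It only looks at files on disk and does not show what the dynamic loader or condor itself searches.

```python
import glob
import os

# Directories taken from this message; adjust for other installations.
CANDIDATE_DIRS = ["/usr/lib64", "/usr/lib64/condor"]


def locate_library(soname, dirs):
    """Return {directory: sorted file names} for files matching the library's base name."""
    base = soname.split(".so")[0]  # e.g. "libglobus_common"
    found = {}
    for directory in dirs:
        pattern = os.path.join(directory, base + ".so*")
        found[directory] = sorted(os.path.basename(p) for p in glob.glob(pattern))
    return found


if __name__ == "__main__":
    results = locate_library("libglobus_common.so.0", CANDIDATE_DIRS)
    for directory, names in results.items():
        print(directory, "->", names if names else "not found")
```

If the file shows up only under /usr/lib64/condor, that would be consistent with the "cannot open shared object file" error when the lookup is limited to /usr/lib64.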
Are we missing something?
Thank you in advance.
--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 22
Fax: +34 93 581 41 10
http://www.pic.es
Avís - Aviso - Legal Notice: http://www.ifae.es/legal.html