
[HTCondor-users] condor 8.3.5 -> failed to transfer files



Dear all,

We have updated from the stable condor version 8.2.3 to the development version 8.3.5 (by installing the new condor-all rpm). With version 8.2.3 we were submitting jobs from an ARC-CE to our condor test environment without any significant issues. However, we are facing problems with version 8.3.5.

It seems that condor is not finding the libraries placed in /usr/lib64/condor, although that directory is defined as LIB in our condor_config file:

# cat condor_config | egrep "RELEASE|LIB" | grep -v ^#
RELEASE_DIR = /usr
BIN     = $(RELEASE_DIR)/bin
LIB     = $(RELEASE_DIR)/lib64/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN    = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/libexec/condor
SHARE   = $(RELEASE_DIR)/share/condor

In the ShadowLog of the schedd, we can see:

06/23/15 15:48:44 (228.0) (24598): Request to run on slot1@xxxxxxxxxxxx <192.168.101.5:41149> was ACCEPTED
06/23/15 15:48:44 (225.0) (24572): ERROR "Error from slot1@xxxxxxxxxxxx: Failed to transfer files" at line 562 in file /slots/02/dir_64384/userdir/.tmpmTpznX/BUILD/condor-8.3.5/src/condor_shadow.V6.1/pseudo_ops.cpp
06/23/15 15:48:44 (225.0) (24572): ReliSock::put_x509_delegation(): delegation failed: x509_send_delegation failed at line 1422
06/23/15 15:48:45 (225.0) (24572): DoUpload: SHADOW at 193.109.175.11 failed to send file(s) to <192.168.101.5:36996>: error sending /var/spool/arc/jobstatus/job.EOXNDm5ccRmnl6QwDoplpaQmABFKDmABFKDm3hIKDmABFKDm86o53m.proxy
06/23/15 15:48:45 (228.0) (24598): ERROR "Error from slot1@xxxxxxxxxxxx: Failed to transfer files" at line 562 in file /slots/02/dir_64384/userdir/.tmpmTpznX/BUILD/condor-8.3.5/src/condor_shadow.V6.1/pseudo_ops.cpp

These are the errors in the StarterLog of the execute machine:

06/23/15 15:53:00 (pid:9435) ReliSock::get_x509_delegation(): delegation failed: Failed to open GSI libraries: libglobus_common.so.0: cannot open shared object file: No such file or directory
06/23/15 15:53:00 (pid:9435) DoDownload: STARTER at 192.168.101.5 failed to receive file /home/execute/dir_9432/job.IvCMDm9mdRmnl6QwDoplpaQmABFKDmABFKDmIwLKDmABFKDm8ojIzm.proxy
06/23/15 15:53:00 (pid:9432) File transfer failed (status=0).
06/23/15 15:53:00 (pid:9432) ERROR "Failed to transfer files" at line 2301 in file /slots/02/dir_64384/userdir/.tmpmTpznX/BUILD/condor-8.3.5/src/condor_starter.V6.1/jic_shadow.cpp
06/23/15 15:53:00 (pid:9432) ShutdownFast all jobs.
06/23/15 15:53:01 (pid:9432) condor_read() failed: recv(fd=6) returned -1, errno = 104 Connection reset by peer, reading 21 bytes from <193.109.175.11:58180>.
06/23/15 15:53:01 (pid:9432) IO: Failed to read packet header
06/23/15 15:53:01 (pid:9432) Lost connection to shadow, waiting 1200 secs for reconnect

We have run some tests, changing the configuration files, adjusting LD_LIBRARY_PATH, etc., with no luck. We found that the libraries are searched for in /usr/lib64 and not in /usr/lib64/condor, which is the directory where the rpm installed the libraries and the directory defined as LIB in our condor_config file.
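
A check along the following lines can confirm which of the two directories actually holds the library named in the StarterLog error. This is only a sketch: find_lib is a hypothetical helper written for this illustration (it is not a condor tool), and it only tests whether a file with that name exists in each directory, not whether the dynamic loader would pick it up.

```shell
#!/bin/sh
# find_lib NAME DIR...
# Hypothetical helper: print every DIR that contains a file called NAME.
find_lib() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Library from the StarterLog error, checked in the directory that is being
# searched (/usr/lib64) and in the one configured as LIB (/usr/lib64/condor).
find_lib libglobus_common.so.0 /usr/lib64 /usr/lib64/condor
```

If only /usr/lib64/condor is printed, the library is installed where LIB points but is absent from the directory being searched.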

If we downgrade to version 8.2.3, without touching the configuration, everything works fine again.

Are we missing something?

Thank you in advance.

Best regards,

Carles
-- 
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 22
Fax: +34 93 581 41 10
http://www.pic.es 
Avís - Aviso - Legal Notice: http://www.ifae.es/legal.html