[Condor-users] Problem with credential delegation



Hello,
I've been stuck with this problem for a while now.

I'm trying to submit a simple job to a Condor pool. When the job is submitted without a user proxy, everything works. But when I submit with x509userproxy set to my proxy (/tmp/x509up_u20200), the job sits in the queue and never starts. The only relevant entry I could find in the system and Condor logs is the following in the ShadowLog:

08/05/10 14:43:32 ******************************************************
08/05/10 14:43:32 ** condor_shadow (CONDOR_SHADOW) STARTING UP
08/05/10 14:43:32 ** /usr/local/condor/sbin/condor_shadow
08/05/10 14:43:32 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
08/05/10 14:43:32 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
08/05/10 14:43:32 ** $CondorVersion: 7.5.2 Jun 18 2010 $
08/05/10 14:43:32 ** $CondorPlatform: X86_64-LINUX_SuSE_UNKNOWN $
08/05/10 14:43:32 ** PID = 12042
08/05/10 14:43:32 ** Log last touched 8/5 14:43:31
08/05/10 14:43:32 ******************************************************
08/05/10 14:43:32 Using config source: /usr/local/condor/etc/condor_config
08/05/10 14:43:32 Using local config sources:
08/05/10 14:43:32    /usr/local/condor/local.vm129/condor_config.local
08/05/10 14:43:32 DaemonCore: command socket at <***.***.***.29:40011>
08/05/10 14:43:32 Initializing a VANILLA shadow for job 111.0
08/05/10 14:43:32 (111.0) (12042): Request to run on vm132.***.***.*** <***.***.***.32:40003?CCBID=***.***.***.29:9618#588> was ACCEPTED
08/05/10 14:43:32 (111.0) (12042): ReliSock::put_x509_delegation(): delegation failed: x509_send_delegation failed at line 1074
08/05/10 14:44:02 (111.0) (12042): DoUpload: SHADOW at ***.***.***.29 failed to send file(s) to <***.***.***.32:40020>: error sending /tmp/x509up_u20200; STARTER at ***.***.***.32 failed to receive file /var/lib/condor/execute/dir_1431/x509up_u20200
08/05/10 14:44:02 (111.0) (12042): ERROR "Error from vm132.***.***.***: Failed to transfer files" at line 655 in file pseudo_ops.cpp


It looks like ReliSock::put_x509_delegation() is failing, but I have not been able to figure out why.

Has anyone seen this type of error before? Any hints or ideas on how I can debug this further to find out what is causing it?
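
As a starting point for debugging, one basic check on the submit host is whether the proxy file itself looks sane. The sketch below is only that, a minimal sketch in Python: the function name check_proxy_file is hypothetical, and it assumes the usual Globus convention that a proxy at /tmp/x509up_u<uid> is a non-empty regular file owned by that uid with no group or other permission bits set. It does not look at the certificate chain or the remaining lifetime, and nothing here shows that file permissions are the cause of the delegation failure above.

```python
import os
import stat


def check_proxy_file(path, expected_uid=None):
    """Return a list of human-readable problems found with a proxy file.

    An empty list means none of these basic checks failed; it does not
    mean the proxy is valid.
    """
    try:
        info = os.stat(path)
    except OSError as exc:
        return ["cannot stat %s: %s" % (path, exc)]

    problems = []
    if not stat.S_ISREG(info.st_mode):
        problems.append("not a regular file")
    if info.st_size == 0:
        problems.append("file is empty")
    # Assumption: group and other permission bits must all be clear.
    if info.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(
            "permissions too open: %o" % stat.S_IMODE(info.st_mode))
    if expected_uid is not None and info.st_uid != expected_uid:
        problems.append(
            "owned by uid %d, expected %d" % (info.st_uid, expected_uid))
    return problems
```

For the proxy in this report the call would be check_proxy_file("/tmp/x509up_u20200", expected_uid=20200), run as the submitting user on the submit host.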

I'm running the following:
$CondorVersion: 7.5.2 Jun 18 2010 $
$CondorPlatform: X86_64-LINUX_SuSE_UNKNOWN $
Linux vm129 2.6.27.29-0.1-xen #1 SMP 2009-08-15 17:53:59 +0200 x86_64 x86_64 x86_64 GNU/Linux
(Software firewall disabled)



Thanks,
 Andre