
[Condor-users] Flocking



Hi,

I'm trying to set up flocking between two pools that have different UID_DOMAIN and FILESYSTEM_DOMAIN settings.
I followed the (partially unclear) instructions in section 5.2 of the manual, 'Connecting Condor Pools with Flocking',
i.e. I set
--------------------------------
FLOCK_TO =  <manager of pool B> 
--------------------------------
on the submitter of pool A and setting
--------------------------------------------------------------
FLOCK_FROM =  <list of hosts containing submitter of pool A>
--------------------------------------------------------------
After solving all firewall issues I submitted a job (cluster) on the submitter in pool A with:
-----------------------------------------------------------------------------------
condor_submit -remote <manager of B>  -pool <manager of B>  <name of submit-file>
-----------------------------------------------------------------------------------
(When omitting the '-remote ..' option the job would NEVER flock to B, even if there
were no resources available in A. Why?)
This way I finally got some traces in the logs on the manager of B, namely in
'/scratch/condor/log/SchedLog':
-----------------------------------------------------------------------------------------------------------------
6/12 12:17:07 (pid:31692) authenticate_self_gss: acquiring self credentials failed. Please check your Condor configuration file if this is a server process. Or the user environment variable if this is a user process.

GSS Major Status: General failure
GSS Minor Status Error Chain:
globus_gsi_gssapi: Error with GSI credential
globus_gsi_gssapi: Error with gss credential handle
globus_credential: Valid credentials could not be found in any of the possible locations specified by the credential search order.
Valid credentials could not be found in any of the possible locations specified by the credential search order.

Attempt 1

globus_credential: Error reading host credential
globus_sysconfig: Could not find a valid certificate file: The host cert could not be found in:
1) env. var. X509_USER_CERT
2) /etc/grid-security/hostcert.pem
3) $GLOBUS_LOCATION/etc/hostcert.pem
4) $HOME/.globus/hostcert.pem

The host key could not be found in:
1) env. var. X509_USER_KEY
2) /etc/grid-security/hostkey.pem
3) $GLOBUS_LOCATION/etc/hostkey.pem
4) $HOME/.globus/hostkey.pem



Attempt 2

globus_credential: Error reading proxy credential
globus_sysconfig: Could not find a valid proxy certificate file location
globus_sysconfig: Error with key filename
globus_sysconfig: File does not exist: /tmp/x509up_u0 is not a valid file

Attempt 3

globus_credential: Error reading user credential
globus_sysconfig: Error with certificate filename: The user cert could not be found in:
1) env. var. X509_USER_CERT
2) $HOME/.globus/usercert.pem
3) $HOME/.globus/usercred.p12




6/12 12:17:07 (pid:31692) AUTHENTICATE: no available authentication methods succeeded, failing!
6/12 12:17:07 (pid:31692) SCHEDD: authentication failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using GSI|GSI:5003:Failed to authenticate.  Globus is reporting error (851968:133).  There is probably a problem with your credentials.  (Did you run grid-proxy-init?)|AUTHENTICATE:1004:Failed to authenticate using KERBEROS|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXX5hDIkK)
-------------------------------------------------------------------------------------------------------
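To make the last log line easier to read, here is a small Python sketch that splits the error chain at '|' and lists the methods reported as failed. The string is copied from the SCHEDD line above, shortened by dropping the hint about grid-proxy-init; the name failed_methods is only my own helper, not anything from Condor.

```python
import re

# Error chain copied from the SCHEDD line in the SchedLog above
# (shortened: the trailing hint about grid-proxy-init is left out).
ERROR_CHAIN = (
    "AUTHENTICATE:1003:Failed to authenticate with any method|"
    "AUTHENTICATE:1004:Failed to authenticate using GSI|"
    "GSI:5003:Failed to authenticate.  Globus is reporting error (851968:133).|"
    "AUTHENTICATE:1004:Failed to authenticate using KERBEROS|"
    "AUTHENTICATE:1004:Failed to authenticate using FS|"
    "FS:1004:Unable to lstat(/tmp/FS_XXX5hDIkK)"
)


def failed_methods(chain):
    """Return the method names from 'Failed to authenticate using X' entries."""
    methods = []
    for entry in chain.split("|"):
        match = re.search(r"Failed to authenticate using (\w+)", entry)
        if match:
            methods.append(match.group(1))
    return methods


print(failed_methods(ERROR_CHAIN))
```

For the line above this lists GSI, KERBEROS and FS, so these are the three methods the log reports as attempted and failed.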
What happened here? I am puzzled because the Flocking chapter in the manual makes no
mention of 'credentials', 'authentication' etc.; only the reference to the 'file-transfer-mechanism'
contains some information in this direction.
By the way, I got the above log for both vanilla and standard jobs and had
-----------------------------------
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
-----------------------------------
in the submit-file for the vanilla job.

One possibly noteworthy thing is that the (global) config file for pool A contains the line
-----------------------------------
AUTHENTICATION_METHODS = FS_REMOTE
-----------------------------------
while there is no such setting for pool B.

What else do I need to make flocking from A to B work?

Thanks for any help

Regards

Urs Fitze