
Re: [Condor-users] MPI : strange globus error, though not using globus



More details: I just figured out that this happens for every job I try to submit to my pool, even non-MPI ones.

It seems to be an authentication problem, but I don't understand why, as I never had this before. The only recent change on my pool is that I had to replace the hard disk of the central manager and set the computer up again. I had all the data on backups, though, so everything should be exactly the same...

I'm still using the same NIS/NFS installation, and users can still log in to each computer, with their home directories correctly set up...

Any idea what I forgot?

Nicolas

----------------
On Thu, 1 Feb 2007 16:02:56 +0100
Nicolas GUIOT <nicolas.guiot@xxxxxxx> wrote:

> Hi
> 
> (FYI, I'm setting up the parallel applications, sorry to flood the list today...)
> 
> So, I set up a dedicated scheduler and 2 dedicated resources. This is all on a private LAN and has nothing to do with Globus, Condor-G, or anything else that would link my pool to another one.
> 
> Now, when I submit my MPI job, I get the following errors:
> 
> $ condor_submit CondorMpiTest.cmd
> Submitting job(s)
> ERROR: Failed to connect to local queue manager
> AUTHENTICATE:1003:Failed to authenticate with any method
> AUTHENTICATE:1004:Failed to authenticate using GSI
> GSI:5003:Failed to authenticate.  Globus is reporting error (851968:45).  There is probably a problem with your credentials.  (Did you run grid-proxy-init?)
> AUTHENTICATE:1004:Failed to authenticate using KERBEROS
> AUTHENTICATE:1004:Failed to authenticate using FS
> 
> $ ps ax|grep cond
>  7602 ?        Ss     0:02 /nfs/opt/condor_i686/sbin/condor_master
>  7603 ?        Ss     0:00 condor_schedd -f
>  8120 pts/0    S+     0:00 tail -f /scratch/condor/log/SchedLog
>  8129 pts/1    S+     0:00 grep cond
> 
> $ condor_q
> -- Submitter: seurat.lbt.ibpc.fr : <172.27.xx.xx:32795> : seurat.my.domain.fr
>  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
> 0 jobs; 0 idle, 0 running, 0 held
> ##################################
> 
> And this is what I have in the SchedLog:
> 
> 2/1 15:39:08 (pid:7603) authenticate_self_gss: acquiring self credentials failed. Please check your Condor configuration file if this is a server process. Or the user environment variable if this is a user process.
> 
> GSS Major Status: General failure
> GSS Minor Status Error Chain:
> globus_gsi_gssapi: Error with GSI credential
> globus_gsi_gssapi: Error with gss credential handle
> globus_credential: Valid credentials could not be found in any of the possible locations specified by the credential search order.
> Valid credentials could not be found in any of the possible locations specified by the credential search order.
> 
> Attempt 1
> 
> globus_credential: Error reading host credential
> globus_sysconfig: Could not find a valid certificate file: The host cert could not be found in:
> 1) env. var. X509_USER_CERT
> 2) /etc/grid-security/hostcert.pem
> 3) $GLOBUS_LOCATION/etc/hostcert.pem
> 4) $HOME/.globus/hostcert.pem
> 
> The host key could not be found in:
> 1) env. var. X509_USER_KEY
> 2) /etc/grid-security/hostkey.pem
> 3) $GLOBUS_LOCATION/etc/hostkey.pem
> 4) $HOME/.globus/hostkey.pem
> 
> 
> 
> Attempt 2
> 
> globus_credential: Error reading proxy credential
> globus_sysconfig: Could not find a valid proxy certificate file location
> globus_sysconfig: Error with key filename
> globus_sysconfig: File does not exist: /tmp/x509up_u0 is not a valid file
> 
> Attempt 3
> 
> globus_credential: Error reading user credential
> globus_sysconfig: Error with certificate filename: The user cert could not be found in:
> 1) env. var. X509_USER_CERT
> 2) $HOME/.globus/usercert.pem
> 3) $HOME/.globus/usercred.p12
> 
> 
> 
> 
> 2/1 15:39:09 (pid:7603) AUTHENTICATE: no available authentication methods succeeded, failing!
> 2/1 15:39:09 (pid:7603) SCHEDD: authentication failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using GSI|GSI:5003:Failed to authenticate.  Globus is reporting error (851968:133).  There is probably a problem with your credentials.  (Did you run grid-proxy-init?)|AUTHENTICATE:1004:Failed to authenticate using KERBEROS|AUTHENTICATE:1004:Failed to authenticate using FS
> 2/1 15:39:09 (pid:7603) IO: Failed to read packet header
> 2/1 15:39:25 (pid:7603) IO: Failed to read packet header
> 
> #####################################
> 
> So, what is this Globus/grid/proxy error doing here?
> 
> What did I miss?
> 
> Nicolas
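
For anyone hitting the same messages: the SchedLog above lists every place GSI looked for a credential. Below is a minimal sketch in Python (not part of Condor or Globus) that reports which of those locations actually exist on a machine. The function names are hypothetical, the paths are copied from the log, and it only tests whether a file exists, not whether it is a valid credential.

```python
import os


def credential_candidates(env, home, uid):
    """Build the search locations listed in the SchedLog, grouped by kind.

    Entries that depend on an unset environment variable are None.
    """
    globus = env.get("GLOBUS_LOCATION")

    def under_globus(name):
        # $GLOBUS_LOCATION/etc/<name>, or None when the variable is unset
        return os.path.join(globus, "etc", name) if globus else None

    dot_globus = os.path.join(home, ".globus")
    return {
        "host cert": [
            env.get("X509_USER_CERT"),
            "/etc/grid-security/hostcert.pem",
            under_globus("hostcert.pem"),
            os.path.join(dot_globus, "hostcert.pem"),
        ],
        "host key": [
            env.get("X509_USER_KEY"),
            "/etc/grid-security/hostkey.pem",
            under_globus("hostkey.pem"),
            os.path.join(dot_globus, "hostkey.pem"),
        ],
        "proxy": ["/tmp/x509up_u%d" % uid],
        "user cert": [
            env.get("X509_USER_CERT"),
            os.path.join(dot_globus, "usercert.pem"),
            os.path.join(dot_globus, "usercred.p12"),
        ],
    }


def existing_credentials(candidates, exists=os.path.isfile):
    """Keep only the candidate paths that are set and exist as files."""
    return {
        kind: [p for p in paths if p and exists(p)]
        for kind, paths in candidates.items()
    }


if __name__ == "__main__":
    home = os.path.expanduser("~")
    found = existing_credentials(
        credential_candidates(os.environ, home, os.getuid())
    )
    for kind, paths in found.items():
        print("%-9s : %s" % (kind, ", ".join(paths) or "none found"))
```

If every list comes back empty, that would be consistent with the log: GSI has nothing to authenticate with. The script says nothing about why the KERBEROS and FS methods also fail.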


----------------------------------------------------
CNRS - UPR 9080 : Laboratoire de Biochimie Theorique
Institut de Biologie Physico-Chimique
13 rue Pierre et Marie Curie
75005 PARIS - FRANCE

Tel : +33 158 41 51 70
Fax : +33 158 41 50 26
----------------------------------------------------