
Re: [Condor-users] Failed to initialize GAHP



Error code 3 looks like a Globus error 3 on the remote host to which
you are trying to submit, which means a lack of resources, either memory
or disk, on the remote machine.
If you have access to the command-line utility globusrun-ws, try
running it against the remote server and see whether you get a Globus error
3 as well; I think you will. If so, then the problem is on the remote end.
Steve Timm

------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

On Thu, 6 Sep 2007, Mustafa R Kilinc wrote:

All right. We found the gridmanager log:

9/6 15:05:54 passwd_cache::cache_uid(): getpwnam("condor") failed: Success

9/6 15:05:54 passwd_cache::cache_uid(): getpwnam("condor") failed: Success

9/6 15:05:54 ******************************************************
9/6 15:05:54 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
9/6 15:05:54 ** /home/mustafa/condor/sbin/condor_gridmanager
9/6 15:05:54 ** $CondorVersion: 6.8.5 May 17 2007 $
9/6 15:05:54 ** $CondorPlatform: I386-LINUX_RHEL3 $
9/6 15:05:54 ** PID = 11012
9/6 15:05:54 ** Log last touched 9/6 14:58:51
9/6 15:05:54 ******************************************************
9/6 15:05:54 Using config source: /home/mustafa/condor/etc/condor_config
9/6 15:05:54 Using local config sources:
9/6 15:05:54    /home/mustafa/condor/home/condor_config.local
9/6 15:05:54 DaemonCore: Command Socket at <0.0.0.0:41874>
9/6 15:05:57 [11012] DaemonCore: Command received via UDP from host <127.0.0.1:33502>
9/6 15:05:57 [11012] DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
9/6 15:05:57 [11012] Found job 24.0 --- inserting
9/6 15:05:57 [11012] gahp server not up yet, delaying ping
9/6 15:05:57 [11012] (24.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/6 15:05:57 [11012] GAHP server pid = 11021
9/6 15:05:57 [11012] GAHP command 'GRAM_CALLBACK_ALLOW' failed: an I/O operation failed error_code=3
9/6 15:05:57 [11012] (24.0) Error enabling GRAM callback, err=3 - an I/O operation failed
9/6 15:06:02 [11012] No jobs left, shutting down
9/6 15:06:02 [11012] Got SIGTERM. Performing graceful shutdown.
9/6 15:06:02 [11012] **** condor_gridmanager (condor_GRIDMANAGER) EXITING WITH STATUS 0
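
For anyone who wants to pull the GAHP failures out of a longer gridmanager log, here is a minimal sketch in Python. It assumes the timestamp layout and the "GAHP command '...' failed: ... error_code=N" wording seen in the excerpt above; other Condor versions may word these lines differently. Logs pasted into mail often arrive with long entries wrapped onto a second line, so the sketch re-joins any line that does not start with a timestamp. The function names (join_wrapped, gahp_failures) are made up for this illustration and are not part of Condor.

```python
import re

# Timestamp prefix as it appears in the log excerpt, e.g. "9/6 15:05:57 ".
TIMESTAMP = re.compile(r"^\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2} ")
# Assumed wording: "GAHP command 'NAME' failed: MESSAGE error_code=N".
GAHP_FAILURE = re.compile(r"GAHP command '([^']+)' failed: (.*?) error_code=(\d+)")


def join_wrapped(lines):
    """Re-join log entries that were wrapped onto continuation lines."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)  # a new entry starts here
        else:
            entries[-1] += " " + line.strip()  # continuation of previous entry
    return entries


def gahp_failures(lines):
    """Return (command, message, error_code) for each failed GAHP command."""
    found = []
    for entry in join_wrapped(lines):
        match = GAHP_FAILURE.search(entry)
        if match:
            found.append((match.group(1), match.group(2), int(match.group(3))))
    return found


# Three lines in the wrapped form in which such a log might be mailed.
sample = [
    "9/6 15:05:57 [11012] GAHP server pid = 11021",
    "9/6 15:05:57 [11012] GAHP command 'GRAM_CALLBACK_ALLOW' failed: an I/O",
    "operation failed error_code=3",
]
print(gahp_failures(sample))
```

To run it against a real file, pass an open file object instead of the sample list, for example gahp_failures(open(path)) where path points at your own gridmanager log.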

Mustafa

On Thursday 06 September 2007 14:56:23 Mustafa R Kilinc wrote:
We do not see any gridmanager logs being created in my home directory on the
remote GRAM server (tg-login.sdsc.teragrid.org). Do we need to look for them
somewhere else?
We have placed the SDSC CA certificate files in the
$(HOME)/.globus/certificates directory of our personal Condor master.

mustafa@mustafa:~$ pwd
/home/mustafa
mustafa@mustafa:~$ ls -al .globus/certificates/
total 32
drwxr-xr-x 2 mustafa mustafa  4096 2007-09-06 11:29 .
drwxr-xr-x 3 mustafa mustafa  4096 2007-09-06 11:27 ..
-rw-r--r-- 1 mustafa mustafa  1468 2004-09-08 19:48 3deda549.0
-rw-r--r-- 1 mustafa mustafa   336 2004-10-07 14:10 3deda549.signing_policy
mustafa@mustafa:

Do we need to place these SDSC CA cert files anywhere else?

Mustafa

On Thursday 06 September 2007 13:59:53 Dan Bradley wrote:
Your gridmanager log may contain clues about why the gahp failed to be
initialized.  One possible reason is that Condor is not finding the
necessary CA certs for your grid proxy.  This happens if you have your
trusted certs installed in a non-standard location, and Condor has not
been configured to know where to find them.
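
One way to check this on the submit machine is sketched below in Python. It lists the directories where CA certs are commonly looked for and reports which hashed CA files and signing policies each one holds. The search order used here (X509_CERT_DIR, then $HOME/.globus/certificates, then /etc/grid-security/certificates, then $GLOBUS_LOCATION/share/certificates) is an assumption based on the usual Globus Toolkit convention, not something read out of Condor itself, and the function names are invented for the example.

```python
import os
import re
from pathlib import Path


def candidate_cert_dirs(env=None, home=None):
    """List directories to inspect, in the assumed Globus search order."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    dirs = []
    if env.get("X509_CERT_DIR"):
        dirs.append(Path(env["X509_CERT_DIR"]))
    dirs.append(home / ".globus" / "certificates")
    dirs.append(Path("/etc/grid-security/certificates"))
    if env.get("GLOBUS_LOCATION"):
        dirs.append(Path(env["GLOBUS_LOCATION"]) / "share" / "certificates")
    return dirs


def describe_cert_dir(path):
    """Return (ca_files, policy_files) in path; two empty lists if missing."""
    path = Path(path)
    if not path.is_dir():
        return [], []
    names = sorted(p.name for p in path.iterdir())
    # Hashed CA certificates are named like "3deda549.0".
    ca_files = [n for n in names if re.fullmatch(r"[0-9a-f]{8}\.\d+", n)]
    policy_files = [n for n in names if n.endswith(".signing_policy")]
    return ca_files, policy_files


if __name__ == "__main__":
    for directory in candidate_cert_dirs():
        ca_files, policy_files = describe_cert_dir(directory)
        print(directory, "CA files:", ca_files, "policies:", policy_files)
```

If the CA hash that signed your proxy does not show up in any of the listed directories, that would be consistent with the cause described above.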

--Dan

Mustafa R Kilinc wrote:
I am trying to use condor_glidein and am stuck on the problem below.
Any suggestions on how I can fix it?

Please see the log below:

mustafa@mustafa:~/condor$ grid-proxy-info
subject  : /C=US/O=SDSC/OU=SDSC/CN=Mustafa Kilinc/UID=mkilinc/CN=1444578778
issuer   : /C=US/O=SDSC/OU=SDSC/CN=Mustafa Kilinc/UID=mkilinc
identity : /C=US/O=SDSC/OU=SDSC/CN=Mustafa Kilinc/UID=mkilinc
type     : Proxy draft (pre-RFC) compliant impersonation proxy
strength : 512 bits
path     : /tmp/x509up_u1000
timeleft : 9:44:46

mustafa@mustafa:~/condor$ condor_glidein -arch=linux-sles8-ia64 \
    -setup_jobmanager=jobmanager-fork tg-login.sdsc.teragrid.org/jobmanager-pbs

Running/verifying Glidein installation and setup...
Submitting Glidein setup job...
Error: the setup job has been put on hold!  Reason given:
Failed to initialize GAHP
Error: failed to run setup script on remote machine

mustafa@mustafa:~/condor$ gt4_gahp
$GahpVersion: 1.6.0 Dec 08 2006 GT4\ GAHP\ (GT-4.0.3) $

mustafa@mustafa:~/condor$

thanks,
Mustafa
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with
a subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/
