
[HTCondor-users] CREAM error: Failed to start gahp



Hi,

I am trying to submit Condor-G jobs to a CREAM CE, so far without success.
Here is some info:

Platform (1)
Condor version (2)
Condor configuration related to GAHP (3)
Output of running the GAHP server by hand (4)
Content of job log file (5)
Content of GridmanagerLog file (6)

What am I missing?

Thanks a lot in advance.
Cheers,
Jose


(1)
$ uname -a
Linux grid05.racf.bnl.gov 2.6.32-504.8.1.el6.x86_64 #1 SMP Fri Dec 19 12:09:25 EST 2014 x86_64 x86_64 x86_64 GNU/Linux

(2)
$ rpm -qa | grep condor
condor-classads-8.4.0-1.el6.x86_64
condor-8.4.0-1.el6.x86_64
condor-libs-8.0.4-2.el6.x86_64
condor-cream-gahp-8.4.0-1.el6.x86_64
condor-procd-8.4.0-1.el6.x86_64
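
In the list above, condor-libs reports 8.0.4 while the other packages report 8.4.0. Whether that is related to the failure is not established here. A minimal sketch for listing package versions side by side follows. It assumes names of the form <name>-<version>-<release>.<arch>, and the helper name condor_versions is made up for illustration:

```shell
# Read package names on stdin and print "<version> <name>", sorted by version.
# Assumes the <name>-<version>-<release>.<arch> layout seen in the list above.
condor_versions() {
    sed -E 's/^(.*)-([^-]+)-([^-]+)$/\2 \1/' | LC_ALL=C sort
}

# Usage on a live system:
#   rpm -qa | grep condor | condor_versions
```

Any package whose version differs from the rest then appears on its own line at the top or bottom of the output.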

$ yum info condor
Loaded plugins: downloadonly, priorities, product-id, rhnplugin, security, subscription-manager
*Note* Red Hat Network repositories are not listed below. You must run this command as root to access RHN repositories.
4160 packages excluded due to repository priority protections
Installed Packages
Name        : condor
Arch        : x86_64
Version     : 8.4.0
Release     : 1.el6
Size        : 18 M
Repo        : installed
From repo   : htcondor-stable
Summary     : HTCondor: High Throughput Computing
URL         : http://www.cs.wisc.edu/condor/
License     : ASL 2.0
Description : HTCondor is a specialized workload management system for
            : compute-intensive jobs. Like other full-featured batch systems, HTCondor
            : provides a job queueing mechanism, scheduling policy, priority scheme,
            : resource monitoring, and resource management. Users submit their
            : serial or parallel jobs to HTCondor, HTCondor places them into a queue,
            : chooses when and where to run the jobs based upon a policy, carefully
            : monitors their progress, and ultimately informs the user upon
            : completion.

Available Packages
Name        : condor
Arch        : i386
Version     : 8.0.6
Release     : 225363
Size        : 18 M
Repo        : htcondor-stable
Summary     : Condor: High Throughput Computing
URL         : http://www.cs.wisc.edu/condor/
License     : Apache License, Version 2.0
Description : Condor is a specialized workload management system for
            : compute-intensive jobs. Like other full-featured batch systems,
            : Condor provides a job queueing mechanism, scheduling policy,
            : priority scheme, resource monitoring, and resource management.
            : Users submit their serial or parallel jobs to Condor, Condor places
            : them into a queue, chooses when and where to run the jobs based
            : upon a policy, carefully monitors their progress, and ultimately
            : informs the user upon completion.

Name        : condor
Arch        : i686
Version     : 8.4.0
Release     : 1.el6
Size        : 6.3 M
Repo        : htcondor-stable
Summary     : HTCondor: High Throughput Computing
URL         : http://www.cs.wisc.edu/condor/
License     : ASL 2.0
Description : HTCondor is a specialized workload management system for
            : compute-intensive jobs. Like other full-featured batch systems, HTCondor
            : provides a job queueing mechanism, scheduling policy, priority scheme,
            : resource monitoring, and resource management. Users submit their
            : serial or parallel jobs to HTCondor, HTCondor places them into a queue,
            : chooses when and where to run the jobs based upon a policy, carefully
            : monitors their progress, and ultimately informs the user upon
            : completion.


(3)
$ condor_config_val -dump | grep GAHP
BATCH_GAHP = $(GLITE_LOCATION)/bin/batch_gahp
BATCH_GAHP_CHECK_STATUS_ATTEMPTS = 5
C_GAHP_LOCK = /tmp/CGAHPLock.$(USERNAME)
C_GAHP_LOG = /tmp/CGAHPLog.$(USERNAME)
C_GAHP_WORKER_THREAD_LOCK = /tmp/CGAHPWorkerLock.$(USERNAME)
C_GAHP_WORKER_THREAD_LOG = /tmp/CGAHPWorkerLog.$(USERNAME)
CONDOR_GAHP = $(SBIN)/condor_c-gahp
CONDOR_GAHP_WORKER = $(SBIN)/condor_c-gahp_worker_thread
CREAM_GAHP = $(SBIN)/cream_gahp
DELTACLOUD_GAHP = $(SBIN)/deltacloud_gahp
EC2_GAHP = $(SBIN)/ec2_gahp
EC2_GAHP_LOG = /tmp/EC2GahpLog.$(USERNAME)
GAHP =
GAHP_ARGS =
GAHP_DEBUG_HIDE_SENSITIVE_DATA = true
GCE_GAHP = $(SBIN)/gce_gahp
GCE_GAHP_LOG = /tmp/GceGahpLog.$(USERNAME)
GRIDMANAGER_GAHP_CALL_TIMEOUT_CONDOR = 28800
GRIDMANAGER_GAHPCLIENT_DEBUG = true
GRIDMANAGER_GAHPCLIENT_DEBUG_SIZE = 0
GT2_GAHP = $(SBIN)/gahp_server
MAX_C_GAHP_LOG = $(MAX_DEFAULT_LOG)
MAX_VM_GAHP_LOG = $(MAX_DEFAULT_LOG)
NORDUGRID_GAHP = $(SBIN)/nordugrid_gahp
REMOTE_GAHP = $(SBIN)/remote_gahp
UNICORE_GAHP = $(SBIN)/unicore_gahp
VM_GAHP_CONFIG =
VM_GAHP_LOG = $(LOG)/VMGahpLog
VM_GAHP_REQ_TIMEOUT = 300
VM_GAHP_SEND_ALL_CLASSAD = true
VM_GAHP_SERVER = $(SBIN)/condor_vm-gahp

(4)
$ /usr/sbin/cream_gahp
$GahpVersion: CREAM Sep 11 2015 UW\ Gahp $
^C

(5)
000 (9483.000.000) 09/30 11:02:34 Job submitted from host: <xxxx>
...
009 (9483.000.000) 09/30 11:02:48 Job was aborted by the user.
    CREAM error: Failed to start gahp
...

(6)
09/30/15 11:02:34 I am: hostname: grid05, fully qualified doman name: xxxxx
09/30/15 11:02:34 I am: hostname: grid05, fully qualified doman name: xxxxx
09/30/15 11:02:34 ******************************************************
09/30/15 11:02:34 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
09/30/15 11:02:34 ** /usr/sbin/condor_gridmanager
09/30/15 11:02:34 ** SubsystemInfo: name=GRIDMANAGER type=DAEMON(12) class=DAEMON(1)
09/30/15 11:02:34 ** Configuration: subsystem:GRIDMANAGER local:<NONE> class:DAEMON
09/30/15 11:02:34 ** $CondorVersion: 8.4.0 Sep 11 2015 BuildID: 341253 $
09/30/15 11:02:34 ** $CondorPlatform: X86_64-RedHat_6.6 $
09/30/15 11:02:34 ** PID = 10817
09/30/15 11:02:34 ** Log last touched 9/30 10:49:57
09/30/15 11:02:34 ******************************************************
09/30/15 11:02:34 Using config source: /etc/condor/condor_config
09/30/15 11:02:34 Using local config sources:
09/30/15 11:02:34    /etc/condor/condor_config.local
09/30/15 11:02:34 config Macros = 61, Sorted = 61, StringBytes = 1612, TablesBytes = 2236
09/30/15 11:02:34 CLASSAD_CACHING is ENABLED
09/30/15 11:02:34 Daemon Log is logging: D_ALWAYS D_ERROR
09/30/15 11:02:34 Daemoncore: Listening at <0.0.0.0:19884> on TCP (ReliSock) and UDP (SafeSock).
09/30/15 11:02:34 DaemonCore: command socket at <xxxx>
09/30/15 11:02:34 DaemonCore: private command socket at <xxxx>
09/30/15 11:02:37 [10817] Found job 9483.0 --- inserting
09/30/15 11:02:37 [10817] gahp server not up yet, delaying ping
09/30/15 11:02:37 [10817] gahp server not up yet, delaying checkDelegation
09/30/15 11:02:37 [10817] BaseResource::DoBatchStatus: gahp server not up yet, delaying 5 seconds
09/30/15 11:02:37 [10817] (9483.0) doEvaluateState called: gmState GM_INIT, creamState
09/30/15 11:02:37 [10817] GAHP server pid = 10823
09/30/15 11:02:41 [10817] (9483.0) doEvaluateState called: gmState GM_DELEGATE_PROXY, creamState
09/30/15 11:02:44 [10817] resource https://ce403.cern.ch:8443/ce-cream/services/CREAM2 is now up
09/30/15 11:02:44 [10817] (9483.0) doEvaluateState called: gmState GM_SET_LEASE, creamState
09/30/15 11:02:44 [10817] (9483.0) doEvaluateState called: gmState GM_SET_LEASE, creamState
09/30/15 11:02:46 [10817] (9483.0) doEvaluateState called: gmState GM_SUBMIT, creamState
09/30/15 11:02:48 [10817] (9483.0) doEvaluateState called: gmState GM_SUBMIT_SAVE, creamState
09/30/15 11:02:48 [10817] GAHP server pid = 10840
09/30/15 11:02:48 [10817] Failed to read GAHP server version
09/30/15 11:02:48 [10817] (9483.0) doEvaluateState called: gmState GM_STAGE_IN, creamState
09/30/15 11:02:48 [10817] (9483.0) Stage-in failed: Failed to start gahp
09/30/15 11:02:48 [10817] Gahp Server (pid=10840) exited with status 127 unexpectedly
09/30/15 11:02:53 [10817] No jobs left, shutting down
09/30/15 11:02:55 [10817] Got SIGTERM. Performing graceful shutdown.
09/30/15 11:02:55 [10817] **** condor_gridmanager (condor_GRIDMANAGER) pid 10817 EXITING WITH STATUS 0
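
The lines that matter in (6) are the two GAHP starts (pid 10823 and pid 10840) and the exit with status 127 of the second one. A minimal sketch for pulling only those events out of a GridmanagerLog follows. The helper name gahp_events is made up for illustration, and the patterns are taken only from the messages shown above, so other HTCondor versions may word them differently:

```shell
# Filter a GridmanagerLog (read from stdin) down to GAHP start/exit/failure lines.
# The patterns mirror the log messages quoted in (6); they are not an official list.
gahp_events() {
    grep -E 'GAHP server pid|Gahp Server \(pid=[0-9]+\) exited|Failed to (start|read)'
}

# Usage on a live system, assuming GRIDMANAGER_LOG is defined in your configuration:
#   gahp_events < "$(condor_config_val GRIDMANAGER_LOG)"
```

Run against the unwrapped log file, this keeps each message on one line, which the mail-wrapped copy above does not.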