
[Condor-users] GAHP error



Hi,

I'm using Condor at an OSG site. I can successfully submit and run jobs to the site from another site, but when I try to submit a job from the broken site, the jobs just end up on hold. Here's the Gridmanager log, which shows an error starting the GAHP server. I've verified that I can manually run $CONDOR_LOCATION/sbin/gt4_gahp and gahp_server without errors.

Any thoughts on what I should try next? How can I find out which file it means when it says "Failed to initialize from file"? I tried strace, but I'm not very good with it, so maybe the answer lies in there.

Things haven't been working right for a little while now. I'm not 100% sure, because I'm not directly responsible for this system, but I believe the last change made was the installation of a third NIC. The IP address mentioned in the log file is bound to the interface that is on the private LAN with the rest of the cluster nodes. I don't really think this is what the problem is, but if I really knew, I wouldn't be asking the mailing list. :D
--Peter

6/17 12:28:15 ******************************************************
6/17 12:28:15 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
6/17 12:28:15 ** /osg/programs/condor/sbin/condor_gridmanager
6/17 12:28:15 ** SubsystemInfo: name=GRIDMANAGER type=DAEMON(10) class=DAEMON(1)
6/17 12:28:15 ** Configuration: subsystem:GRIDMANAGER local:<NONE> class:DAEMON
6/17 12:28:15 ** $CondorVersion: 7.2.2 Apr  9 2009 BuildID: 145189 $
6/17 12:28:15 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
6/17 12:28:15 ** PID = 3661
6/17 12:28:15 ** Log last touched 6/17 12:22:38
6/17 12:28:15 ******************************************************
6/17 12:28:15 Using config source: /osg/programs/condor/etc/condor_config
6/17 12:28:15 Using local config sources:
6/17 12:28:15    /scratch/condor/condor_config.local
6/17 12:28:15 DaemonCore: Command Socket at <10.0.128.2:10406>
6/17 12:28:18 [3661] Found job 20329.0 --- inserting
6/17 12:28:18 [3661] Found job 20329.1 --- inserting
6/17 12:28:18 [3661] Found job 20329.2 --- inserting
6/17 12:28:18 [3661] Found job 20329.3 --- inserting
6/17 12:28:18 [3661] gahp server not up yet, delaying ping
6/17 12:28:18 [3661] GAHP server not initialized yet, not submitting grid_monitor now
6/17 12:28:18 [3661] (20329.0) doEvaluateState called: gmState GM_INIT, globusState 32
6/17 12:28:18 [3661] GAHP server pid = 3672
6/17 12:28:18 [3661] GAHP command 'INITIALIZE_FROM_FILE' failed: 7
6/17 12:28:18 [3661] GAHP: Failed to initialize from file
6/17 12:28:18 [3661] (20329.0) Error initializing GAHP
6/17 12:28:18 [3661] (20329.1) doEvaluateState called: gmState GM_INIT, globusState 32
6/17 12:28:18 [3661] GAHP command 'INITIALIZE_FROM_FILE' failed: 7
6/17 12:28:18 [3661] GAHP: Failed to initialize from file
6/17 12:28:18 [3661] (20329.1) Error initializing GAHP
6/17 12:28:18 [3661] (20329.2) doEvaluateState called: gmState GM_INIT, globusState 32
6/17 12:28:18 [3661] GAHP command 'INITIALIZE_FROM_FILE' failed: 7
6/17 12:28:18 [3661] GAHP: Failed to initialize from file
6/17 12:28:18 [3661] (20329.2) Error initializing GAHP
6/17 12:28:18 [3661] (20329.3) doEvaluateState called: gmState GM_INIT, globusState 32
6/17 12:28:18 [3661] GAHP command 'INITIALIZE_FROM_FILE' failed: 7
6/17 12:28:18 [3661] GAHP: Failed to initialize from file
6/17 12:28:18 [3661] (20329.3) Error initializing GAHP
6/17 12:28:23 [3661] gahp server not up yet, delaying ping
6/17 12:28:23 [3661] GAHP server not initialized yet, not submitting grid_monitor now
6/17 12:28:23 [3661] No jobs left, shutting down
6/17 12:28:23 [3661] Got SIGTERM. Performing graceful shutdown.
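
For a longer log, the failing GAHP commands can be tallied with a small script. This is only a rough sketch: it assumes every failure line has the same "GAHP command '...' failed: N" shape as the lines above, and the function name count_gahp_failures is made up for illustration. It does not say what error code 7 means or which file is involved; it only counts how often each command fails.

```python
import re
from collections import Counter

# Matches lines shaped like the ones in the log above, e.g.
#   6/17 12:28:18 [3661] GAHP command 'INITIALIZE_FROM_FILE' failed: 7
# Assumption: all GAHP failure lines follow this pattern.
FAILED = re.compile(r"GAHP command '(?P<command>[^']+)' failed: (?P<code>\d+)")


def count_gahp_failures(lines):
    """Count (command, error code) pairs over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            key = (match.group("command"), int(match.group("code")))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = [
        "6/17 12:28:18 [3661] GAHP server pid = 3672",
        "6/17 12:28:18 [3661] GAHP command 'INITIALIZE_FROM_FILE' failed: 7",
        "6/17 12:28:18 [3661] GAHP: Failed to initialize from file",
        "6/17 12:28:18 [3661] GAHP command 'INITIALIZE_FROM_FILE' failed: 7",
    ]
    for (command, code), n in count_gahp_failures(sample).items():
        print(f"{command} failed with code {code}: {n} time(s)")
```

To run it against a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.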