
Re: [Condor-users] Problem when submitting to condor-g using condor 6.8.4 and gt4



Re the below:

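Can you run the "gt4_gahp" program from a command prompt? It is in Condor's sbin directory. If it starts correctly, it should print a GAHP banner with its version string; just enter "quit" to exit. Your GridManager log says "Failed to read GAHP server version", meaning it never received that banner, so if gt4_gahp does not run for you either, its error output will hopefully give a clue about what is wrong.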
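If you want to script that check, here is a minimal Python sketch. It assumes only two things: the GAHP server prints a version banner on its first line of output and exits when sent "quit". The `$GahpVersion:` prefix it looks for comes from the GAHP protocol description, not from this thread. `probe_gahp`, `parse_gahp_banner` and `fake_gahp_command` are hypothetical helper names, and the gt4_gahp path you pass in depends on your installation.

```python
import subprocess
import sys


def parse_gahp_banner(line):
    """Return the stripped line if it looks like a GAHP version banner, else None."""
    line = line.strip()
    return line if line.startswith("$GahpVersion:") else None


def probe_gahp(cmd, timeout=30):
    """Start a GAHP server, read its first stdout line, then send QUIT.

    Returns (banner_or_None, exit_status_or_None, stderr_text).
    Note: readline() blocks until the server prints a line or exits.
    """
    try:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
    except OSError as exc:  # e.g. the binary is missing or not executable
        return None, None, str(exc)

    banner = parse_gahp_banner(proc.stdout.readline())
    try:
        # communicate() sends QUIT, closes stdin and collects stderr;
        # it ignores a broken pipe if the server has already exited.
        _, err = proc.communicate("QUIT\n", timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        _, err = proc.communicate()
    return banner, proc.returncode, err


def fake_gahp_command(emit_banner=True):
    """Hypothetical stand-in for gt4_gahp, used only to exercise probe_gahp()."""
    if emit_banner:
        script = ("import sys; print('$GahpVersion: 0.0.0 Fake $', flush=True); "
                  "sys.stdin.readline()")
    else:
        script = "import sys; sys.stderr.write('startup failed'); sys.exit(255)"
    return [sys.executable, "-c", script]
```

For example, `probe_gahp(["/usr/local/condor/sbin/gt4_gahp"])` (a hypothetical path) returns the banner and exit status 0 when the server starts. When it fails, it returns `None` together with the exit status and whatever the server wrote to stderr, which is where a clue about the status 255 in your log would show up.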


< Sent from a Palm Treo 680 >
-----Original Message-----
From: "Ioannis Kampolis" <gkamp@xxxxxxxxxxxx>
Date: Thursday, May 17, 2007 8:38 am
Subject: [Condor-users] Problem when submitting to condor-g using condor 6.8.4 and gt4
To: <condor-users@xxxxxxxxxxx>
Reply-To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>

 
> Hello everyone,
> We have recently begun installing grid tools on our laboratory clusters and we are having problems making Condor-G work with GT 4.0.
> After installing Condor and GT 4.0, we managed to submit and complete simple jobs with Condor and with Globus separately, and to submit to Globus with Condor managing our cluster resources. The problem arises when we try to submit a simple job to Condor-G.
> Take, for instance, the example included with Condor for checking environment variables (env.cmd). We modified the submit file as follows:
>####################
>##
>## Test Condor command file
>##
>####################
> executable = env.remote
> Universe = grid
> grid_resource = gt4 192.168.0.252:8443/wsrf/services/ManagedJobFactoryService Condor
> output = env.out
> error = env.err
> log = env.log
> Args = "foo bar glarch"
> environment = alpha=a;bravo=b;charlie=c
> queue
> 
> But the job remains Idle indefinitely, and an excerpt of the GridManager log is:
>5/17 16:16:08 Welcome to the all-singing, all dancing, "amazing" GridManager!
>5/17 16:16:08 [31892] Getting monitoring info for pid 31892
>5/17 16:16:08 [31892] Checking proxies
>5/17 16:16:09 [31892] DaemonCore: in SendAliveToParent()
>5/17 16:16:09 [31892] DaemonCore: attempting to connect to '<192.168.0.252:41401>'
>5/17 16:16:11 [31892] Received ADD_JOBS signal
>5/17 16:16:11 [31892] in doContactSchedd()
>5/17 16:16:11 [31892] querying for new jobs
>5/17 16:16:11 [31892] Using constraint
>((Owner=?="Panagiotis"&&JobUniverse==9)) && (Managed =!= "ScheddDone") &&(((Matched =!= FALSE) && (JobStatus != 5)) || (Managed =?= "External"))
>5/17 16:16:11 [31892] Using job type GT4 for job 99.0
>5/17 16:16:11 [31892] (99.0) SetJobLeaseTimers()
>5/17 16:16:11 [31892] Found job 99.0 --- inserting
>5/17 16:16:11 [31892] Fetched 1 new job ads from schedd
>5/17 16:16:11 [31892] querying for removed/held jobs
>5/17 16:16:11 [31892] Using constraint
>((Owner=?="Panagiotis"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) &&(JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
>5/17 16:16:11 [31892] Fetched 0 job ads from schedd
>5/17 16:16:11 [31892] leaving doContactSchedd()
>5/17 16:16:11 [31892] gahp server not up yet, delaying ping
>5/17 16:16:11 [31892] *** UpdateLeases called
>5/17 16:16:11 [31892] Leases not supported, cancelling timer
>5/17 16:16:11 [31892] *** checkDelegation()
>5/17 16:16:11 [31892] gahp server not up yet, delaying checkDelegation
>5/17 16:16:11 [31892] (99.0) doEvaluateState called: gmState GM_INIT, globusState 32
>5/17 16:16:11 [31892] GAHP server pid = 31893
>5/17 16:16:11 [31892] Failed to read GAHP server version
>5/17 16:16:11 [31892] (99.0) Error initializing GAHP
>5/17 16:16:11 [31892] (99.0) gm state change: GM_INIT -> GM_HOLD
>5/17 16:16:11 [31892] (99.0) Writing hold record to user logfile
>5/17 16:16:11 [31892] (99.0) gm state change: GM_HOLD -> GM_DELETE
>5/17 16:16:11 [31892] DaemonCore: No more children processes to reap.
>5/17 16:16:11 [31892] ERROR "Gahp Server (pid=31893) exited with status 255" at line 278 in file gahp-client.C
> We have checked the Java configuration, as two other posts about similar problems suggested, and as far as we can see everything is in order. Any ideas?
> 
> Thank you in advance,
> 
> Giannis Kampolis
> 
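To see at a glance what the GridManager did with each job, you can pull the state-change lines and the GAHP exit status out of a log like the one quoted above with a couple of regular expressions. This is a minimal Python sketch. The patterns match only the line format shown in this thread, and other Condor versions may log differently. `gm_state_changes` and `gahp_exit_status` are hypothetical helper names.

```python
import re

# Line format as it appears in the log quoted above, e.g.
# 5/17 16:16:11 [31892] (99.0) gm state change: GM_INIT -> GM_HOLD
STATE_CHANGE = re.compile(
    r"^\d+/\d+ \d\d:\d\d:\d\d \[\d+\] "
    r"\((?P<job>\d+\.\d+)\) gm state change: (?P<old>\w+) -> (?P<new>\w+)"
)
# e.g. ERROR "Gahp Server (pid=31893) exited with status 255" at line 278 ...
GAHP_EXIT = re.compile(
    r"Gahp Server \(pid=(?P<pid>\d+)\) exited with status (?P<status>\d+)"
)


def gm_state_changes(lines):
    """Return (job_id, old_state, new_state) for each state-change line."""
    changes = []
    for line in lines:
        # Drop e-mail quoting characters ('>' and spaces) so quoted logs parse too.
        m = STATE_CHANGE.match(line.lstrip("> ").rstrip())
        if m:
            changes.append((m.group("job"), m.group("old"), m.group("new")))
    return changes


def gahp_exit_status(lines):
    """Return (gahp_pid, exit_status) from the first GAHP-exit error, or None."""
    for line in lines:
        m = GAHP_EXIT.search(line)
        if m:
            return int(m.group("pid")), int(m.group("status"))
    return None
```

Applied to the log in this thread, it would show job 99.0 going GM_INIT -> GM_HOLD -> GM_DELETE and the GAHP server exiting with status 255. In other words, the job was put on hold while the GridManager was still initializing the GAHP.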