
[Condor-users] Condor-G gt4 delegation issues



Hi,

I'm using Condor 6.8.2 with Globus Toolkit 4.0.3. I'm stress-testing Condor-G matchmaking with long-running jobs (see stress.jdl; each job runs for about 9000 s) using the gt4 grid type, and I'm seeing a lot of problems with credential refresh on the Delegation Service.



1. The first problem is credential refresh. I started a single instance of the above benchmark with a short proxy (1 hour) to see how the refresh would work.

In the first run, after I refreshed the proxy certificate on the submit machine, the new proxy was not delegated to the Delegation Service. After the initial proxy expired, the job failed in the clean-up stage. The Globus container log is in the attachment container.log.proxyexpired.

In the next run, I restarted Condor manually after refreshing the proxy, and the gridmanager delegated the new proxy properly.



2. When I ran all the jobs, they started failing after 12 hours. Moreover, after the refresh failure, submission of the remaining jobs also failed with the error:
"Globus error: java.rmi.RemoteException: Job creation failed.; nested exception is: org.globus.wsrf.ResourceException: ; nested exception is: org.globus.delegation.DelegationException: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]"
The problem was evidently that the gridmanager kept using the expired Delegation Service for new jobs.

When I manually set GlobusDelegationUri to UNDEFINED (with condor_qedit) on all jobs, the gridmanager created a new Delegation Service and the remaining jobs ran fine afterwards (until the next refresh failure). The gridmanager log is in the attachment GridmanagerLog.eimamagi.failure. The corresponding part of the Globus container log is in the attachment container.log.submit_fail. One interesting thing I noticed is that the gridmanager kept trying to refresh the credential on the expired Delegation Service.
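
For reference, the workaround can be scripted roughly as below. This is only a sketch: condor_qedit and the GlobusDelegationUri attribute are as described above, but the helper names and the constraint that selects "all jobs with a delegation URI set" are assumptions made for illustration, not necessarily the exact invocation I used.

```python
import subprocess

# Assumed ClassAd constraint: select every job whose delegation URI is set.
DEFAULT_CONSTRAINT = "GlobusDelegationUri =!= UNDEFINED"


def build_qedit_command(constraint=DEFAULT_CONSTRAINT):
    """Return the argv for a condor_qedit call that resets GlobusDelegationUri."""
    return [
        "condor_qedit",
        "-constraint", constraint,
        "GlobusDelegationUri", "UNDEFINED",
    ]


def reset_delegation_uri(constraint=DEFAULT_CONSTRAINT, dry_run=True):
    """Build the command and, unless dry_run is set, run it against the schedd."""
    cmd = build_qedit_command(constraint)
    if not dry_run:
        # Raises CalledProcessError if condor_qedit exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass dry_run=False to apply it.
    print(" ".join(reset_delegation_uri(dry_run=True)))
```

With dry_run left at True nothing is changed in the queue, so the command can be checked before applying it.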

I have a suggestion for the second issue: when a credential refresh fails, create a new Delegation Service, or at least make this configurable. I would rather have multiple Delegation Services than job submission failures. Delegation Services whose refresh failed don't hold valid proxies, so they don't pose a security issue. Also, even if the credential refresh works perfectly, this would cover the case where the Delegation Service was unavailable for a while and the gridmanager wasn't able to refresh it.

Also, the gt4 GAHP could stop trying to refresh a failed Delegation Service after some period (especially if Globus returns org.globus.wsrf.NoSuchResourceException).
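
To make the two suggestions above concrete, here is a small sketch of the policy I have in mind. It is not gridmanager or GAHP code: DelegationManager, FakeService, max_failures and the two exception classes are hypothetical names used only for illustration, and FakeService is an in-memory stand-in for the real Delegation Service.

```python
class NoSuchResource(Exception):
    """Stand-in for org.globus.wsrf.NoSuchResourceException."""


class RefreshError(Exception):
    """Stand-in for any other, possibly transient, refresh failure."""


class FakeService:
    """In-memory stand-in for a Delegation Service (illustration only)."""

    def __init__(self):
        self.resources = {}
        self.counter = 0

    def create(self, proxy):
        self.counter += 1
        uri = "delegation-%d" % self.counter
        self.resources[uri] = proxy
        return uri

    def refresh(self, uri, proxy):
        if uri not in self.resources:
            raise NoSuchResource(uri)
        self.resources[uri] = proxy

    def expire(self, uri):
        self.resources.pop(uri, None)


class DelegationManager:
    """Refresh a delegated credential, replacing the resource when refresh fails."""

    def __init__(self, service, max_failures=3):
        self.service = service
        self.max_failures = max_failures  # assumed, configurable give-up threshold
        self.uri = None
        self.failures = 0

    def _create(self, proxy):
        self.uri = self.service.create(proxy)
        self.failures = 0

    def refresh(self, proxy):
        """Return the URI that new job submissions should use."""
        if self.uri is None:
            self._create(proxy)
            return self.uri
        try:
            self.service.refresh(self.uri, proxy)
            self.failures = 0
        except NoSuchResource:
            # The resource is gone, so retrying cannot succeed: replace it now.
            self._create(proxy)
        except RefreshError:
            # Possibly transient: retry a few times, then give up and replace.
            self.failures += 1
            if self.failures >= self.max_failures:
                self._create(proxy)
        return self.uri


if __name__ == "__main__":
    service = FakeService()
    manager = DelegationManager(service)
    first = manager.refresh("proxy-1")
    service.expire(first)             # simulate the resource disappearing
    print(manager.refresh("proxy-2"))  # a new URI, not the expired one
```

The point of the design is that new submissions always get a URI backed by a live resource, at the cost of possibly leaving behind old resources whose proxies have already expired.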



Cheers,
emir
2006-11-29 03:48:22,796 ERROR exec.StateMachine [RunQueue FileCleanUp-2,fileCleanUp:2777] A secondary fault occured while trying to gracefully fail.
AxisFault
 faultCode: {http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd}General
 faultSubcode: 
 faultString: Certificate C=HR,O=edu,OU=test,CN=Emir Imamagic,CN=1621970621 expired.
 faultActor: 
 faultNode: 
 faultDetail: 
	{http://xml.apache.org/axis/}stackTrace:org.globus.gsi.proxy.ProxyPathValidatorException: Certificate C=HR,O=edu,OU=test,CN=Emir Imamagic,CN=1621970621 expired.
	at org.globus.gsi.proxy.ProxyPathValidator.checkValidity(ProxyPathValidator.java:742)
	at org.globus.gsi.proxy.ProxyPathValidator.validate(ProxyPathValidator.java:489)
	at org.globus.gsi.proxy.ProxyPathValidator.validate(ProxyPathValidator.java:273)
	at org.globus.wsrf.impl.security.authentication.wssec.WSSecurityEngine.verifyXMLSignature(WSSecurityEngine.java:299)
	at org.globus.wsrf.impl.security.authentication.wssec.WSSecurityRequestEngine.verifyXMLSignature(WSSecurityRequestEngine.java:97)
	at org.globus.wsrf.impl.security.authentication.wssec.WSSecurityEngine.handleSignatureElement(WSSecurityEngine.java:116)
	at org.globus.wsrf.impl.security.authentication.wssec.WSSecurityEngine.processSecurityHeader(WSSecurityEngine.java:516)
	at org.globus.wsrf.impl.security.authentication.wssec.WSSecurityEngine.processSecurityHeader(WSSecurityEngine.java:482)
	at org.globus.wsrf.impl.security.authentication.wssec.WSSecurityEngine.processSecurityHeader(WSSecurityEngine.java:397)
	at org.globus.wsrf.impl.security.authentication.wssec.WSSecurityRequestEngine.processSecurityHeader(WSSecurityRequestEngine.java:61)
	at org.globus.wsrf.impl.security.authentication.wssec.WSSecurityBasicHandler.handleMessage(WSSecurityBasicHandler.java:43)
	at org.globus.wsrf.impl.security.authentication.wssec.WSSecurityHandler.handleRequest(WSSecurityHandler.java:21)
	at org.apache.axis.handlers.HandlerChainImpl.handleRequest(HandlerChainImpl.java:105)
	at org.apache.axis.handlers.JAXRPCHandler.invoke(JAXRPCHandler.java:52)
	at org.globus.wsrf.handlers.JAXRPCHandler.invoke(JAXRPCHandler.java:26)
	at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
	at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
	at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
	at org.apache.axis.server.AxisServer.invoke(AxisServer.java:248)
	at org.apache.axis.transport.local.LocalSender.invoke(LocalSender.java:141)
	at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
	at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
	at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
	at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
	at org.apache.axis.client.Call.invokeEngine(Call.java:2727)
	at org.apache.axis.client.Call.invoke(Call.java:2710)
	at org.apache.axis.client.Call.invoke(Call.java:2386)
	at org.apache.axis.client.Call.invoke(Call.java:2309)
	at org.apache.axis.client.Call.invoke(Call.java:1766)
	at org.globus.rft.generated.bindings.ReliableFileTransferFactoryPortTypeSOAPBindingStub.createReliableFileTransfer(ReliableFileTransferFactoryPortTypeSOAPBindingStub.java:874)
	at org.globus.exec.service.exec.utils.StagingHelper.submitStagingRequest(StagingHelper.java:168)
	at org.globus.exec.service.exec.StateMachine.fileCleanUp(StateMachine.java:2763)
	at org.globus.exec.service.exec.StateMachine.processFailureFileCleanUpState(StateMachine.java:2145)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:324)
	at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:362)
	at org.globus.exec.service.exec.RunThread.run(RunThread.java:94)

	{http://xml.apache.org/axis/}hostname:site.grid.hr

Certificate C=HR,O=edu,OU=test,CN=Emir Imamagic,CN=1621970621 expired.
	at org.apache.axis.message.SOAPFaultBuilder.createFault(SOAPFaultBuilder.java:221)
	at org.apache.axis.message.SOAPFaultBuilder.endElement(SOAPFaultBuilder.java:128)
	at org.apache.axis.encoding.DeserializationContext.endElement(DeserializationContext.java:1087)
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)
	at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
	at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:645)
	at org.apache.axis.Message.getSOAPEnvelope(Message.java:424)
	at org.apache.axis.message.addressing.handler.AddressingHandler.processClientResponse(AddressingHandler.java:305)
	at org.apache.axis.message.addressing.handler.AddressingHandler.invoke(AddressingHandler.java:110)
	at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
	at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
	at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
	at org.apache.axis.client.AxisClient.invoke(AxisClient.java:190)
	at org.apache.axis.client.Call.invokeEngine(Call.java:2727)
	at org.apache.axis.client.Call.invoke(Call.java:2710)
	at org.apache.axis.client.Call.invoke(Call.java:2386)
	at org.apache.axis.client.Call.invoke(Call.java:2309)
	at org.apache.axis.client.Call.invoke(Call.java:1766)
	at org.globus.rft.generated.bindings.ReliableFileTransferFactoryPortTypeSOAPBindingStub.createReliableFileTransfer(ReliableFileTransferFactoryPortTypeSOAPBindingStub.java:874)
	at org.globus.exec.service.exec.utils.StagingHelper.submitStagingRequest(StagingHelper.java:168)
	at org.globus.exec.service.exec.StateMachine.fileCleanUp(StateMachine.java:2763)
	at org.globus.exec.service.exec.StateMachine.processFailureFileCleanUpState(StateMachine.java:2145)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:324)
	at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:362)
	at org.globus.exec.service.exec.RunThread.run(RunThread.java:94)

	
2006-11-29 11:34:42,016 ERROR delegation.DelegationUtil [ServiceThread-1303,getDelegationResource:253] Error getting delegation resource
org.globus.wsrf.NoSuchResourceException
	at org.globus.delegation.service.DelegationResource.load(DelegationResource.java:405)
	at org.globus.delegation.service.DelegationHome.find(DelegationHome.java:53)
	at org.globus.delegation.DelegationUtil.getDelegationResource(DelegationUtil.java:251)
	at org.globus.delegation.DelegationUtil.registerDelegationListener(DelegationUtil.java:166)
	at org.globus.exec.service.utils.DelegatedCredential.getDelegatedCredential(DelegatedCredential.java:179)
	at org.globus.exec.service.job.ManagedJobResourceImpl.getJobCredential(ManagedJobResourceImpl.java:421)
	at org.globus.exec.service.exec.ManagedExecutableJobResource.initSecurity(ManagedExecutableJobResource.java:344)
	at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:190)
	at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:161)
	at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:154)
	at org.globus.exec.service.exec.ManagedExecutableJobHome.create(ManagedExecutableJobHome.java:300)
	at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob(ManagedJobFactoryService.java:310)
	at sun.reflect.GeneratedMethodAccessor1146.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:324)
	at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
	at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:107)
	at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:42)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:379)
	at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:55)
	at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:90)
	at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:97)
	at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
	at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
	at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
	at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
	at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
	at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
	at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
	at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:676)
	at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:397)
	at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:151)
	at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:302)
2006-11-29 11:34:42,021 ERROR factory.ManagedJobFactoryService [ServiceThread-1303,createManagedJob:365] Job creation failed.
org.globus.wsrf.ResourceException: ; nested exception is: 
	org.globus.delegation.DelegationException: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]
	at org.globus.exec.service.exec.ManagedExecutableJobResource.initSecurity(ManagedExecutableJobResource.java:352)
	at org.globus.exec.service.exec.ManagedExecutableJobResource.initialize(ManagedExecutableJobResource.java:190)
	at org.globus.exec.service.exec.ManagedExecutableJobResource.initializeWholeState(ManagedExecutableJobResource.java:161)
	at org.globus.exec.service.exec.PersistentManagedExecutableJobResource.initializeWholeState(PersistentManagedExecutableJobResource.java:154)
	at org.globus.exec.service.exec.ManagedExecutableJobHome.create(ManagedExecutableJobHome.java:300)
	at org.globus.exec.service.factory.ManagedJobFactoryService.createManagedJob(ManagedJobFactoryService.java:310)
	at sun.reflect.GeneratedMethodAccessor1146.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:324)
	at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
	at org.globus.axis.providers.RPCProvider.invokeMethodSub(RPCProvider.java:107)
	at org.globus.axis.providers.PrivilegedInvokeMethodAction.run(PrivilegedInvokeMethodAction.java:42)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:379)
	at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:55)
	at org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:90)
	at org.globus.axis.providers.RPCProvider.invokeMethod(RPCProvider.java:97)
	at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
	at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
	at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
	at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
	at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
	at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
	at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
	at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:676)
	at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:397)
	at org.globus.wsrf.container.GSIServiceThread.process(GSIServiceThread.java:151)
	at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:302)
Caused by: org.globus.delegation.DelegationException: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]
	at org.globus.delegation.DelegationUtil.getDelegationResource(DelegationUtil.java:254)
	at org.globus.delegation.DelegationUtil.registerDelegationListener(DelegationUtil.java:166)
	at org.globus.exec.service.utils.DelegatedCredential.getDelegatedCredential(DelegatedCredential.java:179)
	at org.globus.exec.service.job.ManagedJobResourceImpl.getJobCredential(ManagedJobResourceImpl.java:421)
	at org.globus.exec.service.exec.ManagedExecutableJobResource.initSecurity(ManagedExecutableJobResource.java:344)
	... 27 more

Attachment: GridmanagerLog.eimamagi.failure
Description: Binary data

Executable=cg.C
stream_output=false
stream_error=false

MyProxyHost     = myproxy.grid.hr:7512
MyProxyCredentialName = condor
MyProxyPassword = password

universe=grid
Log=condorG.log
Output=output/cg.C.out$(Cluster).$(Process)
Error=output/cg.C.err 

grid_resource = $$(gridtype_and_args)
requirements    = (TARGET.gridtype_and_args =!= UNDEFINED)
x509userproxy = /tmp/x509up_u500 

PeriodicHold = (GlobusStatus == 1) && (JobStatus == 1) && \
		((CurrentTime - EnteredCurrentStatus) > 600) && \
		(LastMatchTime =!= UNDEFINED && ((CurrentTime - LastMatchTime) > 600))

PeriodicRelease = (GlobusStatus == 32 || GlobusStatus == 0 || GlobusStatus == 1) && \
                        ((CurrentTime - EnteredCurrentStatus) > 300) && \
                        ( HoldReason != "via condor_hold (by user $ENV(USER))" )
GlobusResubmit = (GlobusStatus == 32 || GlobusStatus == 0 || GlobusStatus == 1) && \
                        (JobStatus == 1) && (NumSystemHolds >= NumJobMatches)
Rematch = TRUE

queue 200