
[Condor-users] Globus Error: Staging error for RSL element fileStageIn



Hi,
I am having a problem with fileStageIn when submitting jobs through Condor-G.
We are setting up a grid that uses Condor-G as its scheduler.
On one machine we have GT4 set up; it is named gridone.bits-goa.ac.in and its IP address is 10.1.3.153.
On another machine we have Condor installed from the RPM; it is named condor.bits-goa.ac.in and its IP address is 10.1.3.152.
On this same machine we also have GRAM installed. When a job is submitted directly through GRAM (using "globusrun-ws -submit"...),
it returns the output without any error, and the file transfer (using GridFTP) also completes without error.
When the job is submitted through Condor-G, on the other hand, the log shows the following error:
Job was held.
Globus error: Staging error for RSL element fileStageIn.
My submit file, test.sub, is:

universe = globus
grid_type = gt4
jobmanager_type = Fork
executable = /bin/date
transfer_executable = false
globusscheduler = https://gridone.bits-goa.ac.in:8443/wsrf/services/ManagedJobFactoryService
output = condor.out
error = condor.error
log = condor.log
queue



The Error Log is:

000 (083.000.000) 02/01 23:13:43 Job submitted from host: <10.1.3.152:36249>
...
017 (083.000.000) 02/01 23:14:03 Job submitted to Globus
RM-Contact: https://gridone.bits-goa.ac.in:8443/wsrf/services/ManagedJobFactoryService
JM-Contact: https://10.1.3.153:8443/wsrf/services/ManagedExecutableJobService?c7f4c550-b21b-11db-aa94-818064dff7c3
Can-Restart-JM: 0
...
027 (083.000.000) 02/01 23:14:03 Job submitted to grid resource
GridResource: gt4 https://gridone.bits-goa.ac.in:8443/wsrf/services/ManagedJobFactoryService Fork
GridJobId: gt4 https://10.1.3.153:8443/wsrf/services/ManagedExecutableJobService?c7f4c550-b21b-11db-aa94-818064dff7c3
...
012 (083.000.000) 02/01 23:14:08 Job was held.
Globus error: Staging error for RSL element fileStageIn.
Code 0 Subcode 0
...
000 (084.000.000) 02/01 23:32:51 Job submitted from host: <10.1.3.152:36766>
...
017 (084.000.000) 02/01 23:33:16 Job submitted to Globus
RM-Contact: https://gridone.bits-goa.ac.in:8443/wsrf/services/ManagedJobFactoryService
JM-Contact: https://10.1.3.153:8443/wsrf/services/ManagedExecutableJobService?75fc3eb0-b21e-11db-8142-8da1e7becbbd
Can-Restart-JM: 0
...
027 (084.000.000) 02/01 23:33:16 Job submitted to grid resource
GridResource: gt4 https://gridone.bits-goa.ac.in:8443/wsrf/services/ManagedJobFactoryService Fork
GridJobId: gt4 https://10.1.3.153:8443/wsrf/services/ManagedExecutableJobService?75fc3eb0-b21e-11db-8142-8da1e7becbbd
...
012 (084.000.000) 02/01 23:33:20 Job was held.
Globus error: Staging error for RSL element fileStageIn.
Code 0 Subcode 0
...
009 (073.000.000) 02/02 00:22:25 Job was aborted by the user.
via condor_rm (by user bitsp)
...
009 (084.000.000) 02/02 00:22:25 Job was aborted by the user.
via condor_rm (by user bitsp)
...
009 (075.000.000) 02/02 00:22:25 Job was aborted by the user.
via condor_rm (by user bitsp)
...
009 (078.000.000) 02/02 00:22:25 Job was aborted by the user.
via condor_rm (by user bitsp)
...
009 (079.000.000) 02/02 00:22:25 Job was aborted by the user.
via condor_rm (by user bitsp)
...
009 (081.000.000) 02/02 00:22:25 Job was aborted by the user.
via condor_rm (by user bitsp)
...
009 (083.000.000) 02/02 00:22:25 Job was aborted by the user.
via condor_rm (by user bitsp)
...
000 (086.000.000) 02/02 01:29:32 Job submitted from host: <10.1.3.152:36766>
...
017 (086.000.000) 02/02 01:29:59 Job submitted to Globus
RM-Contact: https://gridone.bits-goa.ac.in:8443/wsrf/services/ManagedJobFactoryService
JM-Contact: https://10.1.3.153:8443/wsrf/services/ManagedExecutableJobService?c40037f0-b22e-11db-87f3-9bff1edf1f9c
Can-Restart-JM: 0
...
027 (086.000.000) 02/02 01:29:59 Job submitted to grid resource
GridResource: gt4 https://gridone.bits-goa.ac.in:8443/wsrf/services/ManagedJobFactoryService Fork
GridJobId: gt4 https://10.1.3.153:8443/wsrf/services/ManagedExecutableJobService?c40037f0-b22e-11db-87f3-9bff1edf1f9c
...
012 (086.000.000) 02/02 01:30:02 Job was held.
Globus error: Staging error for RSL element fileStageIn.
Code 0 Subcode 0
...
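
To pick just the hold events out of a job log like the one above, here is a minimal Python sketch. The function name `held_jobs` is made up for illustration, and the regular expression only assumes the event header layout visible above (three-digit event code, job id in parentheses, date, time, message), with the hold reason on the following line:

```python
import re

# Event header as it appears in the log above, e.g.
# "012 (086.000.000) 02/02 01:30:02 Job was held."
EVENT_RE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\s*\) (\S+) (\S+) (.*)$")

def held_jobs(log_text):
    """Return (cluster, date, time, reason) for every 012 'Job was held' event."""
    lines = log_text.splitlines()
    found = []
    for i, line in enumerate(lines):
        m = EVENT_RE.match(line.strip())
        if m and m.group(1) == "012":
            # In the log above the hold reason sits on the line after the header.
            reason = lines[i + 1].strip() if i + 1 < len(lines) else ""
            found.append((int(m.group(2)), m.group(5), m.group(6), reason))
    return found
```

Calling `held_jobs(open("condor.log").read())` on the log named in the submit file should list one entry per held job.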

_____________________________________________________________________________

The following is the Gridmanager log's stack trace


2/2 01:30:02 [26371] GAHP[26373] (stderr) ->
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> stackTrace:
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> org.globus.exec.generated.StagingFaultType: Staging error for RSL element fileStageIn.
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> Timestamp: Fri Feb 02 01:30:18 IST 2007
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> Originator: Address: https://10.1.3.153:8443/wsrf/services/ManagedJobFactoryService
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> Reference property[0]:
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> <ns1:ResourceID xmlns:ns1="http://www.globus.org/namespaces/2004/10/gram/job">c40037f0-b22e-11db-87f3-9bff1edf1f9c</ns1:ResourceID>
2/2 01:30:02 [26371] GAHP[26373] (stderr) ->
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.NativeConstructorAccessorImpl.newInstance (NativeConstructorAccessorImpl.java:39)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at java.lang.Class.newInstance0(Class.java:308)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at java.lang.Class.newInstance (Class.java:261)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:485)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.exec.utils.FaultUtils.createStagingFault (FaultUtils.java:363)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.exec.service.exec.StateMachine.processStageInResponseState(StateMachine.java:990)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at java.lang.reflect.Method.invoke(Method.java:324)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:362)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.exec.service.exec.RunThread.run(RunThread.java:94)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> Error authenticating user at source/dest hostAuthentication failed [Caused by: Operation unauthorized (Mechanism level: Authorization failed. Expected "/CN=host/localhost.localdomain" target but received "/O=Grid/OU=GlobusTest/OU= gridone.bits-goa.ac.in/CN=host/gridone.bits-goa.ac.in")]. Caused by Authentication failed. Caused by GSSException: Operation unauthorized (Mechanism level: Authorization failed. Expected "/CN=host/localhost.localdomain" target but received "/O=Grid/OU=GlobusTest/OU= gridone.bits-goa.ac.in/CN=host/gridone.bits-goa.ac.in")
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.gsi.gssapi.GlobusGSSContextImpl.initSecContext (GlobusGSSContextImpl.java:509)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.ftp.extended.GridFTPControlChannel.authenticate(GridFTPControlChannel.java:203)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.ftp.GridFTPClient.authenticate(GridFTPClient.java:99)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.ftp.GridFTPClient.authenticate(GridFTPClient.java:84)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.transfer.reliable.service.TransferClient.authenticateDestination(TransferClient.java:557)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.transfer.reliable.service.TransferClient.authenticate( TransferClient.java:530)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.transfer.reliable.service.TransferWork.getNewClient(TransferWork.java:436)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.transfer.reliable.service.TransferWork.getTransferClient(TransferWork.java:373)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:684)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.wsrf.impl.work.WorkManagerImpl$WorkWrapper.run(WorkManagerImpl.java:355)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at java.lang.Thread.run(Thread.java :534)
2/2 01:30:02 [26371] GAHP[26373] (stderr) ->
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java :27)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at java.lang.reflect.Constructor.newInstance(Constructor.java:274)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at java.lang.Class.newInstance0(Class.java:308)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at java.lang.Class.newInstance(Class.java:261)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.encoding.ser.BeanDeserializer.<init>(BeanDeserializer.java:90)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.encoding.ser.BeanDeserializer.<init>(BeanDeserializer.java:76)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.exec.generated.StagingFaultType.getDeserializer(StagingFaultType.java:152)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at java.lang.reflect.Method.invoke(Method.java:324)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.encoding.DeserializationContext.getDeserializerForClass (DeserializationContext.java:510)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.encoding.ser.BeanDeserializer.onStartChild(BeanDeserializer.java:250)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.encoding.DeserializationContext.startElement(DeserializationContext.java:1035)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch (Unknown Source)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.xerces.parsers.XML11Configuration.parse (Unknown Source)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.xerces.parsers.XMLParser.parse (Unknown Source)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at javax.xml.parsers.SAXParser.parse (SAXParser.java:345)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.wsrf.encoding.ObjectDeserializer.toObject (ObjectDeserializer.java:59)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at condor.gahp.gt4.JobListener.deliver(JobListener.java:157)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.wsrf.impl.notification.NotificationConsumerProvider.notify (NotificationConsumerProvider.java:109)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:39)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at java.lang.reflect.Method.invoke(Method.java:324)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:384)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:281)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:319)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.handlers.soap.SOAPService.invoke(SOAPService.java:450)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.apache.axis.server.AxisServer.invoke(AxisServer.java:285)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:676)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:397)
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:302)
2/2 01:30:02 [26371] GAHP[26373] (stderr) ->
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> stateWhenFailureOccurred: StageIn
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> timestamp: java.util.GregorianCalendar[time=1170360018575,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone= sun.util.calendar.ZoneInfo[id="GMT",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2007,MONTH=1,WEEK_OF_YEAR=5,WEEK_OF_MONTH=1,DAY_OF_MONTH=1,DAY_OF_YEAR=32,DAY_OF_WEEK=5,DAY_OF_WEEK_IN_MONTH=1,AM_PM=1,HOUR=8,HOUR_OF_DAY=20,MINUTE=0,SECOND=18,MILLISECOND=575,ZONE_OFFSET=0,DST_OFFSET=0]
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> Message:
2/2 01:30:02 [26371] GAHP[26373] (stderr) -> org.globus.exec.generated.StagingFaultType: Staging error for RSL element fileStageIn.
2/2 01:30:02 [26371] GAHP[26373] <- 'RESULTS'
2/2 01:30:02 [26371] GAHP[26373] -> 'R'
2/2 01:30:02 [26371] GAHP[26373] -> 'S' '1'
2/2 01:30:02 [26371] GAHP[26373] -> '2' 'https://10.1.3.153:8443/wsrf/services/ManagedExecutableJobService?c40037f0-b22e-11db-87f3-9bff1edf1f9c' 'Failed' 'Staging error for RSL element fileStageIn.' '0'
2/2 01:30:02 [26371] (86.0 ) gram callback: state Failed, fault Staging error for RSL element fileStageIn., exit code 0
2/2 01:30:02 [26371] (86.0) doEvaluateState called: gmState GM_SUBMITTED, globusState 64
2/2 01:30:02 [26371] (86.0) globus state change: StageIn -> Failed
2/2 01:30:02 [26371] (86.0) gm state change: GM_SUBMITTED -> GM_FAILED
2/2 01:30:02 [26371] GAHP[26373] <- 'GT4_GRAM_JOB_DESTROY 9 https://10.1.3.153:8443/wsrf/services/ManagedExecutableJobService?c40037f0-b22e-11db-87f3-9bff1edf1f9c'
2/2 01:30:02 [26371] GAHP[26373] -> 'S'
2/2 01:30:02 [26371] (86.0) doEvaluateState called: gmState GM_FAILED, globusState 4
2/2 01:30:02 [26371] GAHP[26373] <- 'RESULTS'
2/2 01:30:02 [26371] GAHP[26373] -> 'R'
2/2 01:30:02 [26371] GAHP[26373] -> 'S' '1'
2/2 01:30:02 [26371] GAHP[26373] -> '9' '0' 'NULL'
2/2 01:30:02 [26371] (86.0) doEvaluateState called: gmState GM_FAILED, globusState 4
2/2 01:30:02 [26371] (86.0) gm state change: GM_FAILED -> GM_HOLD
2/2 01:30:02 [26371] (86.0) Writing hold record to user logfile
2/2 01:30:02 [26371] (86.0) gm state change: GM_HOLD -> GM_DELETE
2/2 01:30:07 [26371] in doContactSchedd()
2/2 01:30:07 [26371] querying for removed/held jobs
2/2 01:30:07 [26371] Using constraint ((Owner=?="bitsp"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
2/2 01:30:07 [26371] Fetched 0 job ads from schedd
2/2 01:30:07 [26371] Updating classad values for 86.0:
2/2 01:30:07 [26371] GlobusDelegationUri = UNDEFINED
2/2 01:30:07 [26371] GridftpUrlBase = UNDEFINED
2/2 01:30:07 [26371] GlobusSubmitId = UNDEFINED
2/2 01:30:07 [26371] GridJobId = UNDEFINED
2/2 01:30:07 [26371] GlobusStatus = 32
2/2 01:30:07 [26371] JobStatus = 5
2/2 01:30:07 [26371] EnteredCurrentStatus = 1170360002
2/2 01:30:07 [26371] HoldReason = "Globus error: Staging error for RSL element fileStageIn."
2/2 01:30:07 [26371] HoldReasonCode = 0
2/2 01:30:07 [26371] HoldReasonSubCode = 0
2/2 01:30:07 [26371] ReleaseReason = UNDEFINED
2/2 01:30:07 [26371] NumSystemHolds = 1
2/2 01:30:07 [26371] Managed = "Schedd"
2/2 01:30:07 [26371] No jobs left, shutting down
2/2 01:30:07 [26371] leaving doContactSchedd()
2/2 01:30:07 [26371] Got SIGTERM. Performing graceful shutdown.
2/2 01:30:07 [26371] Started timer to call main_shutdown_fast in 1800 seconds
2/2 01:30:07 [26371] **** condor_gridmanager (condor_GRIDMANAGER) EXITING WITH
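
The line in the trace above that stands out is the GSSException: the transfer client expected one host subject and received another. Here is a minimal Python sketch for pulling the two names out of such a line. The helper name `dn_mismatch` is made up for illustration, and the pattern assumes the exact "Expected ... target but received ..." wording seen in the trace, with the quotes either literal or HTML-escaped:

```python
import html
import re

# Matches the wording used in the GSSException message in the trace above.
MISMATCH_RE = re.compile(r'Expected "([^"]+)" target but received "([^"]+)"')

def dn_mismatch(log_line):
    """Return (expected, received) host subjects, or None if the line has no mismatch."""
    # Decode &quot; and similar entities first, in case the log was copied from a web page.
    m = MISMATCH_RE.search(html.unescape(log_line))
    if m is None:
        return None
    return m.group(1), m.group(2)
```

Running it over every line of the Gridmanager log and printing the non-None results shows which subject the client expected and which one the server actually presented.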


______________________________________________________________________________

Also, please clear up my doubts about Condor-G:
1) Is it necessary to have GT4, or any of its components, installed on the Condor machine?
2) Does Condor-G work within Condor as installed, or is anything else also required for it to work?
___________________


Gaurav Paruthi
EEE Student,
BITS Pilani,Goa Campus
India.