
[Condor-users] out of order condor-g failures in log file



I've got a large DAG running with Condor-G on Open Science Grid.
All the nodes are using one log file, and I'm seeing a lot of errors like this:

000 (887642.000.000) 04/09 17:04:23 Job submitted from host: <10.0.10.39:54286>
   DAG Node: 3cb5-00257

018 (887642.000.000) 04/09 17:04:55 Globus job submission failed!
   Reason: 22 the job manager failed to create an internal script argument file

017 (887642.000.000) 04/09 17:05:14 Job submitted to Globus
   RM-Contact: gridgk01.racf.bnl.gov/jobmanager-condor
   JM-Contact: https://gridgk01.racf.bnl.gov:20908/26140/1270847110/
   Can-Restart-JM: 1
...
027 (887642.000.000) 04/09 17:05:14 Job submitted to grid resource
   GridResource: gt2 gridgk01.racf.bnl.gov/jobmanager-condor
   GridJobId: gt2 gridgk01.racf.bnl.gov/jobmanager-condor https://gridgk01.racf.bnl.gov:20908/26140/1270847110/



Shouldn't event 018 (Globus job submission failed) only be logged after event 017 (job submitted to Globus)? Here the failure at 17:04:55 is logged 19 seconds before the submission at 17:05:14.

Some of these jobs then go on to execute and complete, although I still need to check whether they actually ran successfully.
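
To see how many jobs in the shared log show this ordering, something like the following could help. It is only a rough sketch: it assumes every event starts with a header line shaped like the ones in the excerpt above (three-digit code, job id in parentheses, date, time), and the function names and the log filename in the usage note are made up for illustration.

```python
import re
from collections import defaultdict

# Event header as it appears in the excerpt: code, (cluster.proc.subproc),
# date, time, then free text. This pattern is an assumption based on that
# excerpt, not a full description of the log format.
HEADER = re.compile(
    r"^(\d{3}) \((\d+\.\d+\.\d+)\) (\d\d/\d\d \d\d:\d\d:\d\d) (.*)$"
)

def events_by_job(lines):
    """Group event headers by job id, preserving the order found in the log."""
    jobs = defaultdict(list)
    for line in lines:
        m = HEADER.match(line.rstrip("\n"))
        if m:
            code, job_id, stamp, text = m.groups()
            jobs[job_id].append((code, stamp, text))
    return jobs

def failed_before_submit(jobs):
    """Return job ids whose first 018 event precedes their first 017 event."""
    flagged = []
    for job_id, events in jobs.items():
        codes = [code for code, _, _ in events]
        if "018" in codes and "017" in codes:
            if codes.index("018") < codes.index("017"):
                flagged.append(job_id)
    return flagged
```

Usage would be along the lines of `failed_before_submit(events_by_job(open("dag.log")))`, where `dag.log` stands in for the shared log file. Continuation lines such as `DAG Node:` or `Reason:` are ignored because they do not match the header pattern.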

Best,
Peter