
Re: [Condor-users] HoldReason = "Streaming not supported"



On Jun 27, 2008, at 12:36 PM, Sean Manning wrote:

Jaime Frey wrote:

On Jun 23, 2008, at 6:50 PM, Sean Manning wrote:

I am working on a Web Services interface to submit jobs to our Globus
grid.  It uses the condor and birdbath Java packages.  We can
successfully submit the attached JDL on the command line of a condor
head node (the metascheduler of our grid) and see it complete, but
when we submit it with the Java program from an external Condor client
machine, the job stays Idle and then goes on hold with an error.
Running the condor daemons as root got rid of one error, but now we get
another one: HoldReason = "Streaming not supported".  I can't find any
information about this error in the user group archives.  Does anyone
here have an idea what could be causing this?

For GT4 GRAM jobs, if StreamOut and StreamErr aren't explicitly set to
False in the job ad, then Condor assumes you want stdout and stderr to
be streamed, which isn't supported by Condor for GT4 GRAM jobs. This
appears to be a bug, as the default behavior for other job types is no
streaming.

If you add the following two attributes to your job ads, it should
eliminate the problem:
StreamOut = False
StreamErr = False
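
When the job ad is built programmatically rather than by condor_submit,
the same two attributes have to end up in the ad as unquoted boolean
literals. A minimal sketch of that step, assuming the ad is held as
plain "Name = Value" text lines; the class and method names are
hypothetical and not part of Condor or birdbath:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Condor or birdbath.
class StreamDefaults {
    // Returns a copy of the ad lines in which StreamOut and StreamErr are
    // appended as the unquoted literal False when they are absent.
    // Attributes that are already present are left unchanged.
    static List<String> withStreamingDisabled(List<String> adLines) {
        List<String> result = new ArrayList<>(adLines);
        for (String name : new String[] {"StreamOut", "StreamErr"}) {
            if (!hasAttribute(result, name)) {
                result.add(name + " = False");
            }
        }
        return result;
    }

    // ClassAd attribute names are case-insensitive, so compare accordingly.
    static boolean hasAttribute(List<String> adLines, String name) {
        for (String line : adLines) {
            int eq = line.indexOf('=');
            if (eq > 0 && line.substring(0, eq).trim().equalsIgnoreCase(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Illustrative ad lines; the attribute values are made up.
        List<String> ad = Arrays.asList("Out = \"job.out\"", "Err = \"job.err\"");
        for (String line : withStreamingDisabled(ad)) {
            System.out.println(line);
        }
    }
}
```

The sketch only adds missing attributes. It does not repair an existing
attribute that was written with the wrong type.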

Thanks and regards,
Jaime Frey
UW-Madison Condor Team


Dear Jaime,

 Thanks for the reply.

 I made that change, but jobs are still being held with HoldReason =
"Streaming not supported".  I can submit the new file with
condor_submit from the grid metascheduler and see it appear on the head
node of a worker cluster, when condor_config has SOAP enabled.  The
output and error come back to the machine I submitted the job from,
just like they are supposed to.  But when I submit the same JDL to the
grid metascheduler using our Web Services code, the job always goes on
hold after a delay.

 Right now, the Condor daemons are running as root.  The web services
code is running under my personal account (seangwm) on my workstation.
The spool directory on the metascheduler
($CONDOR_LOCATION/local.babargt4/spool) belongs to condor:root.  We
have been changing the owner of the job folder in the spool
($CONDOR_LOCATION/local.babargt4/spool/cluster5252.proc0.subproc0) by
hand from root:root to my personal account and group, because jobs stay
idle until I do so.  I think that this has to do with the fact that the
proxy file must have very specific permissions so the grid will trust
it.  If I change the owner of the spool folder to root:root I get a
HoldReason = "Failed to get expiration time of proxy" instead.

 In principle, if we can submit a job to the grid using condor_submit,
then the web services submission should work as well.  I would be very
grateful if you have any further advice about what I am missing.

 I have attached our main Java class for job submission and the JDL
which I have been trying with the Web Services code.  In the attached
files, babargt4 is the grid metascheduler and ugdev07 is the head of
one of the clusters of worker nodes.


Can you look at the values of StreamOut and StreamErr in the classad of
the held job in the schedd? I'm guessing they're either missing or set
to the string "False". They need to be set to False (no quotes). I'll
bet your JobHelper class isn't handling these attributes correctly.
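
The three cases above (missing, the string "False", the boolean False)
can be told apart mechanically. A minimal sketch, assuming the held
job's ad has been dumped as "Name = Value" text, for example with
condor_q -l; the class and enum names are hypothetical, and the sample
ad in main is illustrative rather than taken from the attached files:

```java
// Hypothetical checker for illustration only; not part of Condor or birdbath.
class StreamAttrCheck {
    enum Kind { MISSING, STRING, BOOLEAN, OTHER }

    // Classifies how the named attribute appears in a text dump of a job ad.
    // A value that starts with a double quote is a string; a bare
    // true/false (in any letter case) is a boolean literal.
    static Kind classify(String adText, String name) {
        for (String line : adText.split("\\r?\\n")) {
            int eq = line.indexOf('=');
            if (eq <= 0) {
                continue; // not a "Name = Value" line
            }
            // ClassAd attribute names are case-insensitive.
            if (!line.substring(0, eq).trim().equalsIgnoreCase(name)) {
                continue;
            }
            String value = line.substring(eq + 1).trim();
            if (value.startsWith("\"")) {
                return Kind.STRING;
            }
            if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false")) {
                return Kind.BOOLEAN;
            }
            return Kind.OTHER;
        }
        return Kind.MISSING;
    }

    public static void main(String[] args) {
        // Illustrative dump: StreamOut was stored as a string, StreamErr is absent.
        String ad = "HoldReason = \"Streaming not supported\"\n"
                  + "StreamOut = \"False\"\n";
        System.out.println("StreamOut: " + classify(ad, "StreamOut"));
        System.out.println("StreamErr: " + classify(ad, "StreamErr"));
    }
}
```

Only a BOOLEAN result for both attributes matches the "False (no
quotes)" form described above.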

Thanks and regards,
Jaime Frey
UW-Madison Condor Team