Re: [Condor-users] Debugging a condor-G streaming output and error situation.
- Date: Thu, 12 Apr 2007 10:25:28 -0500
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] Debugging a condor-G streaming output and error situation.
I see the following in the release notes for 6.7.17:
"Made several changes to make Condor-G much less likely to overload a
pre-WS GRAM server for grid-type gt2 jobs. Added configuration parameter
GRIDMANAGER_MAX_JOBMANAGERS_PER_RESOURCE, which limits the number of
globus-job-manager processes Condor will let run on the server at a
time. Streaming of output for gt2 jobs is disabled if
GRIDMANAGER_MAX_JOBMANAGERS_PER_RESOURCE isn't set to unlimited. If the
Grid Monitor encounters problems, the condor_gridmanager doesn't
restart the globus-job-managers of the affected jobs. Fixed a couple
bugs in the Grid Monitor that could cause it to spawn extra polling
processes on the server."
I also note that the incompatibility between streaming input/output and
job leases is being fixed, and the fix will likely be released as a
feature of 6.9.3.
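As a rough illustration of the two conditions above (a sketch only, not
part of Condor): the Python below scans a submit description for
stream_output/stream_error and reports the two situations mentioned here
in which streaming will not work. The function names and the two boolean
arguments describing the local configuration are hypothetical; the code
does not read Condor's configuration or call any Condor tool.

```python
def parse_submit(text):
    """Parse simple 'key = value' lines of a submit description into a dict.

    Keys are lowercased; blank lines, comments, and lines without '='
    (such as 'queue') are skipped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def streaming_requested(settings):
    """Return the streaming commands that are set to true."""
    return [
        key
        for key in ("stream_output", "stream_error")
        if settings.get(key, "false").lower() == "true"
    ]


def check_streaming(submit_text, lease_defined, jobmanagers_unlimited):
    """Return warnings for a gt2 job that requests streaming.

    lease_defined: assumption supplied by the caller, True if
        JobLeaseDuration is defined for the job (e.g. via SUBMIT_EXPRS).
    jobmanagers_unlimited: assumption supplied by the caller, True if
        GRIDMANAGER_MAX_JOBMANAGERS_PER_RESOURCE is set to unlimited.
    """
    requested = streaming_requested(parse_submit(submit_text))
    if not requested:
        return []
    names = ", ".join(requested)
    warnings = []
    if lease_defined:
        warnings.append(
            names + ": not accepted together with JobLeaseDuration"
        )
    if not jobmanagers_unlimited:
        warnings.append(
            names + ": streaming is disabled for gt2 jobs unless "
            "GRIDMANAGER_MAX_JOBMANAGERS_PER_RESOURCE is unlimited"
        )
    return warnings


if __name__ == "__main__":
    sample = "stream_output = true\nstream_error = true\n"
    for warning in check_streaming(sample, True, False):
        print(warning)
```

Run against the stream_output/stream_error lines from the submit file
quoted below, with a job lease defined and a limited jobmanager count,
this reports both conditions.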
Steven Timm wrote:
I have a user who is submitting a condor-G job using
a condor-6.7.19 client (grid/gt2) from a machine I do not control,
to an OSG Grid/gt2 jobmanager-condor on a machine I do control.
He would like to see the standard output and standard error
stream back to his client machine while the job is running. This
is not a common thing for OSG users to do, but for this application
he has discussed the motivation with me and I have determined that
the quantity of streaming is limited and the application is worthwhile.
The relevant settings in his submit file:
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_output = true
transfer_error = true
transfer_executable = true
output = mlSt1job_$(Cluster).$(Process).lis
transfer_output_files = Steer_MLDyn1_AC_$(Cluster)-$(Process).tar
error = mlSt1job.err.$(Cluster).$(Process)
stream_output = true
stream_error = true
Currently, he does not get any standard output or error back until the
end of the job. I can submit a test job from the same machine
and I also do not get any standard output and standard error back until
the end of the job.
I have several questions, so far:
1) Under the latest globus/gt2/condor grid_monitor software stack,
is streaming supposed to work? I haven't tried it in almost two years
and am not sure if anything has changed in the globus or condor stack
that will just block it outright.
2) Is the when_to_transfer_output setting above affecting the
transfer of the streamed stdout too?
3) I am trying to reproduce this on another, newer client machine that
has the condor 6.8.4 client and grid monitor, but there condor_submit
rejects the job because I have in the condor configuration:
SUBMIT_EXPRS = JobLeaseDuration
JobLeaseDuration = 3600
Is there any way for a user to override this for a single job or
set of jobs? condor_submit does not let you request stream_output=true
or stream_error=true when JobLeaseDuration is defined.