
Re: [Condor-users] Debugging a condor-G streaming output and error situation.

I see the following in the release notes for 6.7.17:

"Made several changes to make Condor-G much less likely to overload a pre-WS GRAM server for grid-type gt2 jobs. Added configuration parameter GRIDMANAGER_MAX_JOBMANAGERS_PER_RESOURCE, which limits the number of globus-job-manager processes Condor will let run on the server at a time. Streaming of output for gt2 jobs is disabled if GRIDMANAGER_MAX_JOBMANAGERS_PER_RESOURCE isn't set to unlimited. If the Grid Monitor encounters problems, the condor_gridmanager doesn't restart the globus-job-managers of the affected jobs. Fixed a couple bugs in the Grid Monitor that could cause it to spawn extra polling processes on the server."

I also note that the incompatibility between streaming input/output and job leases is being fixed, and the fix will likely be released as a feature of 6.9.3.


Steven Timm wrote:

I have a user who is submitting a Condor-G job using
a condor-6.7.19 client (grid/gt2) from a machine I do not control,
to an OSG grid/gt2 jobmanager-condor on a machine I do control.

He would like to see the standard output and standard error
stream back to his client machine while the job is running.  This
is not a common thing for OSG users to do, but for this application
he has discussed the motivation with me and I have determined that
the quantity of streaming is limited and the application is worthwhile.

The relevant settings in his submit file:

should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_input_files = /prj/ilc-accel/lebrun/CHEF/ChefSteering/runGridF2b.sh, /scratch/btev01/lebrun/ilc/CHEF/ChefSteering/myCEs.tar
transfer_output = true
transfer_error = true
transfer_executable = true
output = mlSt1job_$(Cluster).$(Process).lis
transfer_output_files = Steer_MLDyn1_AC_$(Cluster)-$(Process).tar
error = mlSt1job.err.$(Cluster).$(Process)
stream_output = true
stream_error = true


Currently, he does not get any standard output or error back until the
end of the job.   I can submit a test job from the same machine
and I also do not get any standard output and standard error back until
the end of the job.

I have several questions, so far:

1) Under the latest globus/gt2/condor grid_monitor software stack,
is streaming supposed to work?  I haven't tried it in almost two years
and am not sure if anything has changed in the globus or condor stack
that will just block it outright.

2) Is the when_to_transfer_output setting above affecting the
transfer of the streamed stdout too?

3) I am trying to reproduce this on another, newer client machine,
which has the Condor 6.8.4 client and grid monitor, but there the
job is rejected because I have the following in the Condor configuration:

SUBMIT_EXPRS = JobLeaseDuration
JobLeaseDuration = 3600

Is there any way for a user to override this for a single job or
set of jobs?  condor_submit does not let you request stream_output=true
or stream_error=true when JobLeaseDuration is defined.
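
For checking a submit file before sending it to a host configured this way, here is a minimal Python sketch. It is not part of Condor and does not call any Condor API. It only mirrors the behavior described above, on the assumption that a job is rejected when a job lease is configured and stream_output or stream_error is set to true. The function names parse_submit and streaming_conflicts are hypothetical.

```python
def parse_submit(text):
    """Parse simple 'key = value' submit-description lines into a dict.

    Keys are lower-cased. Blank lines, '#' comments, and lines without
    '=' (such as a 'queue' statement) are skipped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def streaming_conflicts(submit_text, lease_configured):
    """Return the streaming options that would clash with a job lease.

    ASSUMPTION: reflects only the behavior reported in this thread
    (streaming requests are refused when JobLeaseDuration is defined);
    it is not derived from the condor_submit source.
    """
    if not lease_configured:
        return []
    settings = parse_submit(submit_text)
    return [
        key
        for key in ("stream_output", "stream_error")
        if settings.get(key, "").lower() == "true"
    ]
```

Calling streaming_conflicts(open("job.sub").read(), True) on the submit file above would list both stream_output and stream_error; "job.sub" is a placeholder file name.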


Steve Timm