
[Condor-users] Debugging a condor-G streaming output and error situation.




I have a user who is submitting a Condor-G job using
a condor-6.7.19 client (grid/gt2) from a machine I do not control,
to an OSG Grid/gt2 jobmanager-condor on a machine I do control.

He would like to see the standard output and standard error
stream back to his client machine while the job is running.  This
is not a common thing for OSG users to do, but for this application
he has discussed the motivation with me and I have determined that
the quantity of streaming is limited and the application is worthwhile.

The relevant settings in his submit file:

should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_input_files = /prj/ilc-accel/lebrun/CHEF/ChefSteering/runGridF2b.sh, /scratch/btev01/lebrun/ilc/CHEF/ChefSteering/myCEs.tar
transfer_output = true
transfer_error = true
transfer_executable = true
output = mlSt1job_$(Cluster).$(Process).lis
transfer_output_files = Steer_MLDyn1_AC_$(Cluster)-$(Process).tar
error = mlSt1job.err.$(Cluster).$(Process)
stream_output = true
stream_error = true


===================

Currently, he does not get any standard output or standard error back
until the end of the job.  When I submit a test job from the same
machine, I also do not get any standard output or standard error back
until the end of the job.
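
For anyone who wants to try the same kind of check, here is a minimal
sketch of a test payload. It is not the user's actual job: the script
and the function name emit_heartbeats are hypothetical, and it only
assumes a POSIX shell on the worker node. It writes one timestamped
line to stdout and one to stderr at a fixed interval, so it is easy to
see whether lines reach the client while the job is still running.

```shell
#!/bin/sh
# Hypothetical test payload (not part of the original job).
# Writes COUNT timestamped lines to stdout and to stderr,
# pausing INTERVAL seconds after each pair.
emit_heartbeats() {
    count=$1
    interval=$2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "stdout heartbeat $i of $count at $(date)"
        echo "stderr heartbeat $i of $count at $(date)" >&2
        sleep "$interval"
        i=$((i + 1))
    done
}

# Run only when both arguments are supplied, e.g. "20 15"
# for 20 heartbeats spaced 15 seconds apart.
if [ "$#" -ge 2 ]; then
    emit_heartbeats "$1" "$2"
fi
```

Submitted as the executable with, for example, arguments = 20 15 and
the same stream_output and stream_error settings as above, the output
and error files on the client should grow every 15 seconds if
streaming is working, and stay empty until the job exits if it is not.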

I have several questions, so far:

1) Under the latest globus/gt2/condor grid_monitor software stack,
is streaming supposed to work?  I haven't tried it in almost two years
and am not sure whether anything has changed in the globus or condor
stack that would block it outright.

2) Is the when_to_transfer_output setting above affecting the
transfer of the streamed stdout too?

3) I am trying to reproduce this on another, newer client machine
that has the condor 6.8.4 client and grid monitor, but there
condor_submit rejects the job because I have the following in the
condor configuration:

SUBMIT_EXPRS = JobLeaseDuration
JobLeaseDuration = 3600

Is there any way for a user to override this for a single job or
set of jobs?  condor_submit does not let you request stream_output=true
or stream_error=true when JobLeaseDuration is defined.

Thanks

Steve Timm


--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.