Re: [HTCondor-users] Job Realtime output file
- Date: Mon, 11 Mar 2013 14:19:11 +0100 (CET)
- From: Francesco Prelz <Francesco.Prelz@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] Job Realtime output file
On Sat, 9 Mar 2013, Guillermo Marco Puche wrote:
> I know those directives are SGE directives. From my POV, if SGE handles
> the job, it must also be able to handle its own error and output logs.
The trouble here is that, for a number of solid reasons, SGE is being
handed a Russian doll of scripts to execute. Your job is the smallest
doll, while the -o and -e directives (and yes, you are overriding the
directives set by default by 'bosco') apply to the outermost doll. It's
very likely that stdout and stderr are already being diverted at inner
layers, so the files named by -o and -e never see your job's output. If
you'd really like to see streaming stdout from your job, your best option
(until we have some form of out-of-the-box Condor 'standard
universe'-style remote I/O for 'grid' or 'vanilla' universe jobs, which
would indeed come in handy for many other applications) is probably to
set up some form of remote I/O yourself.
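A toy sketch of why the outer -o file can stay empty, assuming (hypothetically) that one inner layer redirects the payload's output itself. The shell functions payload, inner_wrapper and outer_wrapper are made-up stand-ins for the real wrapper scripts, whose actual contents are not shown here.

```shell
#!/bin/sh
# Toy model of the "Russian doll". All names are hypothetical stand-ins
# for the real wrapper scripts.

payload() {
    # The smallest doll: the user's job, writing to stdout.
    echo "hello from payload"
}

inner_wrapper() {
    # Assumed inner layer that diverts the payload's stdout/stderr to its
    # own files in directory $1.
    payload > "$1/inner_stdout.log" 2> "$1/inner_stderr.log"
}

outer_wrapper() {
    # The outermost doll: the only layer the -o/-e directives apply to.
    inner_wrapper "$1"
}

# Emulate '-o outer_stdout.log' by redirecting only the outermost layer.
demo_dir=$(mktemp -d)
outer_wrapper "$demo_dir" > "$demo_dir/outer_stdout.log"
# outer_stdout.log is now empty; the output sits in inner_stdout.log.
```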
If you have at least outbound network connectivity from the worker nodes
to the submit node, you could try using 'chirp' (a standalone incarnation
of the Aitch-Tee-Condor Remote I/O protocol, which may eventually
be "re-"integrated into the 'grid' universe as the remote I/O method of
choice).
In its simplest form:
0) Grab and install 'cctools', and make it available on both the submit
   and worker nodes (the cctools site seems to be down right now).
1) Start chirp_server on the submit node (by default it binds to port
   9094, uses *no* authentication/authorisation, and writes files to
   the current directory).
2) Run your payload on the worker nodes with

   ./payload | chirp_put -t -1 -b 4096 - submit_node.domain my_job_output.$$
You should then get a streaming update of the stdout of your job(s), with
4 kB buffering (which is pretty much the minimum you can get by default
from buffered file streams). It appears as 'my_job_output.<script PID>'
on submit_node.domain, in the directory from which you started
chirp_server.
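Step 2 can be wrapped in a small worker-side script. This is a minimal sketch: the stream_job function and the CHIRP_PUT override variable are hypothetical, and the chirp_put options are copied unchanged from the step 2 command rather than checked against the cctools documentation.

```shell
#!/bin/sh
# Hypothetical worker-side wrapper for step 2. CHIRP_PUT can be overridden
# (e.g. with a stub for local testing) when chirp_put is not on the PATH.
CHIRP_PUT=${CHIRP_PUT:-chirp_put}

stream_job() {
    payload=$1   # executable to run, e.g. ./payload
    server=$2    # host running chirp_server, e.g. submit_node.domain
    prefix=$3    # remote file name prefix, e.g. my_job_output
    # Same chirp_put options as the step 2 command: "-" is the stdin
    # source, and $$ (this shell's PID) tags the remote file name.
    "$payload" | "$CHIRP_PUT" -t -1 -b 4096 - "$server" "$prefix.$$"
}
```

Usage: stream_job ./payload submit_node.domain my_job_output. A plain sh pipeline returns the exit status of its last command, so this reports chirp_put's status rather than the payload's; under bash, 'set -o pipefail' makes a payload failure visible.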
There are many variations on this scheme (adding
authentication/authorisation, sending the 'chirp_put' executable along
with the job if you cannot install it on the worker nodes, using a
different naming scheme, running the job via 'parrot', etc.), but it
should serve your basic needs in any environment where the worker nodes
can reach the submit node.
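As one example of a different naming scheme: $$ is only unique per host, so streams from several worker nodes could collide. A hypothetical remote_name helper could prefer SGE's job and task IDs, assuming the job inherits SGE's JOB_ID and SGE_TASK_ID environment variables (with SGE_TASK_ID unset or 'undefined' for non-array jobs), and fall back to the PID otherwise.

```shell
#!/bin/sh
# Hypothetical helper: build the remote file name for chirp_put.
# Assumes SGE's JOB_ID and SGE_TASK_ID reach the job's environment.
remote_name() {
    prefix=$1
    if [ -n "${JOB_ID:-}" ]; then
        case ${SGE_TASK_ID:-undefined} in
            undefined) echo "$prefix.$JOB_ID" ;;               # plain job
            *)         echo "$prefix.$JOB_ID.$SGE_TASK_ID" ;;  # array task
        esac
    else
        echo "$prefix.$$"   # PID fallback, as in the original command
    fi
}
```

Usage: ./payload | chirp_put -t -1 -b 4096 - submit_node.domain "$(remote_name my_job_output)"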
Does this still make sense?