Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Job Realtime output file

Date: Mon, 11 Mar 2013 14:19:11 +0100 (CET)
From: Francesco Prelz <Francesco.Prelz@xxxxxxxxxx>
Subject: Re: [HTCondor-users] Job Realtime output file


On Sat, 9 Mar 2013, Guillermo Marco Puche wrote:

I know those directives are SGE directives. From my pov, if SGE handles the job, it must also be able to handle its own error and output logs.

The trouble here is that SGE is being handed, for a number of hard reasons, a Russian doll of scripts to execute. Your job is the smallest doll, while the -o and -e directives (and yes, you are overriding the directives set by default by 'bosco') apply to the outermost doll. It's very likely that stdout and stderr are already being diverted at inner layers. If you'd really like to see streaming stdout from your job, your best option is probably to set up some form of remote I/O yourself, at least until we have some form of out-of-the-box Condor 'standard universe' for 'grid' or 'vanilla' universe jobs (which would indeed come in handy for many other applications).
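To make the "Russian doll" point concrete, here is a small hypothetical illustration (not the actual bosco/SGE wrapper chain): an intermediate layer diverts the payload's stdout to its own file, so the outermost redirection, which plays the role of an -o directive, captures nothing. The function and file names are made up for the example.

```shell
#!/bin/sh
# Hypothetical illustration: a directive such as SGE's -o can only
# capture what actually reaches the outermost layer.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Innermost doll: stands in for the user's job.
payload() { echo "hello from the payload"; }

# Intermediate layer: diverts the payload's stdout to its own file,
# as a wrapper added by the submission machinery might.
inner_wrapper() { payload > inner_layer.out; }

# Outermost layer: this redirection plays the role of "-o outer_layer.out".
inner_wrapper > outer_layer.out

if [ -s outer_layer.out ]; then
    echo "outer_layer.out captured the output"
else
    echo "outer_layer.out is empty; the output is in inner_layer.out:"
    cat inner_layer.out
fi
```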

If you have at least outbound network connectivity from the worker nodes to the submit node, you could try using 'chirp' (a standalone incarnation of the Aitch-Tee-Condor Remote I/O protocol, which may eventually be "re-"integrated into the 'grid' universe as the remote I/O method of choice).


In its simplest form:

0) Grab and install 'cctools', and make it available on the submit
   and worker nodes.
   http://www.cse.nd.edu/~ccl/software/download.shtml
   (the site seems to be down right now)

1) Start chirp_server on the submit node (will bind on port
   9094 by default, use *no* authentication/authorisation and
   write files in the current directory).

2) Run your payload on the worker nodes with
   ./payload | chirp_put -t -1 -b 4096 - submit_node.domain my_job_output.$$

You should then be getting a streaming update (with 4 kB buffering, which is pretty much the minimum you can get by default from fstreams) of the stdout of your job(s) as 'my_job_output.script_PID' on submit_node.domain, in the directory from which you started chirp_server.
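The worker-node side of the steps above could be wrapped in a small job script. This is only a sketch under stated assumptions: it is not shipped with bosco, HTCondor or cctools, SUBMIT_HOST and the payload() stand-in are hypothetical, and the chirp_put arguments are copied unchanged from step 2. If chirp_put is not on the worker node's PATH, the script falls back to plain stdout so the job still runs.

```shell
#!/bin/sh
# Hypothetical job wrapper: stream the payload's stdout to a chirp_server
# on the submit node when chirp_put is available.
# SUBMIT_HOST and payload() are illustrative assumptions.

SUBMIT_HOST=${SUBMIT_HOST:-submit_node.domain}
# $$ is the PID of the shell running this wrapper, which gives the
# 'my_job_output.script_PID' naming described above.
REMOTE_NAME="my_job_output.$$"

payload() {
    # Stand-in for ./payload: prints one line per second.
    for i in 1 2 3; do
        echo "step $i"
        sleep 1
    done
}

if command -v chirp_put >/dev/null 2>&1; then
    # chirp_put arguments copied from step 2 of the message.
    payload | chirp_put -t -1 -b 4096 - "$SUBMIT_HOST" "$REMOTE_NAME"
else
    echo "chirp_put not found; output goes to the batch system's stdout" >&2
    payload
fi
```

Since the name is built from a PID, two jobs on different worker nodes can in principle end up with the same remote file name.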

There are countless variations of this scheme (add authentication/authorisation, send the 'chirp_put' executable along with the job if you cannot install it on the worker nodes, use a different naming scheme, run the job via 'parrot', etc.), but it should serve your basic need in any environment.
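Two of these variations, shipping 'chirp_put' with the job and using a different naming scheme, could look roughly like the helper functions below. This is a hypothetical sketch: it assumes the shipped copy lands as ./chirp_put in the job's working directory, and it adds the worker's node name to the remote file name so that equal PIDs on different nodes no longer collide.

```shell
#!/bin/sh
# Hypothetical helpers for two of the variations mentioned above.

# Prefer a chirp_put shipped alongside the job (assumed to be ./chirp_put
# in the working directory), else one installed on the worker's PATH.
find_chirp_put() {
    if [ -x ./chirp_put ]; then
        echo ./chirp_put
    elif command -v chirp_put >/dev/null 2>&1; then
        command -v chirp_put
    else
        return 1
    fi
}

# Alternative naming scheme: node name plus PID instead of PID alone.
remote_name() {
    echo "my_job_output.$(uname -n).$$"
}

if CHIRP_PUT=$(find_chirp_put); then
    echo "using $CHIRP_PUT, remote file $(remote_name)"
else
    echo "no chirp_put available; remote file would be $(remote_name)"
fi
```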


Does this still make sense?

Francesco Prelz
INFN-MI

Follow-Ups:
- Re: [HTCondor-users] Job Realtime output file
  - From: Derek Weitzel

References:
- Re: [HTCondor-users] Job Realtime output file
  - From: Guillermo Marco Puche
- Re: [HTCondor-users] Job Realtime output file
  - From: Derek Weitzel
- Re: [HTCondor-users] Job Realtime output file
  - From: Guillermo Marco Puche
- Re: [HTCondor-users] Job Realtime output file
  - From: Derek Weitzel
- Re: [HTCondor-users] Job Realtime output file
  - From: Guillermo Marco Puche
