Re: [Condor-users] Condor feature suggestion: automatically compressed output files



On 7/27/06, Alex Gontmakher <gsasha@xxxxxxxxxxxxxxxxx> wrote:
> You've said your job writes to stdout -- if that's the case, wouldn't it
> be trivial to just add "|bzip2 >outfile.bz2"?  No intermediate storage,
> no NFS thrashing, no added code in an already complicated job control
> system... And I don't think this would even be platform-specific, as
> even Windows (IIRC) supports i/o redirection.
There are several problems with the solution you propose.

First, Condor does not allow inline scripts, or even sequences of
commands connected by a pipeline. You can write a script around your
executable, but there are problems with that which I stated earlier (not to
mention that it's somewhat ugly, as it would give up many of Condor's
capabilities).

Second, Condor does have special handling for a program's output and error files
(and it actually does have gzip/gunzip functionality for input/output in some
cases), so it's quite a natural extension for it.

What it should do in the standard universe on checkpoint is not so
clear. Recommencing output to the stream might be at best problematic
and at worst require new protocol-level functionality.
Streaming output may also need changes. No idea about Globus, GAHP, GCB etc.
What compression to use is also something that is best left in the
hands of the app writer (who would know whether the significant
additional cpu cost of bzip2 or 7z was worth it against using a
straight deflate).
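
To make the cpu-versus-size trade-off concrete, here is a minimal sketch
that times three Python standard-library codecs on a sample of output.
The function name compare and the synthetic sample are made up for
illustration. Note that lzma here writes the .xz container, not a .7z
archive; it only stands in for the kind of cost 7z would have.

```python
import bz2
import lzma
import time
import zlib


def compare(data):
    """Return {codec name: (compressed size in bytes, seconds taken)}."""
    codecs = {
        "deflate": zlib.compress,  # what gzip uses
        "bzip2": bz2.compress,
        "lzma": lzma.compress,     # assumption: stand-in for 7z-style cost
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        size = len(compress(data))
        results[name] = (size, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Synthetic stand-in for job output; substitute a real sample.
    sample = b"".join(
        b"step %d energy=%f\n" % (i, i * 0.37) for i in range(50000)
    )
    for name, (size, secs) in compare(sample).items():
        print(f"{name:8s} {size:>10d} bytes  {secs:.3f} s")
```

Running this on a representative slice of a job's real output is
probably the quickest way for an app writer to decide which codec is
worth its cpu time.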

Providing something like deflate as a useful default does sound nice -
but would likely require considerable restrictions on use to prevent
it causing a lot more recoding than initially expected. Due to this,
the ever present "wrap it in a script" option is always likely to be
the default response, since it allows so much more flexibility and
power.
Admittedly this is at the cost of one additional line/entry in the
submit script (to transfer the 'real' exe as well as the script) and
of course the lack of standard universe (but as we mentioned before,
this has complications if you do that anyway).
For the standard universe there is a reasonable likelihood that, if
you can relink the app, you can probably also change it to output
differently.
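
As a rough sketch of the "wrap it in a script" option, the wrapper below
runs the real executable and streams its stdout straight into a
deflate-compressed (gzip) file, so the uncompressed output is never
written to disk. The names wrapper.py and run_compressed are
hypothetical, and it assumes a Python interpreter is available on the
execute machine; it is not anything Condor itself provides.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper.py: run the real exe and gzip its stdout."""
import gzip
import shutil
import subprocess
import sys


def run_compressed(cmd, out_path):
    """Run cmd, write its stdout to out_path as gzip, return its exit code."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    with gzip.open(out_path, "wb") as out:
        # Copy in chunks until the child closes its stdout.
        shutil.copyfileobj(proc.stdout, out)
    proc.stdout.close()
    return proc.wait()


# Usage: wrapper.py OUTFILE.gz REAL_EXE [ARGS...]
# With too few arguments the script does nothing.
if __name__ == "__main__" and len(sys.argv) >= 3:
    # Pass the real job's exit code through so Condor still sees it.
    sys.exit(run_compressed(sys.argv[2:], sys.argv[1]))
```

The submit script would then name the wrapper as the executable and
transfer the real exe alongside it, which is the extra line/entry
mentioned above. Because the same script runs wherever Python does, it
also sidesteps some of the per-platform duplication discussed next.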

The only really big saving you would get in complexity is if the job
is cross platform, then wrapping in a script means creating multiple
scripts each doing it differently. I admit this sounds nice but I
don't know how many people use this functionality to make the
additional code/maintenance complexity worth it (that's a question for
the cs.wisc guys and gals :)

Matt