Re: [Condor-users] Condor feature suggestion: automatically compressed output files
- Date: Sun, 30 Jul 2006 22:53:27 +0100
- From: "Matt Hope" <matthew.hope@xxxxxxxxx>
On 7/30/06, Alex Gontmakher <gsasha@xxxxxxxxxxxxxxxxx> wrote:
> Oh, checkpoint is a problem indeed, didn't think of that (but can't the
> state of the compression algorithm be checkpointed as well?)
I wouldn't want to be the person supporting this.
> Oh, my suggestion was to analyze the extension of the output file and use a
> corresponding engine, i.e., use bzip2 if the file name ends with ".bz2" etc.
> This way, the user gets a say in which engine to use, and decompression
> programs will automatically recognize the file.
A fair point, but then which ones to include? This all adds
additional complexity that I don't think Condor needs (my personal
opinion of course).
> Er??? I can't comment on the amount of recoding necessary - or at least, my
> estimate is guesswork as I don't really know the architecture of Condor, but
> I don't think my proposed solution is that inflexible...
> > Admittedly this is at the cost of one additional line/entry in the
> > submit script (to transfer the 'real' exe as well as the script) and
> > of course the lack of standard universe (but as we mentioned before
> > this has complications if you do that anyway).
> > For the standard universe there is a reasonable likelihood that, if
> > you can relink the app you can probably also change it to output
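The wrapper-script route described in the quoted text above might look something like the following sketch. The script name, the job name `myjob`, and the choice of gzip (rather than bzip2) are all assumptions for illustration, not anything from this thread:

```shell
#!/bin/sh
# Hypothetical wrapper transferred alongside the real executable; Condor
# would run this script as the job's "executable". The real program's
# stdout is piped through gzip, so the returned output file arrives
# already compressed (substitute bzip2 to produce a .bz2 instead).
run_and_compress() {
    exe="$1"; shift
    "$exe" "$@" | gzip > output.gz
}
```

The script body would then just call `run_and_compress ./myjob "$@"`, and the submit file gains the one extra entry mentioned above, along the lines of `executable = run_and_compress.sh` plus `transfer_input_files = myjob`.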
> Well then, how do you propose changing a C program to compress its standard
> output?
I'm no C programmer except when absolutely necessary, but Google and
some looking gets me a way to get a file (note provisos), so no
streaming compression. (I concur with the suggested 'best' behaviour in
the next answer of routing all output calls through your own
redirectable function.)
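The "redirectable function" idea mentioned here could be sketched in C roughly as below. The helper names are hypothetical (nothing here is Condor API); the point is that compressing the output then means changing only the one backend, e.g. writing via zlib's gzprintf to a gzFile instead of vfprintf to a FILE:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: every output call in the job goes through
 * job_printf() instead of calling printf() directly. To emit a
 * compressed file, only this backend needs to change (e.g. to
 * zlib's gzprintf on a gzFile). */
static FILE *job_out = NULL;   /* NULL means "use stdout" */

void job_set_output(FILE *f) { job_out = f; }

void job_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(job_out ? job_out : stdout, fmt, ap);
    va_end(ap);
}
```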
It seems to be trying to solve your problem, amongst others.
> > The only really big saving you would get in complexity is if the job
> > is cross-platform; then wrapping in a script means creating multiple
> > scripts, each doing it differently. I admit this sounds nice but I
> Well, our cluster happens to be cross-platform indeed - a mix of Intel
> machines and PowerPC blades... and the code I run on it is compiled and
> works.
I meant cross-platform in terms of OS (since the std io streams'
behaviour is, as I understand it, more OS-specific than
architecture-specific).
I'm not saying I don't think this is a good idea in general, I'm just
saying that layering more complexity onto Condor's existing I/O
functionality, when the same effect can largely be had via existing
mechanisms, is not always a good idea...