
Re: [HTCondor-users] Dataflow job skips when executable is updated



I just discovered the massively useful "skip_if_dataflow" submit option (how did I miss this before?).

I'm going to guess it's because it's not in the manual's index or in the condor_submit man page; the only place I can find it in the manual is the file-transfer section, which we should probably fix.

Its docs say that the job will be skipped only if its outputs are newer than both its inputs and its executable. This works correctly for the inputs, but when I touch the executable the job is still skipped.
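
For concreteness, a minimal dataflow-style submit file would look something like this (all names below are placeholders):

    universe                = vanilla
    executable              = analyze.sh
    arguments               = data.csv results.csv
    transfer_input_files    = data.csv
    transfer_output_files   = results.csv
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    skip_if_dataflow        = true
    log                     = job.log
    queue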

The logic that's actually implemented is that a job marked as dataflow is skipped if (see the sketch after this list):

* the oldest output file is newer than the newest input file,
* or the executable is newer than the newest input file,
* or the standard input is newer than the newest input file.
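
In rough Python, my reading of that check is the sketch below. It is not the actual code, just the three conditions spelled out (assuming the input, executable, and stdin files all exist); the up-front check that the outputs exist at all is my own assumption:

    import os

    def newest(paths):
        return max(os.path.getmtime(p) for p in paths)

    def oldest(paths):
        return min(os.path.getmtime(p) for p in paths)

    def dataflow_skip(inputs, outputs, executable, stdin_file=None):
        # Sketch only: with no inputs, or with any output missing,
        # assume the job has to run.
        if not inputs or not outputs or not all(os.path.exists(p) for p in outputs):
            return False
        newest_input = newest(inputs)
        if oldest(outputs) > newest_input:
            return True   # oldest output newer than newest input
        if os.path.getmtime(executable) > newest_input:
            return True   # executable newer than newest input
        if stdin_file and os.path.getmtime(stdin_file) > newest_input:
            return True   # standard input newer than newest input
        return False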

This matches the documentation I found. Where did you find documentation saying what you described above?

Honestly, I'm not sure what the reasoning behind the latter two points is; I would have assumed the same thing you did, that updating the executable (or standard input file) means you want to run the job again. This may just have been a mistake in the original design.

A related one for the wish list: with "transfer_output_files = dir", if the directory 'dir' already exists, its timestamp isn't updated when Condor transfers it back (at least on my file system), so the job will never be skipped.

I'm aware that directory timestamp updates depend on the file system and transfer mechanism, but for dataflow jobs it would be desirable if Condor explicitly touched the items in transfer_output_files upon return.

This sounds like a good idea, and I made a ticket for it.
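
Until that lands, a stopgap on the submit side could be to touch the outputs yourself after the job completes, say from a DAGMan POST script or a small wrapper. The snippet below (the script name and invocation are made up) just bumps the mtime of whatever paths you hand it:

    import os, sys, time

    # Stopgap: bump the mtime of everything named in transfer_output_files
    # (files or directories) so the next dataflow check sees them as fresh.
    now = time.time()
    for path in sys.argv[1:]:
        os.utime(path, (now, now))

Run it as e.g. "python3 touch_outputs.py dir" once the job's output has been transferred back.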

- ToddM