
Re: [HTCondor-users] change to condor_submit - user feedback desired! (was Re: multiple condor_submit's - one cluster)



Todd -

* On 06 Feb 2015, Todd Tannenbaum wrote: 
> 
> 1. A "queue foreach <filepattern>" command.  Folks could then have submit
> files that look like this:
> 
>     input = $(file)
>     output = $(file).output
>     queue foreach data/*.csv
> 
> So for each file in subdir data that ends in .csv, a job would be submitted
> and $(file) would expand to the path to the file.

I like the general idea.  Maybe "queue for file in data/*.csv" instead,
to let the user name the variable (and to match the familiar for-in
syntax).

Some more ideas below that go beyond this though.


> 2. A command line option to condor_submit that tells it to read stdin, and
> to do a submit for each stdin line, substituting each line from stdin with
> $(input_line).  Folks could then have submit files that look like this:
> 
>    input = $(input_line)
>    output = $(input_line)
>    queue
> 
> and invoke condor_submit via lines like:
> 
>    ls data/*.csv | grep foo | condor_submit -submit_per_line

Also an interesting idea.  Again, maybe allow the user to specify the
macro name?
    ls data/*.csv | grep foo | condor_submit -submit_per_line input_line

If hardcoding the name, perhaps "$(stdin)" rather than "$(input_line)".


Three other thoughts to throw out there, riffing off this basic
need/interest:

1. What about a more general looping capability?  This adds
another concept (backtick command substitution, `cmd`), but it is
something that I'm exploring with a condor_submit wrapper. (You can
never have enough condor_submit wrappers, it seems.)  I don't care
much about the specific syntax; this is just an illustration:

for file in `ls data/*.csv`
	input = $(file)
	output = $(file).output
	queue
end

You've probably gotten this before, and I don't know what the issues
are, so feel free just to say "we already decided not to do this."
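
To make the loop idea concrete, here is a minimal Python sketch of what
a wrapper might do with it.  It is only an illustration: the function
name expand_loop and the "one copy of the body per file" behaviour are
assumptions, not anything condor_submit does today.  It substitutes the
loop variable textually and produces plain submit-description text.

```python
def expand_loop(files, body_lines, var="file"):
    """Return submit-description text with one copy of body_lines per file.

    Hypothetical helper: every "$(var)" in the body is replaced textually
    by the current file name.  No real condor_submit feature is used.
    """
    macro = f"$({var})"
    out = []
    for name in files:
        for line in body_lines:
            out.append(line.replace(macro, name))
    return "\n".join(out) + "\n"


# Example call (the file list would normally come from a glob or `ls`):
#   expand_loop(["data/a.csv", "data/b.csv"],
#               ["input = $(file)", "output = $(file).output", "queue"])
```

The generated text could then be handed to condor_submit as an ordinary
submit description, so the looping lives entirely in the wrapper.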


2. Make a macro that always reads one more line from stdin at the time
it's evaluated.  And make a queue variation that queues until some
condition is true.

# read a line
current_file = $(stdin)

# apply that line to two other settings
input = $(current_file).in
output = $(current_file).out

# keep going until no more lines (or blank line)
queue until $(current_file) == ""

I'm not sure this is a complete concept but maybe you get the idea.
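
One way to read the "queue until" semantics, sketched in Python under
the assumption that $(current_file) is re-read once per job and that an
empty line (or end of input) stops the loop.  The function name
queue_until_blank is hypothetical, and the stream argument stands in
for stdin.

```python
import io


def queue_until_blank(stream, body_lines, var="current_file"):
    """Build one job (a list of expanded lines) per input line.

    Stops at the first blank line or at end of input, mirroring
    'queue until $(current_file) == ""'.  Purely illustrative.
    """
    macro = f"$({var})"
    jobs = []
    for raw in stream:
        value = raw.rstrip("\r\n")
        if value == "":
            break  # blank line: the "until" condition is now true
        jobs.append([line.replace(macro, value) for line in body_lines])
    return jobs


# Example: two jobs are produced; "c" is never read past the blank line.
example_jobs = queue_until_blank(
    io.StringIO("a\nb\n\nc\n"),
    ["input = $(current_file).in", "output = $(current_file).out"],
)
```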


3. Finally, as to the specific syntax of $(stdin) (or $(input_line),
whatever): Maybe it makes sense to create a general $(<name) notation,
where name identifies a file (or fd, in limited cases like stdin) to
read from.  Then you can read the list of parameters from a static file
in addition to reading from stdin.
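
A rough Python sketch of how a $(<name) macro could behave, assuming
each evaluation consumes one more line from the named source and that
the name "stdin" refers to standard input.  The class LineSources and
its opener parameter are made up for this illustration.

```python
import io
import re
import sys

# Matches the proposed $(<name) notation and captures the name.
_READ_MACRO = re.compile(r"\$\(<([^)]+)\)")


class LineSources:
    """Hypothetical expander: $(<name) yields the next line of 'name'."""

    def __init__(self, opener=open):
        self._opener = opener   # injectable so the sketch is testable
        self._streams = {}      # name -> open stream, kept between calls

    def next_line(self, name):
        if name not in self._streams:
            if name == "stdin":
                self._streams[name] = sys.stdin
            else:
                self._streams[name] = self._opener(name)
        # readline() returns "" at end of input, so the macro expands
        # to an empty string once the source is exhausted.
        return self._streams[name].readline().rstrip("\r\n")

    def expand(self, text):
        return _READ_MACRO.sub(lambda m: self.next_line(m.group(1)), text)
```

With this reading, "input = $(<params.txt)" picks up a new line of
params.txt each time it is expanded, and "$(<stdin)" does the same for
standard input, so a static file and a pipe are handled the same way.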

I think that with #1 or #3 (#2 being beside the point) there's no need
for a new command line option.

Sorry for the length.  I can do this all day but at some point we all
need to work. :)

-- 
David Champion | dgc@xxxxxxxxxxxx | University of Chicago | OSG Connect