
Re: [HTCondor-users] change to condor_submit - user feedback desired! (was Re: multiple condor_submit's - one cluster)



* On 09 Feb 2015, Lauren Michael wrote: 
> Hi All,
> 
> First, I strongly echo Ben's points, especially for keeping the submit file
> as a record of the exact syntax for future reference by the user (to
> understand what he/she did).
> 
> For example, the following (#2):
>     ls data/*.csv | grep foo | condor_submit -submit_per_line input_line
> employs skills and unix familiarity (grep, pipe) that most users I work
> with largely do not have. To remember and use such a command, they'd end up
> recording it in a document or perhaps a script. The greater the number of
> the arguments in the command, the more this type of recording becomes true,
> in my experience.

I agree with the spirit of this. My enthusiasm for adding this kind of
notation is actually aligned with it: to address this control issue,
it's become fairly common (I think) to write shell scripts -- or Python,
etc. -- that generate condor_submit files as output. That's yet another
step removed from the submit file being a job record. That kind of
circumstantial complexity exists now, and the closer we can put it to
the submit file, the better off users are.
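
To make the generator-script pattern concrete, here is a minimal Python
sketch of the kind of wrapper people write today. It is only an
illustration: the executable name (analyze.sh), the data paths, and the
function name are hypothetical, and real scripts usually set more submit
commands than this.

```python
def make_submit_text(executable, data_files):
    """Return submit-description text with one 'queue' per data file.

    'executable' and the entries of 'data_files' are placeholders chosen
    by the caller; nothing here is specific to a real workflow.
    """
    lines = [
        f"executable = {executable}",
        # $(Cluster) and $(Process) are expanded by condor_submit.
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        "log = job.$(Cluster).log",
    ]
    for path in data_files:
        # Change the arguments, then queue one job with them.
        lines.append(f"arguments = {path}")
        lines.append("queue")
    return "\n".join(lines) + "\n"


# Hypothetical usage, mirroring 'ls data/*.csv | grep foo':
#   from pathlib import Path
#   files = sorted(str(p) for p in Path("data").glob("*.csv") if "foo" in p.name)
#   Path("generated.sub").write_text(make_submit_text("analyze.sh", files))
```

The submit file that comes out of this records what was queued, but not
the selection rule that produced it; that rule lives in the script,
which is the extra step of removal described above.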

> 
> Stepping back, I believe there are multiple motivations emerging in this
> thread, though I'll also point out that I *believe* they are all from
> "advanced" users of HTCondor and unix (at least for the names I recognize
> in this thread, probably excluding myself).
> 
> Here's an attempt at a summary of desired outcomes listed in this email
> thread so far:
> 1. Provide users with an in-file alternative to $(Process) for cases when
> the user has many similarly-named but non-numbered files, and lacks the
> know-how/desire/time to convert such files to numbered filenames while
> maintaining metadata about which file is which.
> (not mentioned here yet, but I'm adding it now, as I interact with
> countless non-advanced users facing this barrier and have otherwise
> discussed it at length with people like Todd T, motivating a foreach-like
> option.)

Another possibility occurs to me today: some kind of mapping declaration
might make it possible to translate ordered patterns to sequences of
control directives. But maybe that's just another color of light shining
on things we've already discussed (e.g. native control loops in
condor_submit).

> 2. Create a simple syntax for executing #1 that doesn't require significant
> unix/scripting experience.
> 3. If possible, allow advanced users to also intuitively use the solution
> in a unix-y and/or scripting way.
> 4. Minimize performance/latency side effects.
> 
> 
> Specifically commenting on syntax:
> I also see David's point for not creating a universal name ("file"). Is
> something *like* the following possible?:
> 
> queue foreach species in $(species).data
> 
> I'm also in favor of something like the above because I *think* "queue
> foreach data/*.csv" effectively co-opts the wildcard and would keep the user
> from specifying files using multiple wildcard instances (say, for
> sub-directories). For example, what if I wanted to "queue foreach
> *_data/*.data"?

I think this shouldn't be a problem so long as the C++ implementation
uses fnmatch().  (Sorry for the technical-speak.)  But I think there's
a good point here that limiting the subject of the for(each) to files
on the filesystem is... well, potentially limiting.  Looking to future
possibilities, I would prefer that the syntax explicitly state that it's
matching local filenames.
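
As a rough illustration of the multiple-wildcard case, the sketch below
uses Python's fnmatch module as a stand-in for the C fnmatch() call. It
says nothing about how condor_submit would actually implement a foreach;
the candidate paths and the helper name are made up for the example.

```python
from fnmatch import fnmatchcase


def match_paths(pattern, paths):
    """Return the paths that match a shell-style pattern, in input order."""
    return [p for p in paths if fnmatchcase(p, pattern)]


# Hypothetical candidate paths; none of these need to exist on disk,
# because fnmatch only compares strings.
candidates = [
    "cat_data/a.data",
    "dog_data/b.data",
    "dog_data/notes.txt",
    "misc/c.data",
]

# Both wildcards in the pattern are honored.
matched = match_paths("*_data/*.data", candidates)
print(matched)
```

One caveat: in Python's fnmatch, "*" also matches "/", so the same
pattern would accept a path like "top/cat_data/a.data". C's fnmatch()
behaves that way too unless FNM_PATHNAME is set, so an implementation
would have to choose which behavior it wants.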


> I am so excited that we're at the point of crowd-sourcing input for such a
> feature!

+1!

-- 
       David Champion • dgc@xxxxxxxxxxxx • University of Chicago