
Re: [HTCondor-users] change to condor_submit - user feedback desired! (was Re: multiple condor_submit's - one cluster)



Can I suggest implementing it as a list type/object?  That way, supporting lists from other sources just becomes a matter of creating the list.

e.g.
list mylist = getFiles(*.csv)
input=$(mylist)
queue foreach mylist

Giving the list a name also lets you say where the value goes without forcing a specific tag. A syntax error can be thrown if a reserved word is used as the name.  If there's ever a need for nested lists, the names will keep them separate.  I could see myself using nested lists of directories and chromosomes.

Adding other sources becomes as simple as adding a new list creator.  Something like (borrowing some ideas from R)
list mylist = getFromStdin()
or
list mylist = getFromFile("blah.csv",sep=",",quote="\"",field=4,header=TRUE)
or
list mylist = getFromString("a,b,c,fred,barney",sep=",")
list mylist = getFromString($ENV(blah),sep=",")
or
list mylist = getFromExec("mysql mydb \"select SNP from genome where chromosome='1'\"")
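To make the idea concrete, here is a rough Python sketch of what such list creators might do. It is purely illustrative: none of this is existing condor_submit functionality, the function names simply mirror the hypothetical ones above, and the 1-based `field` argument is an assumption borrowed from R.

```python
import csv
import glob
import shlex
import subprocess
import sys


def get_files(pattern):
    """Return the file names matching a shell-style glob, in sorted order."""
    return sorted(glob.glob(pattern))


def get_from_stdin():
    """Return one list item per non-empty line read from standard input."""
    return [line.rstrip("\n") for line in sys.stdin if line.strip()]


def get_from_string(text, sep=","):
    """Split a string on a separator, dropping empty items."""
    return [item for item in text.split(sep) if item]


def get_from_file(path, sep=",", quote='"', field=1, header=False):
    """Return one column of a delimited file; `field` is 1-based (assumed)."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter=sep, quotechar=quote))
    if header:
        rows = rows[1:]  # skip the header row
    return [row[field - 1] for row in rows if len(row) >= field]


def get_from_exec(command):
    """Run a command and return its standard output as a list of lines."""
    result = subprocess.run(
        shlex.split(command), capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Each creator returns a plain list, so whatever consumes the list does not need to know where it came from.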

As to syntax, I'd rather not use for/foreach.  Having a programming background, I'd expect it to give me access to each item in the list to do whatever with (as per David Champion's 1st reply).  It's a list, might as well use listy type words to describe the action.

    queue [all] mylist
    queue first [n] mylist
    queue i..j mylist
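As a sketch of how those three forms could pick items out of a list, here is a small Python function. The name `select_items` is made up for illustration, and treating `i..j` as 1-based and inclusive is an assumption on my part, not something settled in this thread.

```python
import re


def select_items(spec, items):
    """Pick items according to 'all' (or empty), 'first N', or 'I..J'.

    Assumption: I..J is 1-based and inclusive at both ends.
    """
    spec = spec.strip()
    if spec in ("", "all"):
        return list(items)

    match = re.fullmatch(r"first\s+(\d+)", spec)
    if match:
        return list(items[: int(match.group(1))])

    match = re.fullmatch(r"(\d+)\.\.(\d+)", spec)
    if match:
        start, stop = int(match.group(1)), int(match.group(2))
        if start < 1 or stop < start:
            raise ValueError(f"bad range: {spec!r}")
        return list(items[start - 1 : stop])

    raise ValueError(f"unrecognised selector: {spec!r}")
```

One job would then be queued per selected item, e.g. `select_items("first 2", mylist)` for "queue first 2 mylist".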

If you wanted to, you could make "queue files *.csv" as a macro for "list files=getFiles(*.csv); queue all files"
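A minimal sketch of that macro as a text rewrite, in Python. The function name is hypothetical, and so is the rule it uses to tell the macro apart from "queue all mylist": the second word has to contain a wildcard character.

```python
import re


def expand_queue_macro(line):
    """Rewrite 'queue NAME PATTERN' into the longer list form.

    Only lines whose last word contains a wildcard (*, ? or [) are
    treated as the macro; everything else is returned unchanged.
    """
    match = re.fullmatch(r"\s*queue\s+(\w+)\s+(\S+)\s*", line)
    if match is None:
        return line
    name, pattern = match.groups()
    if not any(ch in pattern for ch in "*?["):
        return line  # e.g. "queue all mylist" is not the macro
    return f"list {name}=getFiles({pattern}); queue all {name}"
```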

That's how I'd do it.

klint.

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: Wednesday, 11 February 2015 6:44 AM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] change to condor_submit - user feedback desired! (was Re: multiple condor_submit's - one cluster)


Hi David, Don, Klint, Ben, Dimitri, Carl, Brian, Lauren (hope I didn't omit anyone) -

Thanks much for the valuable feedback to date!  People like you are precisely why open source works and why HTCondor will continue to improve!

Currently pondering the points folks made. In a few days I will distill this down into a proposed concrete plan of action and post back here.  I think I will focus first on the details for the ability of condor_submit to scan the filesystem and do a submit for each file (option 1 in my original email), and let thoughts about condor_submit reading lines from stdin (option 2) mature a bit more... (yes I agree with Dimitri they are related, but want to start where I think the need is greatest...)

best regards,
Todd

On 2/10/2015 12:06 AM, David Champion wrote:
> * On 09 Feb 2015, Lauren Michael wrote:
>> Hi All,
>>
>> First, I strongly echo Ben's points, especially for keeping the 
>> submit file as a record of the exact syntax for future reference by 
>> the user (to understand what he/she did).
>>
>> For example, the following (#2):
>>      ls data/*.csv | grep foo | condor_submit -submit_per_line input_line
>> employs skills and unix familiarity (grep, pipe) that most 
>> users I work with largely do not have. To remember and use such a 
>> command, they'd end up recording it in a document or perhaps a 
>> script. The greater the number of the arguments in the command, the 
>> more this type of recording becomes true, in my experience.
>
> I agree with the spirit of this. My enthusiasm for adding this kind of 
> notation is actually aligned with it: to address this control issue, 
> it's become fairly common (I think) to write shell scripts -- or 
> python, etc -- that generate condor_submit files as output. That's 
> another step yet removed from the submit file's being a job record. 
> That kind of circumstantial complexity exists now, and the closer we 
> can put it to the submit file, the better off users are.
>
>>
>> Stepping back, I believe there are multiple motivations emerging in 
>> this thread, though I'll also point out that I *believe* they are all 
>> from "advanced" users of HTCondor and unix (at least for the names I 
>> recognize in this thread, probably excluding myself).
>>
>> Here's an attempt at a summary of desired outcomes listed in this 
>> email thread so far:
>> 1. Provide users with an in-file alternative to $(Process) for cases 
>> when the user has many similarly-named but non-numbered files, and 
>> lacks the know-how/desire/time to convert such files to numbered 
>> filenames while maintaining metadata about which file is which.
>> (not mentioned here yet, but I'm adding it now, as I interact with 
>> countless non-advanced users facing this barrier and have otherwise 
>> discussed it at length with people like Todd T, motivating a 
>> foreach-like option.)
>
> Another possibility occurs to me today: some kind of mapping 
> declaration might make it possible to translate ordered patterns to 
> sequences of control directives. But maybe that's just another color 
> of light shining on things we've already discussed (e.g. native 
> control loops in c_s).
>
>> 2. Create a simple syntax for executing #1 that doesn't require 
>> significant unix/scripting experience.
>> 3. If possible, allow advanced users to also intuitively use the 
>> solution in a unix-y and/or scripting way.
>> 4. Minimize performance/latency side effects.
>>
>>
>> Specifically commenting on syntax:
>> I also see David's point for not creating a universal name ("file"). 
>> Is something *like* the following possible?:
>>
>> queue foreach species in $(species).data
>>
>> I'm also in favor of something like the above because I *think* 
>> "queue foreach data/*.csv" effectively co-opts the wildcard and would 
>> keep the user from specifying files using multiple wildcard instances 
>> (say, for sub-directories). For example, what if I wanted to "queue 
>> foreach *_data/*.data"?
>
> I think this shouldn't be a problem so long as the C++ implementation 
> uses fnmatch().  (Sorry for the technical-speak.)  But I think there's 
> a good point here that limiting the subject of the for(each) to files 
> on the filesystem is... well, potentially limiting.  Looking to future 
> possibilities, I would prefer that the syntax explicitly state that 
> it's matching local filenames.
>
>
>> I am so excited that we're at the point of crowd-sourcing input for 
>> such a feature!
>
> +1!
>


--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685