
Re: [HTCondor-users] change to condor_submit - user feedback desired! (was Re: multiple condor_submit's - one cluster)



Hi Todd and everyone,

This discussion is so welcome as it opens the door for a problem I have encountered repeatedly.
It's not uncommon to see tens or even hundreds of processes independently issuing condor_submit commands as quickly as they can.
On numerous occasions I have seen this result in waits of many minutes for a single condor_submit command to complete and return.
On the osg-xsede submit host this resulted in cyclic instability for several months.

A nearly invisible solution is to provide one or a few instances of a central daemon which performs the real condor_submit work.
The condor_submit that users invoke would look the same, or almost the same (see below), but would place its request in a queue
handled by the daemon(s).  This would eliminate the need, which often arises, to serialize calls to condor_submit.  For my jobs I use
a token system to ensure that only one process issues condor_submit calls at a time and that the process always blocks until the call returns.
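
A minimal sketch of that kind of token, assuming a shared lock file on the submit host is acceptable; the lock path and wrapper name are only placeholders, not the exact code in use:

    #!/usr/bin/env python
    # Sketch: serialize condor_submit calls across processes by holding an
    # exclusive lock on a shared file, which acts as the "token".
    import fcntl
    import subprocess
    import sys

    LOCK_PATH = "/var/tmp/condor_submit.lock"   # placeholder shared path

    def serialized_submit(args):
        """Run condor_submit while holding the lock; block until it returns."""
        with open(LOCK_PATH, "w") as lock:
            fcntl.flock(lock, fcntl.LOCK_EX)      # wait for the token
            try:
                return subprocess.call(["condor_submit"] + args)
            finally:
                fcntl.flock(lock, fcntl.LOCK_UN)  # hand the token back

    if __name__ == "__main__":
        sys.exit(serialized_submit(sys.argv[1:]))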

(1) The call provided to the user could include a switch which, when set, forces that call to execute in the historically normal fashion.
The default would be for the call to go through the daemon, but this could be overridden.
(2) "Grouping" parameters could be included which instruct the daemon to execute this condor_submit in a group with the submissions before and after it
that carry the same value.  That would allow the author of the daemon to combine the group of jobs into a single submission however she or he wishes, and that behavior could evolve without involving the users.  (A rough sketch of such a daemon follows below.)
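
A rough sketch of the grouping idea, assuming requests reach the daemon as (group_key, submit_file) pairs; how a group becomes a single submission is left to the daemon author, so the handler below simply runs condor_submit serially, and all names are placeholders:

    #!/usr/bin/env python
    # Sketch: a submission daemon that batches consecutive requests sharing
    # the same group key and hands each batch to one handler call.
    import queue
    import subprocess

    requests = queue.Queue()    # filled elsewhere, e.g. by a socket listener

    def handle_group(group_key, submit_files):
        # Placeholder policy: one condor_submit per file, issued serially.
        # A real daemon could instead merge the group into one submission.
        for path in submit_files:
            subprocess.call(["condor_submit", path])

    def daemon_loop():
        pending_key, pending_files = None, []
        while True:
            group_key, submit_file = requests.get()   # block for next request
            if pending_files and group_key != pending_key:
                handle_group(pending_key, pending_files)
                pending_files = []
            pending_key = group_key
            pending_files.append(submit_file)
            if requests.empty():                       # flush when idle
                handle_group(pending_key, pending_files)
                pending_key, pending_files = None, []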

Enhancements to the scripting language provided in the submit files are less desirable for most users because their focus is on doing science rather than programming.
Moving the user's condor_submit command a layer away from direct communication with the HTCondor job queueing apparatus would enable the developers to solve
many job submission problems while demanding minimal learning or action from users.

Regards,
 
Don
 

Don Krieger, Ph.D.
Department of Neurological Surgery
University of Pittsburgh

> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of Todd Tannenbaum
> Sent: Friday, February 06, 2015 11:00 AM
> To: HTCondor-Users Mail List
> Subject: [HTCondor-users] change to condor_submit - user feedback desired!
> (was Re: multiple condor_submit's - one cluster)
> 
> On 2/6/2015 9:04 AM, Krieger, Donald N. wrote:
> > Hi Todd,
> >
> > Thanks for posting back with the answers. It's very helpful.
> >
> > My problem is that the names I'm using for the log files step through
> > a sequence of strings rather than a sequence of numbers. And I have a
> > downstream management routine that uses those names which I would have
> > to alter if I changed the naming sequence.
> 
> Hi Don,
> 
> Thanks for the above explanation.  Indeed, $(Process) is great in your submit
> files, but only if your data files are numerically sequenced!
> 
> Perhaps we could create a general solution in condor_submit that addresses
> the above and yet would still a) allow all the submits to happen at once into
> one cluster and b) not require the user to know how to write scripts.
> 
> I listed a couple of brainstorm ideas below that would be relatively easy to
> implement.  Do folks think that either of the ideas below would be helpful? I
> would love to hear any feedback or alternative ideas.
> 
> regards,
> Todd
> 
> Some brainstorm ideas:
> 
> 1. A "queue foreach <filepattern>" command.  Folks could then have submit
> files that look like this:
> 
>      input = $(file)
>      output = $(file).output
>      queue foreach data/*.csv
> 
> So for each file in subdir data that ends in .csv, a job would be submitted and
> $(file) would expand to the path to the file.
> 
> and/or
> 
> 2. A command line option to condor_submit that tells it to read stdin and do
> a submit for each line, making each line from stdin available as $(input_line).
> Folks could then have submit files that look like this:
> 
>     input = $(input_line)
>     output = $(input_line)
>     queue
> 
> and invoke condor_submit via lines like:
> 
>     ls data/*.csv | grep foo | condor_submit -submit_per_line
> 
> Comments? Other ideas?
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/