
Re: [HTCondor-users] Process list of files



Hi Roger,

I think there are some minor RAM and other savings on the submit side when using multiple procs per cluster; in our local case, it doesn't make a significant difference in resource consumption.  I can attest that you can run a submit machine just fine with 20k running jobs without resorting to multiple processes per cluster.
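
If you do want multiple procs in a single cluster, one way to get there is to generate a single submit description with one queue statement per file and hand it to a single condor_submit invocation.  Here's a minimal sketch, assuming my_submit_file references $(my_file) and has no queue statement of its own; the data/*.dat pattern is hypothetical:

    # One condor_submit run should yield one cluster, with one proc per
    # file (as long as the executable doesn't change between queue
    # statements).
    tmp=$(mktemp)
    cat my_submit_file > "$tmp"
    for s in data/*.dat; do
        printf 'my_file = %s\nqueue\n' "$s" >> "$tmp"
    done
    condor_submit "$tmp"
    rm -f "$tmp"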

*However*, for me the real utility of job clusters is in job management.  If you want to remove all jobs in a single cluster, say 1234, you can do:

condor_rm 1234

If you want to retain that kind of bulk removal when every job lands in its own cluster, you can always include a task label at submit time:

-append '+TaskName="foo"'

(Note: user-added attributes start with a + and obey ClassAd quoting rules!)  Then remove them like this:

condor_rm -const 'TaskName=?="foo"'

That may be more than users want to go through; some might just resort to killing jobs off one by one.
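
Putting the two together, here's a sketch of the label-then-remove workflow; the data/*.dat pattern and the "nightly" label are just illustrations:

    # Submit one job (cluster) per file, all carrying the same label.
    for s in data/*.dat; do
        condor_submit -append "my_file=$s" \
                      -append '+TaskName="nightly"' \
                      my_submit_file
    done

    # Later, remove the whole batch with a single command.
    condor_rm -const 'TaskName =?= "nightly"'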

Brian

> On Dec 5, 2014, at 9:04 PM, Roger Herikstad <roger.herikstad@xxxxxxxxx> wrote:
> 
> Hi Brian, 
> Thanks! That's exactly what I was looking for. Works like a charm. One minor observation: I notice that each job becomes its own cluster under this method. From my understanding, a Cluster is like SGE's normal job, while a Process is like SGE's index into an array job. I was just wondering if there is any performance penalty to using multiple clusters instead of multiple processes within a cluster? In my experience with SGE, submitting array jobs is a bit easier on the master than submitting many single-process jobs. Anyway, this is just to satisfy my curiosity. Again, your solution works great. Thanks!
> 
> ~ Roger
> On 6 Dec 2014, at 10:22, Brian Bockelman wrote:
> 
>> Hi Roger,
>> 
>> If I'm reading your email right, I think you want to look at the "-append" argument to condor_submit.  It would look something like this:
>> 
>> for s in $( ls <some file pattern> ); do
>>     <some setup>
>>     condor_submit -append "my_file=$s" my_submit_file
>> done
>> 
>> Here's the man page for "-append":
>> 
>> """
>>    -append command
>> 
>>       Augment the commands in the submit description file with the given command. This command will be considered to immediately precede the Queue command within the submit description file, and come after all other previous commands. The submit description file is not modified. Multiple commands are specified by using the -append option multiple times. Each new command is given in a separate -append option. Commands with spaces in them will need to be enclosed in double quote marks.
>> """
>> 
>> You would then have the following executable and arguments lines in your submit file:
>> 
>> executable=<some command>
>> arguments="<my arguments> $(my_file)"
>> 
>> The submit file is written in a macro language; $(my_file) in the "arguments" command will be expanded to the value you passed on the command line.
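>> 
>> For concreteness, a minimal submit description along those lines might look like this (the script name is a hypothetical stand-in):
>> 
>>     # my_submit_file -- my_file is supplied on the CLI via -append
>>     universe   = vanilla
>>     executable = process_data.sh
>>     arguments  = --input $(my_file)
>>     output     = job_$(Cluster).out
>>     error      = job_$(Cluster).err
>>     log        = jobs.log
>>     queue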
>> 
>> Hope this is what you're looking for,
>> 
>> Brian
>> 
>>> On Dec 5, 2014, at 7:52 PM, Roger Herikstad <roger.herikstad@xxxxxxxxx> wrote:
>>> 
>>> Hi everyone,
>>> I recently started testing HTCondor as a replacement for our ageing SGE setup. Our lab uses Mac Pros as combined compute nodes/workstations, and though SGE works reasonably well with this setup, the fact that it is no longer open source (and has a somewhat convoluted build/install process) made me look around for alternatives. I've managed to get HTCondor up and running with a few of our nodes, and so far I really like what I'm seeing. I have one question, though. Our typical SGE workflow would be something like this:
>>> 
>>> From the terminal:
>>> 
>>> for s in $( ls <some file pattern> ); do <some setup>; echo "cd $PWD; <some command with arguments> $s" | qsub <sge arguments>; done
>>> 
>>> In other words, we would have a list of files containing data to be processed, and we would send off one SGE job per file. What is the best way to achieve this workflow under HTCondor? I came across this:
>>> https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=VaryArgumentsByProcId
>>> which gets me part of the way, but if I understand it correctly, I'd have to manually define the list of files as a ClassAd in the submit file. That's certainly doable, but knowing my users, the more steps they have to go through to submit jobs, the less likely they are to use the system.
>>> I'd appreciate any thoughts on this from the list. Thanks!
>>> 
>>> 
>>> Roger Herikstad
>>> Research Fellow
>>> SiNAPSE
>>> National University of Singapore