
Re: [HTCondor-users] Efficiency & centralization of global information gathering?



Hi Michael,

I'm afraid you're right. Concurrency limits seem to fall flat when 1) what you really want is a resource reservation mechanism and 2) jobs are making permanent changes to the resource. Without some really good instrumentation on the jobs or filers, there's no way to know accurately how much of that space was written by jobs in the farm as opposed to processes outside the farm. And as you've mentioned, without that information you either double-count resource consumption for the life of the job and under-utilize your resource, or, if you don't try to adjust, you end up over-allocating.

I've looked into instrumenting farm jobs to collect I/O information at the kernel level using SystemTap, but that introduced too much overhead: runtimes were 10-15% longer in my tests. I'm not sure there's a good way to handle disk space reservations for shared filesystems in HTCondor without a significant amount of extra engineering.


Your Perl script would be really useful for us because, as I didn't mention before, our user desktop systems are automatically added to the farm pool at night (and removed again in the morning). A more deterministic way of knowing which licenses are being used by jobs would help a lot. Unfortunately, our license servers are somewhat isolated VMs and don't have HTCondor installed. :( Fortunately, we don't get too many "out of license" errors, and we have systems in place to automatically retry jobs that encounter them.

We use a lot of ifThenElse expressions in our configs, and I believe you can create NVL-like syntax in HTCondor configs with them, like this:

SOME_PARAM = ifThenElse((SOME_OTHER_PARAM=!=UNDEFINED), SOME_OTHER_PARAM, 1)

If SOME_OTHER_PARAM is defined, SOME_PARAM takes its value. If SOME_OTHER_PARAM is not defined, SOME_PARAM evaluates to 1.








On Mon, Jan 9, 2017 at 3:58 PM, Michael Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx> wrote:

Here's an alternative to matching hostnames in lmstat output:


#!/usr/bin/perl

# List the ConcurrencyLimits of every running job (JobStatus == 2).
my $running_claims = qx( condor_q -constraint 'JobStatus == 2 && ! isUndefined(ConcurrencyLimits)' -format '%v' 'split(ConcurrencyLimits, ", ")' );

# Pull out each quoted limit string, either "name" or "name:count".
my @running_claims = $running_claims =~ ( m{"([^"]+)"}g );

# Sum the claimed units per limit name.
my %limit;
for (@running_claims) {
    my ($name, $count) = split(':');
    $count = length($count) ? $count : 1;   # a bare name claims one unit
    $limit{$name} += $count;
}

for my $key (keys(%limit)) {
    print "${key}_condor_used = $limit{$key}\n";
}
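
For anyone who would rather not maintain Perl, the tallying step above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the function name tally_limits and the sample string are made up for illustration, the input is assumed to be the same quoted "name" or "name:count" strings that the regex above extracts, and counts are assumed to be whole numbers.

```python
import re
from collections import defaultdict


def tally_limits(condor_q_output):
    """Sum claimed units per limit name from quoted "name" or "name:count" strings."""
    totals = defaultdict(int)
    for claim in re.findall(r'"([^"]+)"', condor_q_output):
        # Split on the first colon; a bare name has an empty count.
        name, _, count = claim.partition(":")
        totals[name] += int(count) if count else 1  # assumes integer counts
    return dict(totals)


# Hypothetical sample standing in for real condor_q output.
sample = '{"app_license"}{"app_license:2","other_license"}{"other_license"}'
for name, used in sorted(tally_limits(sample).items()):
    print(f"{name}_condor_used = {used}")
```

The condor_q call itself is left out on purpose; its output would be passed in as the string argument.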


Then you can set the limit like so:

App_license_lmstat_available = <pulled from lmstat>
App_license_condor_used = <pulled from condor_q above>
App_license_limit = $(App_license_lmstat_available) + $(App_license_condor_used)

I'm trying to remember if there's a mechanism in the configuration where you can say "$(value:0)" and get 0 if value is undefined, rather than using an if/endif block, but I'm not finding it offhand. So whatever's generating the config file would need to take that into account, since the code above will only produce outputs for limits which are in current use.
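
One way the generator could take that into account is to loop over every limit it knows about from lmstat and fall back to 0 when condor_q reported nothing for it. A rough Python sketch, assuming the two inputs have already been collected into dictionaries (the function name limit_config_lines and the sample numbers are hypothetical):

```python
def limit_config_lines(available, used):
    """Build config lines for every known limit, defaulting unused ones to 0."""
    lines = []
    for name in sorted(available):
        # A limit with no running jobs has no entry in `used`, so fall back to 0.
        in_use = used.get(name, 0)
        lines.append(f"{name}_lmstat_available = {available[name]}")
        lines.append(f"{name}_condor_used = {in_use}")
        lines.append(
            f"{name}_limit = $({name}_lmstat_available) + $({name}_condor_used)"
        )
    return lines


# Hypothetical inputs: free seats from lmstat, and in-use counts from condor_q.
available = {"App_license": 4, "Other_license": 10}
used = {"App_license": 3}
print("\n".join(limit_config_lines(available, used)))
```

Driving the loop from the lmstat side means every limit always gets a _condor_used line, so the config never refers to an undefined value.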

        -Michael Pelletier.


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/