
Re: [HTCondor-users] Submit requirements & ignoring significant attributes



Ah yes, good point.  Because the job's Requirements expression refers directly to QDate, QDate becomes one of the job's AutoClusterAttributes: AutoClusterAttributes is set to the projection of significant attributes (that is, the significant attributes plus any job attributes referenced by a significant attribute).

If we removed QDate from the projection, then the job requirements would ALWAYS resolve to undefined in the negotiator.  QDate needs to be removed from the Requirements expression itself.

If your job route removes the sub-clause that refers to QDate from the Requirements expression, then QDate should also be stripped from the AutoClusterAttributes the next time the projection is calculated.

So you could do something like this.

JobUnroutedForTooLong = ((time() - QDate) > $(JOB_ROUTER_POLLING_PERIOD)*2)
Requirements = something || JobUnroutedForTooLong

Then have the job router set JobUnroutedForTooLong to false.

QDate then becomes significant only for jobs that have not yet been routed.
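A minimal Python sketch of the projection described above, under stated assumptions: the job ad is modelled as a dict of attribute name to expression text, references are found with a regular expression rather than a real ClassAd parser, references are followed transitively, and anything scoped with TARGET. is treated as a machine attribute. The helper names, the `HPDA_Route =?= true` stand-in for "something", and a polling period of 10 are all hypothetical. It shows QDate entering the projection only through JobUnroutedForTooLong, and dropping out once the route overwrites that attribute with false.

```python
import re

# ClassAd string literals; identifiers inside quotes are not references.
STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
# Bare or scoped attribute references such as QDate, MY.QDate or TARGET.Disk.
ATTR_REF = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)?")


def referenced_job_attrs(expr, job_ad):
    """Return the job-ad attributes that expr refers to (names compared case-insensitively)."""
    by_lower = {name.lower(): name for name in job_ad}
    found = set()
    for token in ATTR_REF.findall(STRING_LITERAL.sub('""', expr)):
        scope, _, attr = token.rpartition(".")
        if scope.upper() == "TARGET":
            continue  # a machine attribute, not part of the job ad
        name = by_lower.get(attr.lower())
        if name is not None:
            found.add(name)
    return found


def projection(significant, job_ad):
    """Significant attributes plus every job attribute they reference (transitively)."""
    result = set()
    pending = [name for name in significant if name in job_ad]
    while pending:
        name = pending.pop()
        if name in result:
            continue
        result.add(name)
        pending.extend(referenced_job_attrs(job_ad[name], job_ad) - result)
    return result


# Hypothetical job ad before routing; "10*2" assumes JOB_ROUTER_POLLING_PERIOD = 10.
unrouted = {
    "Requirements": "(HPDA_Route =?= true) || JobUnroutedForTooLong",
    "JobUnroutedForTooLong": "((time() - QDate) > 10*2)",
    "QDate": "1477393245",
    "RequestMemory": "4000",
}
# The route overwrites JobUnroutedForTooLong with a constant.
routed = dict(unrouted, JobUnroutedForTooLong="false", HPDA_Route="true")

significant = ["Requirements", "RequestMemory"]
print(sorted(projection(significant, unrouted)))
# ['JobUnroutedForTooLong', 'QDate', 'RequestMemory', 'Requirements']
print(sorted(projection(significant, routed)))
# ['HPDA_Route', 'JobUnroutedForTooLong', 'RequestMemory', 'Requirements']
```

After routing, the remaining referenced attributes are constants shared by all routed jobs, so they no longer split otherwise identical jobs the way a per-submission QDate does.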

-tj

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: Thursday, October 27, 2016 10:18 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Submit requirements & ignoring significant attributes

My guess is REMOVE_SIGNIFICANT_ATTRIBUTES in the schedd only filters out significant attributes coming from the central manager (because they were referenced in startd policy expressions like START or RANK), but perhaps does not filter out attributes referenced by significant attributes in the job itself.

Realize that by marking Requirements as significant, and yet attempting to filter out QDate, which is referenced by Requirements, the likely result is that the job's Requirements expression will end up evaluating to undefined in the matchmaker, and your jobs will never match.
Significant attributes are not just used for autoclustering; they are also used as the projection when the schedd sends resource requests to the negotiator.
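To see why dropping QDate from the projection tends to leave Requirements undefined, here is a small Python model of the ClassAd rules involved. It is an illustrative sketch, not HTCondor's evaluator: None stands in for UNDEFINED, arithmetic and `>` propagate it, `||` is true if either side is true, and `=?=` (meta-equals) never yields UNDEFINED. The function names and the INPUT_FILES value are hypothetical; the clause is the one from the APPEND_REQUIREMENTS expression quoted later in the thread, with the polling period taken as 10. A match needs Requirements to evaluate to true, so an undefined result means no match.

```python
UNDEFINED = None  # stand-in for the ClassAd UNDEFINED value


def c_sub(a, b):
    """Arithmetic: UNDEFINED if either operand is UNDEFINED."""
    return UNDEFINED if a is UNDEFINED or b is UNDEFINED else a - b


def c_gt(a, b):
    """Comparison: UNDEFINED if either operand is UNDEFINED."""
    return UNDEFINED if a is UNDEFINED or b is UNDEFINED else a > b


def c_or(a, b):
    """Logical or: true wins; otherwise UNDEFINED wins over false."""
    if a is True or b is True:
        return True
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return False


def c_meta_eq(a, b):
    """=?= compares type and value and never returns UNDEFINED."""
    return type(a) is type(b) and a == b


def unrouted_clause(job_ad, now):
    """(INPUT_FILES =?= undefined) || (HPDA_Route =?= true) || ((CurrentTime - QDate) > (10 * 3))"""
    return c_or(
        c_or(c_meta_eq(job_ad.get("INPUT_FILES"), UNDEFINED),
             c_meta_eq(job_ad.get("HPDA_Route"), True)),
        c_gt(c_sub(now, job_ad.get("QDate")), 10 * 3),
    )


now = 1477408979  # ServerTime from the autocluster output in this thread
with_qdate = {"INPUT_FILES": "in.dat", "QDate": 1477393245}  # hypothetical job ad
without_qdate = {"INPUT_FILES": "in.dat"}  # same ad with QDate projected away

print(unrouted_clause(with_qdate, now))     # True
print(unrouted_clause(without_qdate, now))  # None, i.e. UNDEFINED: no match
```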

The bottom line is that computing significant attributes is quite the involved process, and it is best to let HTCondor figure it out on its own.  Trying to override HTCondor's algorithm for significant attributes is really a desperate move that should be avoided, and is likely to result in jobs failing to match machines for strange reasons. For the curious, details on how all this autoclustering works can be found in this design document:

https://docs.google.com/document/d/1_YqnFrLnJQ91ihxc8kyx2Y9FwofdnNy74ueRm4-NMng


regards,
Todd



On 10/27/2016 10:04 AM, John M Knoeller wrote:
> REMOVE_SIGNIFICANT_ATTRIBUTES should work, provided that it was set
> before the SCHEDD was started (actually, before the first time the
> SCHEDD sees a job that has QDate in its requirements expression).
>
> The SCHEDD has an internal state variable that is a lifetime union of all significant attributes. Setting REMOVE_SIGNIFICANT_ATTRIBUTES will keep attributes from being ADDED to this variable, but it will not remove attributes that are already in it.
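The behaviour described above can be modelled in a few lines of Python. This is a hypothetical sketch of the semantics as stated in this message, not the schedd's code: the class and method names are invented, a reconfig only changes the filter, and only a fresh instance (standing in for a restart) starts with an empty union. It does not model Todd's caveat, elsewhere in the thread, that the filter may not apply to attributes referenced by the job's own significant attributes.

```python
class SignificantAttrUnion:
    """Hypothetical model of the schedd's lifetime union of significant attributes."""

    def __init__(self, remove=()):
        self.remove = {name.lower() for name in remove}
        self.union = set()

    def reconfig(self, remove):
        # Only the filter changes; attributes already in the union stay there.
        self.remove = {name.lower() for name in remove}

    def see_job(self, significant_for_job):
        # New attributes are added unless filtered by REMOVE_SIGNIFICANT_ATTRIBUTES.
        for name in significant_for_job:
            if name.lower() not in self.remove:
                self.union.add(name)


running = SignificantAttrUnion()
running.see_job({"Requirements", "QDate"})  # QDate enters before the knob is set
running.reconfig(remove=["QDate"])
print("QDate" in running.union)             # True: reconfig does not prune

restarted = SignificantAttrUnion(remove=["QDate"])
restarted.see_job({"Requirements", "QDate"})
print(sorted(restarted.union))              # ['Requirements']
```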
>
> Are you saying QDate was still a significant attribute even after a restart?
>
> -tj
>
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On 
> Behalf Of Fischer, Max (SCC)
> Sent: Thursday, October 27, 2016 9:29 AM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] Submit requirements & ignoring 
> significant attributes
>
> Hi Todd,
>
> I'm just going to reply here as I burdened Frank with some of that 
> code. :D
>
> 1. We do not just need to transform the job, but also track it over its entire lifetime. There are actually hooks doing most of the work. Plain router-style transforms are also not sufficient, as we must communicate with an external service. We could probably do without the update hooks, but the finalize hook is important for us: it is used to figure out whether the routing gave a significant performance improvement.
>
> 2. Having a separate "cleanup/delay" route to delay and release unrouted jobs could work. The `( ( CurrentTime - QDate ) > ( 10 * 3 ) )` expression is only required to release jobs which are *not* routed by the specific route in question.
>
> Number 2. is IMO the cleanest solution, but shouldn't REMOVE_SIGNIFICANT_ATTRIBUTES at least work? It seems to have no effect.
>
> Cheers,
> Max
>
>> Am 27.10.2016 um 15:18 schrieb Todd Tannenbaum <tannenba@xxxxxxxxxxx>:
>>
>> On 10/25/2016 10:34 AM, Frank Fischer wrote:
>>> Hi all,
>>>
>>> I'm facing some strange issues in my configuration file regarding 
>>> auto-clustering/significant attributes and adding requirements upon 
>>> job submission (in our setup we have a job route defined - we append 
>>> requirements in order to make sure the router acts before the job
>>> starts)
>>>
>>>> APPEND_REQUIREMENTS = ( (INPUT_FILES =?= UNDEFINED) || (HPDA_Route =?= TRUE) || (( CurrentTime - QDate ) > ( $(JOB_ROUTER_POLLING_PERIOD) * 3 )) )
>>>
>>
>> So the real problem you are trying to solve is to prevent a job from running before the job_router acted on it.
>>
>> I suggest you get rid of all your customization of SIGNIFICANT_ATTRIBUTES (you will need to do "condor_restart -fast" after this; a condor_reconfig is not enough), get rid of references to CurrentTime and QDate in APPEND_REQUIREMENTS, and instead solve the real problem via one of two possibilities:
>>
>> 1. If you can run HTCondor v8.5 on your submit machine, perhaps you 
>> can use a "job transform" instead of using the job_router.  A "job 
>> transform" allows the admin to edit an incoming job classad, and is 
>> performed by the schedd BEFORE the job enters the queue.  See the 
>> manual at https://is.gd/5p3Wks or the ticket about this at
>>  https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=5885
>>
>> OR
>>
>> 2. If you want to do this via the job_router, you could add to your
>> config something like
>>
>>   JobWasRouted = False
>>   SUBMIT_ATTRS = $(SUBMIT_ATTRS) JobWasRouted
>>   APPEND_REQUIREMENTS = ( JobWasRouted =?= True )
>>
>> and then in your route rule include
>>   set_JobWasRouted = True;
>> This way you achieve your goal without referencing QDate in your Requirements expression which, as you discovered, wreaks havoc with autoclustering.
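A tiny Python model of how that gate behaves, assuming `=?=` semantics (type-and-value equality that never yields UNDEFINED) and modelling the route's `set_JobWasRouted = True;` as producing an updated copy of the job ad. The function names are hypothetical. Because the gate compares against a literal, the appended requirement references no time-dependent attribute such as QDate.

```python
def meta_equal(a, b):
    """ClassAd =?=: true only when type and value both match; never UNDEFINED."""
    return type(a) is type(b) and a == b


def gate_open(job_ad):
    """Models APPEND_REQUIREMENTS = ( JobWasRouted =?= True )."""
    return meta_equal(job_ad.get("JobWasRouted"), True)


def apply_route(job_ad):
    """Hypothetical stand-in for the route rule's set_JobWasRouted = True;"""
    routed = dict(job_ad)
    routed["JobWasRouted"] = True
    return routed


submitted = {"JobWasRouted": False}  # default injected via SUBMIT_ATTRS
print(gate_open(submitted))               # False: cannot match yet
print(gate_open({}))                      # False: a missing attribute is not UNDEFINED here
print(gate_open(apply_route(submitted)))  # True: eligible to match
```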
>>
>> Hope the above helps,
>> Todd
>>
>>
>>> The following configuration values (should) handle significant attributes:
>>>
>>>> # Round attributes (up) for better AutoClustering.
>>>> # 25%: 112 => 125, 1133 => 1250
>>>> # 3  : 112 => 1000, 1212 => 2000
>>>> SCHEDD_ROUND_ATTR_RequestWalltime = 3
>>>> SCHEDD_ROUND_ATTR_RequestMemory = 20%
>>>> SCHEDD_ROUND_ATTR_RequestDisk = 25%
>>>>
>>>> SIGNIFICANT_ATTRIBUTES = JobUniverse,WantDocker,\
>>>>                         RequestWalltime,\
>>>>                         RequestCpus,RequestMemory,RequestDisk,\
>>>>                         Requirements,\
>>>>                         RemoteJob,ExperimentalJob,\
>>>>                         HPDA_Route
>>>> REMOVE_SIGNIFICANT_ATTRIBUTES = DiskUsage, QDate
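The rounding rules in the comments above can be reproduced with a short Python sketch. It assumes the documented semantics of SCHEDD_ROUND_ATTR_<Attr>: an integer n rounds up to a multiple of 10**n, and a percentage rounds up to the next multiple of that percentage of the value's order of magnitude. The function name is hypothetical, only positive integer values are handled, and other edge cases are not modelled.

```python
import math
from fractions import Fraction


def round_attr(value, setting):
    """Round a positive integer up the way SCHEDD_ROUND_ATTR_<Attr> is described.

    setting "3"   -> round up to a multiple of 10**3
    setting "25%" -> round up to a multiple of 25% of the value's order of magnitude
    """
    if value <= 0:
        return value  # not modelled in this sketch
    if setting.endswith("%"):
        magnitude = 10 ** (len(str(value)) - 1)  # 112 -> 100, 1133 -> 1000
        step = Fraction(magnitude * int(setting[:-1]), 100)
    else:
        step = Fraction(10 ** int(setting))
    rounded = math.ceil(Fraction(value) / step) * step
    return int(rounded) if rounded.denominator == 1 else float(rounded)


for value, setting in [(112, "25%"), (1133, "25%"), (112, "3"), (1212, "3")]:
    print(value, setting, round_attr(value, setting))
# 112 25% 125
# 1133 25% 1250
# 112 3 1000
# 1212 3 2000
```

Exact fractions are used so that small values with fractional steps do not pick up floating-point error before rounding.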
>>>
>>> So far so good.
>>>
>>> Now here's an example of condor_q -autocluster -long:
>>>
>>>> ServerTime = 1477408979
>>>> AutoClusterId = 2836
>>>> JobCount = 7
>>>> Requirements = ( ( TARGET.CLOUDSITE == "BWFORCLUSTER" ) ) && ( ( ( INPUT_FILES =?= undefined ) || ( HPDA_Route =?= true ) || ( ( CurrentTime - QDate ) > ( 10 * 3 ) ) ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer )
>>>> RequestDisk = DiskUsage
>>>> JobIds = "175098.1 ... 175098.16"
>>>> JobUniverse = 5
>>>> RequestCpus = 1
>>>> DiskUsage = 750000
>>>> RequestWalltime = 69000
>>>> QDate = 1477393245
>>>> RemoteJob = true
>>>> RequestMemory = 4000
>>>
>>> Apparently QDate and DiskUsage are NOT removed from significant 
>>> attributes, although I explicitly told HTCondor to do so.
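Why QDate hurts clustering can be seen with a simplified Python model of autoclustering, assuming jobs share an autocluster only when they agree on every attribute in the projection. The grouping function and the second job ID are hypothetical; the attribute values are taken from the output above, and the second submission's QDate is invented for illustration. Because each submission gets its own QDate, keeping QDate in the projection splits otherwise identical jobs.

```python
from collections import defaultdict


def autoclusters(jobs, attrs):
    """Group job IDs by their values for attrs (a simplified autocluster model)."""
    groups = defaultdict(list)
    for job_id, ad in jobs.items():
        key = tuple(ad.get(name) for name in attrs)
        groups[key].append(job_id)
    return list(groups.values())


common = {"JobUniverse": 5, "RequestCpus": 1, "RequestMemory": 4000, "RequestWalltime": 69000}
jobs = {
    "175098.1": dict(common, QDate=1477393245),
    "175099.0": dict(common, QDate=1477393300),  # hypothetical later submission
}

attrs_with_qdate = ["JobUniverse", "RequestCpus", "RequestMemory", "RequestWalltime", "QDate"]
print(len(autoclusters(jobs, attrs_with_qdate)))       # 2
print(len(autoclusters(jobs, attrs_with_qdate[:-1])))  # 1
```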
>>>
>>> Am I missing something, or do you see an error in my thought process?
>>> I'm out of ideas about what I could try to increase the number of
>>> clustered jobs.
>>>
>>> Thanks & regards
>>> Frank
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
>>> with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting 
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>
>>
>> --
>> Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
>> Center for High Throughput Computing   Department of Computer Sciences
>> HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
>> Phone: (608) 263-7132                  Madison, WI 53706-1685
>
>

