
Re: [HTCondor-users] SYSTEM_JOB_MACHINE_ATTRS



We want to do similar condor_history scaling within LIGO, but the underlying issue is that RemoteWallClockTime is cumulative over all the machines a job executed on. How do you handle that? Is it an issue in your scheduling system at all?

One could hand-roll something using the functionality where job attributes (including machine attributes added to the job) are retained for up to N matches, but that's a bit hacky. We've had informal discussions with the HTCondor team about something more sophisticated, such as Intel PCM, but nothing detailed.
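To make the cumulative-wallclock problem concrete, here is a small Python sketch. It assumes you somehow have per-run wall times (hypothetical here: RemoteWallClockTime alone does not give them, though they might be derived from the job's event log) and a per-run scaling factor for each machine (for example, retained via SYSTEM_JOB_MACHINE_ATTRS). A job that ran 2 h on a machine with factor 0.8 and then 3 h on one with factor 1.2 should be credited 5760 + 12960 = 18720 normalised seconds, whereas multiplying the cumulative 18000 s by the last machine's factor gives 21600.

```python
from typing import NamedTuple, Sequence


class Run(NamedTuple):
    """One execution of a job on one machine (hypothetical per-run record)."""
    wall_seconds: float  # wall-clock time spent on this machine
    scaling: float       # that machine's power factor, e.g. its RalScaling


def normalized_wall(runs: Sequence[Run]) -> float:
    """Scale each run by its own machine's factor, then sum."""
    return sum(r.wall_seconds * r.scaling for r in runs)


def naive_normalized_wall(runs: Sequence[Run]) -> float:
    """A cumulative total (like RemoteWallClockTime) multiplied by
    the factor of the last machine only."""
    total = sum(r.wall_seconds for r in runs)
    return total * runs[-1].scaling


# Hypothetical job: 2 h at factor 0.8, evicted, then 3 h at factor 1.2.
runs = [Run(7200.0, 0.8), Run(10800.0, 1.2)]
print(normalized_wall(runs), naive_normalized_wall(runs))
```

The gap between the two numbers grows with how often jobs are evicted and how heterogeneous the pool is.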

It matters primarily for our ability to make accurate funding requests for modern hardware based upon data from a mix of contemporary and previous-generation hardware.

Tom

On 1/21/19, 9:10 AM, "HTCondor-users on behalf of Stephen Jones" <htcondor-users-bounces@xxxxxxxxxxx on behalf of sjones@xxxxxxxxxxxxxxxx> wrote:

    Hi Steven,
    
    On 21/01/2019 14:37, Steven C Timm wrote:
    >
    > MachineRalScaling is not a default HTCondor attribute; it must be 
    > something that is being defined in GridPP clusters somehow.
    >
    It's defined for our APEL accounting. We attach it to "the job" via 
    SUBMIT_EXPRS on the head node.
    
    MachineRalScaling = "$$([ifThenElse(isUndefined(RalScaling), 1.00, RalScaling)])"
    SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineRalScaling
    
    Since the $$ syntax is used, expansion is delayed until the job is 
    matched to a worker node. On the worker node, its local value of 
    RalScaling is substituted in. This gives the power of the node, so we 
    can have heterogeneous worker nodes and the power expression still 
    comes out in the job data. MachineRalScaling "emerges" in the 
    condor_history data (as MATCH_EXP_MachineRalScaling). It's then a 
    simple matter to multiply the wall-clock time by MachineRalScaling to 
    "normalise" each job, i.e. put them all on the same scale.
    
    # condor_history -long 1233764.0 | grep MATCH_EXP
    MATCH_EXP_MachineRalScaling = "1.036000000000000E+00"
    
    This has stopped happening in the new version of HTCondor I'm using, 
    and I was wondering whether the behaviour has changed. I got it 
    working with SYSTEM_JOB_MACHINE_ATTRS, but I'd like to keep things the 
    same. I have a feeling that MATCH_EXP_* is an "undocumented" feature, 
    since I can't find it anywhere in the manuals.
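As a rough sketch of the normalisation step, the Python below parses `condor_history -long` output (the `Name = value` lines shown above). It takes MATCH_EXP_MachineRalScaling if present and otherwise falls back to MachineAttrRalScaling0, the name I would expect `SYSTEM_JOB_MACHINE_ATTRS = RalScaling` to produce for the most recent match (an assumption; check your own history ads). Like the ifThenElse expression above, it defaults to 1.00. The sample RemoteWallClockTime value is made up, and because RemoteWallClockTime is cumulative, the result is only exact for jobs that ran once.

```python
def parse_long_ad(text: str) -> dict[str, str]:
    """Parse 'Name = value' lines (as printed by condor_history -long) into raw strings."""
    ad: dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip blank or malformed lines
            ad[name.strip()] = value.strip()
    return ad


def to_float(raw: str) -> float:
    """Convert a ClassAd value to float; $$() results are printed as quoted strings."""
    return float(raw.strip('"'))


def scaling_factor(ad: dict[str, str], default: float = 1.0) -> float:
    """Prefer the $$()-expanded value, then the SYSTEM_JOB_MACHINE_ATTRS copy
    from the most recent match (assumed suffix 0), else use the default."""
    for attr in ("MATCH_EXP_MachineRalScaling", "MachineAttrRalScaling0"):
        raw = ad.get(attr)
        if raw is not None and raw.strip('"') != "undefined":
            return to_float(raw)
    return default


def normalized_wall_clock(ad: dict[str, str]) -> float:
    """RemoteWallClockTime times the factor of the last matched machine.
    RemoteWallClockTime is cumulative, so this is exact only for single-run jobs."""
    return to_float(ad["RemoteWallClockTime"]) * scaling_factor(ad)


sample = """\
RemoteWallClockTime = 3600.0
MATCH_EXP_MachineRalScaling = "1.036000000000000E+00"
"""
print(normalized_wall_clock(parse_long_ad(sample)))
```

In practice the input text would come from something like `condor_history -long <job id>`; the parsing is deliberately naive and does not handle multi-line ClassAd values.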
    
    Cheers,
    
    Ste
    
    
    -- 
    Steve Jones                             sjones@xxxxxxxxxxxxxxxx
    Grid System Administrator               office: 220
    High Energy Physics Division            tel (int): 43396
    Oliver Lodge Laboratory                 tel (ext): +44 (0)151 794 3396
    University of Liverpool                 http://www.liv.ac.uk/physics/hep/
    
    _______________________________________________
    HTCondor-users mailing list
    To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
    subject: Unsubscribe
    You can also unsubscribe by visiting
    https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
    
    The archives can be found at:
    https://lists.cs.wisc.edu/archive/htcondor-users/
    
