
Re: [HTCondor-users] SYSTEM_JOB_MACHINE_ATTRS



Do the MATCH_EXP_xxxxx fields still appear in the job classad of jobs that are running?  If so, then there is probably some setting that you have to configure to make them appear in condor_history as well.  Not all fields that appear in the live job classads necessarily show up in condor_history.


Steve



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Tom Downes <downes@xxxxxxx>
Sent: Monday, January 21, 2019 9:34:44 AM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] SYSTEM_JOB_MACHINE_ATTRS
 
We want to do similar condor_history scaling within LIGO, but the underlying issue is that RemoteWallClockTime is cumulative over all machines on which a job executed. How do you handle that? Is it an issue in your scheduling system at all?

One could hand roll something using the functionality where job attributes (incl. machine attributes added to the job) are retained for up to N matches, but that's a bit hacky. We've had informal discussions with HTCondor about something more sophisticated like Intel PCM. But nothing in detail.

It matters primarily for our ability to make accurate funding requests for modern hardware based upon data from a mix of contemporary and previous-generation hardware.

Tom

On 1/21/19, 9:10 AM, "HTCondor-users on behalf of Stephen Jones" <htcondor-users-bounces@xxxxxxxxxxx on behalf of sjones@xxxxxxxxxxxxxxxx> wrote:

    Hi Steven,
   
    On 21/01/2019 14:37, Steven C Timm wrote:
    >
    > MachineRalScaling is not a default HTCondor attribute, it must be
    > something that is being defined in GridPP clusters somehow.
    >
    It's defined for our APEL accounting. We attach it to "the job" via
    SUBMIT_EXPRS on the head node.
   
    MachineRalScaling = "$$([ifThenElse(isUndefined(RalScaling), 1.00, RalScaling)])"
    SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineRalScaling
   
    Since the $$ syntax is used, expansion is delayed until the job gets on
    the worker-node. In the worker node, a local value (RalScaling) is
    substituted in. This gives the power of the node, hence we can have
    heterogeneous worker-nodes and the power expression comes out in the
    job data. The MachineRalScaling "emerges" in the condor_history data.
    It's then a simple matter to multiply the wallclocktime by 
    MachineRalScaling to "normalise" the job, i.e. make them all the same.
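
The normalisation step described above could be sketched roughly as follows. This is only an illustration, not the actual APEL accounting code: it assumes the record comes from `condor_history -long` as plain `Name = value` lines, that the scaling factor is stored as a quoted string under `MATCH_EXP_MachineRalScaling`, and that a missing factor should fall back to 1.0 (mirroring the ifThenElse default). The helper names `parse_classad` and `normalised_wallclock` and the sample values are hypothetical.

```python
import re

# Matches one "Name = value" line of `condor_history -long` output.
ATTR_LINE = re.compile(r"^(\w+)\s*=\s*(.*)$")


def parse_classad(text):
    """Return a dict of attribute name -> raw value string (quotes stripped)."""
    ad = {}
    for line in text.splitlines():
        m = ATTR_LINE.match(line.strip())
        if m:
            ad[m.group(1)] = m.group(2).strip().strip('"')
    return ad


def normalised_wallclock(ad, default_scaling=1.0):
    """Multiply RemoteWallClockTime by the recorded scaling factor.

    Falls back to default_scaling when the MATCH_EXP_ attribute is absent
    (an assumption made for this sketch).
    """
    scaling = float(ad.get("MATCH_EXP_MachineRalScaling", default_scaling))
    return float(ad["RemoteWallClockTime"]) * scaling


# Hypothetical history record, for illustration only.
sample = '''RemoteWallClockTime = 1000.0
MATCH_EXP_MachineRalScaling = "1.036000000000000E+00"'''

ad = parse_classad(sample)
print(normalised_wallclock(ad))
```

Since RemoteWallClockTime accumulates over every machine a job ran on, a single factor like this is only accurate for jobs that executed on one machine.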
   
    # condor_history -long 1233764.0 | grep MATCH_EXP
    MATCH_EXP_MachineRalScaling = "1.036000000000000E+00"
   
    This has stopped happening in the new condor I use. I was wondering if
    the behaviour has been changed, somehow. I got it working with
    SYSTEM_JOB_MACHINE_ATTRS.... but I'd like to keep things the same.  I
    have a feeling that MATCH_EXP_* is an "undocumented" feature, since I
    can't see it anywhere in the manuals.
   
    Cheers,
   
    Ste
   
   
    --
    Steve Jones                             sjones@xxxxxxxxxxxxxxxx
    Grid System Administrator               office: 220
    High Energy Physics Division            tel (int): 43396
    Oliver Lodge Laboratory                 tel (ext): +44 (0)151 794 3396
    University of Liverpool                 http://www.liv.ac.uk/physics/hep/
   
    _______________________________________________
    HTCondor-users mailing list
    To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
    subject: Unsubscribe
    You can also unsubscribe by visiting
    https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
   
    The archives can be found at:
    https://lists.cs.wisc.edu/archive/htcondor-users/