Re: [HTCondor-users] SYSTEM_JOB_MACHINE_ATTRS

We want to do similar condor_history scaling within LIGO, but the underlying issue is that RemoteWallClockTime is cumulative over all the machines that a job executed on. How do you handle that? Is it an issue in your scheduling system at all?

One could hand-roll something using the functionality where job attributes (incl. machine attributes added to the job) are retained for up to N matches, but that's a bit hacky. We've had informal discussions with HTCondor about something more sophisticated, like Intel PCM, but nothing in detail.

It matters primarily for our ability to make accurate funding requests for modern hardware based upon data from a mix of contemporary and previous-generation hardware.

Tom

On 1/21/19, 9:10 AM, "HTCondor-users on behalf of Stephen Jones" <htcondor-users-bounces@xxxxxxxxxxx on behalf of sjones@xxxxxxxxxxxxxxxx> wrote:

    Hi Steven,

    On 21/01/2019 14:37, Steven C Timm wrote:
    >
    > MachineRalScaling is not a default HTCondor attribute, it must be
    > something that is being defined in GridPP clusters somehow.
    >
    It's defined for our APEL accounting. We attach it to "the job" via
    SUBMIT_EXPRS on the head node.

    MachineRalScaling = "$$([ifThenElse(isUndefined(RalScaling), 1.00, RalScaling)])"
    SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineRalScaling

    Since the $$ syntax is used, expansion is delayed until the job gets onto
    the worker node. On the worker node, a local value (RalScaling) is
    substituted in. This gives the power of the node, so we can have
    heterogeneous worker nodes and the power expression comes out in the
    job data. The MachineRalScaling "emerges" in the condor_history data.
    It's then a simple matter to multiply the wallclocktime by
    MachineRalScaling to "normalise" the jobs, i.e. put them all on the
    same scale.

    # condor_history -long 1233764.0 | grep MATCH_EXP
    MATCH_EXP_MachineRalScaling = "1.036000000000000E+00"
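
The normalisation described above can be sketched in Python roughly as follows. This is only an illustration, not the APEL accounting code: it assumes the `condor_history -long` output has been captured as text, the helper names `parse_classad_text` and `normalised_wallclock` are hypothetical, and the RemoteWallClockTime value in the sample is made up. A missing scaling attribute falls back to 1.0, mirroring the ifThenElse default in the submit expression.

```python
def parse_classad_text(text):
    """Parse 'Name = value' lines (as printed by condor_history -long)
    into a dict of strings, stripping surrounding double quotes."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            attrs[name.strip()] = value.strip().strip('"')
    return attrs


def normalised_wallclock(attrs, scaling_attr="MATCH_EXP_MachineRalScaling"):
    """Multiply the job's wall clock time by the node scaling factor.
    Falls back to a factor of 1.0 if the scaling attribute is absent."""
    wall = float(attrs["RemoteWallClockTime"])
    scale = float(attrs.get(scaling_attr, "1.0"))
    return wall * scale


# Hypothetical sample: the wall clock value is invented for illustration;
# the scaling line is the one shown in the condor_history output above.
sample = '''RemoteWallClockTime = 3600.0
MATCH_EXP_MachineRalScaling = "1.036000000000000E+00"'''

attrs = parse_classad_text(sample)
print(normalised_wallclock(attrs))
```

If the scaling factor is recorded under a different attribute name, it can be passed as `scaling_attr`. Note that this applies a single factor to the whole of RemoteWallClockTime, so it does not address jobs that ran on more than one machine.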

    This has stopped happening in the new version of HTCondor I use. I was
    wondering if the behaviour has been changed somehow. I got it working
    with SYSTEM_JOB_MACHINE_ATTRS, but I'd like to keep things the same. I
    have a feeling that MATCH_EXP_* is an "undocumented" feature, since I
    can't see it anywhere in the manuals.

    Cheers,

    Ste


    --
    Steve Jones                             sjones@xxxxxxxxxxxxxxxx
    Grid System Administrator               office: 220
    High Energy Physics Division            tel (int): 43396
    Oliver Lodge Laboratory                 tel (ext): +44 (0)151 794 3396
    University of Liverpool                 http://www.liv.ac.uk/physics/hep/
