Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] HTCondor diagram of daemons

Date: Mon, 04 Apr 2022 12:44:47 -0500
From: Greg Thain <gthain@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] HTCondor diagram of daemons

On 4/4/22 12:21, West, Matthew wrote:

Thanks Greg for finding the diagrams (slides 23-32) from T.T. I was looking for. The notes on slides 28-30 of the PPTX version go into finer sequential detail about the steps in Claim acquisition. Some additional questions:

What is Q on slide 27? I understand that J is the job classad and S is the classad for the execute machine.

An important part of the design is that we want the system as a whole to be able to support more jobs than any one scheduler process can hold. (Even though there are many sites with just one schedd.) So, when the schedd wants to tell the collector that it needs matches from the negotiator, it can't just upload all the jobs to the collector, as there might be way too many. Instead, it sends "submitter records", which condense the requests into a single classad record per submitter, with the number of requests each submitter in that schedd has. If you are curious, these are visible with the "condor_status -submitter" command. I'm not sure how we got "Q" out of that.
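To make the condensing step concrete, here is a minimal Python sketch of the idea. It is not HTCondor code: the function name, the job dictionaries, and the field names ("owner", "status", "submitter", "idle_jobs") are all made up for illustration, and real submitter ads carry more attributes than this.

```python
from collections import Counter

def summarize_submitters(jobs):
    """Collapse a list of jobs into one summary record per submitter.

    Toy model only: field names here are hypothetical, not real
    ClassAd attribute names.
    """
    # Count only jobs that still need a match (idle in this toy model).
    counts = Counter(job["owner"] for job in jobs if job["status"] == "idle")
    # One small record per submitter, no matter how many jobs they have.
    return [{"submitter": owner, "idle_jobs": n}
            for owner, n in sorted(counts.items())]

jobs = [
    {"owner": "alice", "status": "idle"},
    {"owner": "alice", "status": "idle"},
    {"owner": "bob", "status": "idle"},
    {"owner": "alice", "status": "running"},
]
records = summarize_submitters(jobs)
for record in records:
    print(record)
```

The point of the design is that the size of what gets sent grows with the number of submitters, not with the number of jobs in the queue.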

Does the Shadow talk to the Startd and tell it to make a Starter?

Yes. Once the schedd has been given a slot to use from the negotiator, the schedd "claim"s the slot, for exclusive (but time-limited) use by that schedd. Assuming that succeeds, the shadow "activate"s the claim to run a single job, which causes the startd to create the starter.
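The claim-then-activate sequence can be sketched as a small state machine. This is a toy Python model under my own assumptions, not the real protocol: the Slot class, its method names, and the string identifiers are hypothetical, and the time limit on the claim is left out.

```python
class Slot:
    """Toy model of one execute slot as the startd might track it."""

    def __init__(self, name):
        self.name = name
        self.claimed_by = None   # which schedd holds the claim, if any
        self.starter = None      # set while a job is active on the claim

    def claim(self, schedd):
        """Grant the slot to one schedd for exclusive use."""
        if self.claimed_by is not None:
            return False         # already claimed by someone
        self.claimed_by = schedd
        return True

    def activate(self, schedd, job):
        """Activate the claim for a single job; this creates the starter."""
        if self.claimed_by != schedd:
            raise RuntimeError("slot is not claimed by this schedd")
        if self.starter is not None:
            raise RuntimeError("claim already has an active job")
        self.starter = f"starter for {job}"
        return self.starter

    def job_finished(self):
        """The job is done: the starter goes away, the claim remains."""
        self.starter = None

slot = Slot("slot1@exec1")
print(slot.claim("schedd-A"))              # first claim is granted
print(slot.claim("schedd-B"))              # second schedd is refused
print(slot.activate("schedd-A", "job 1.0"))
```

Keeping the claim and the activation as separate steps is what lets a claim outlive any one job.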

Where does file transfer go (inbound and outbound) in these steps?

File transfer is handled by the shadow and the starter. Input xfer happens right after activation, and Output after the job completes, but the claim is still active during file xfer.

Are there additional communications between processes once a single job is completed?

Once the first job on a claim completes, if the amount of time it took is less than CLAIM_WORKLIFE, and the schedd can find another job that fits in the slot, it is free to launch another starter to reuse the existing claim, but with a new activation for the new job.
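The reuse decision described above can be written down as a short check. This is only a sketch in Python under simplifying assumptions: the function name, the resource dictionaries, and the "fits" test (every requested amount is no larger than what the slot has) are hypothetical, and 1200 is just a sample value for CLAIM_WORKLIFE in seconds.

```python
def next_job_for_claim(elapsed_seconds, claim_worklife, slot_resources, waiting_jobs):
    """Return the next job that may reuse the claim, or None.

    Toy model: a job "fits" if each resource it requests is no larger
    than what the slot offers.
    """
    # Past the work lifetime: do not start anything new on this claim.
    if elapsed_seconds >= claim_worklife:
        return None
    for job in waiting_jobs:
        request = job["request"]
        if all(amount <= slot_resources.get(resource, 0)
               for resource, amount in request.items()):
            return job
    return None

slot_resources = {"cpus": 1, "memory_mb": 2048}
waiting_jobs = [
    {"id": "2.0", "request": {"cpus": 4, "memory_mb": 1024}},  # too big for the slot
    {"id": "3.0", "request": {"cpus": 1, "memory_mb": 1024}},  # fits
]

print(next_job_for_claim(300, 1200, slot_resources, waiting_jobs))
print(next_job_for_claim(1500, 1200, slot_resources, waiting_jobs))
```

When the check returns a job, the existing claim is kept and only a new activation is needed; when it returns None, the claim would be given up.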

These charts are really nice to show how one can build a robust system from a number of disparately connected parts.

Thanks, and good luck with your talk,

-greg
