
Re: [HTCondor-users] HTCondor diagram of daemons

On 4/4/22 12:21, West, Matthew wrote:

Thanks Greg for finding the diagrams (slides 23-32) from T.T. I was looking for. The notes on slides 28-30 of the PPTX version go into finer sequential detail about the steps in Claim acquisition. Some additional questions:

An important part of the design is that we want the system as a whole to be able to support more jobs than any one scheduler process can hold (even though there are many sites with just one schedd). So, when the schedd wants to tell the collector that it needs matches from the negotiator, it can't just upload all the jobs to the collector, as there might be far too many. Instead, it sends "submitter records", which condense the requests into a single classad record per submitter, holding the number of requests each submitter has in that schedd. If you are curious, these are visible with the "condor_status -submitter" command. I'm not sure how we got "Q" out of that.
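
As a rough illustration of that condensing step, here is a minimal Python sketch. It is not HTCondor code: the function name, the job dictionaries, and the record field names are all made up for the example, and real submitter ads carry more attributes than this.

```python
from collections import Counter

def build_submitter_records(jobs, schedd_name):
    """Condense a job queue into one summary record per submitter.

    `jobs` is a list of dicts with hypothetical keys "owner" and "status".
    The returned field names are illustrative, not real classad attributes.
    """
    idle = Counter()
    running = Counter()
    for job in jobs:
        if job["status"] == "idle":
            idle[job["owner"]] += 1
        elif job["status"] == "running":
            running[job["owner"]] += 1

    # One record per submitter, no matter how many jobs that submitter has.
    owners = sorted(set(idle) | set(running))
    return [
        {
            "submitter": owner,
            "schedd": schedd_name,
            "idle_jobs": idle[owner],
            "running_jobs": running[owner],
        }
        for owner in owners
    ]
```

The point of the sketch is only that the size of what goes to the collector grows with the number of submitters, not with the number of jobs.
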
Yes. Once the schedd has been given a slot to use from the negotiator, the schedd "claim"s the slot, for exclusive (but time-limited) use by that schedd. Assuming that succeeds, the schedd "activate"s the claim to run a single job, which causes the startd to create the starter.
File transfer is handled by the shadow and the starter. Input transfer happens right after activation, and output transfer after the job completes; the claim is still active during both.

Once the first job on a claim completes, if the amount of time it took is less than CLAIM_WORKLIFE, and the schedd can find another job that fits in the slot, it is free to reuse the existing claim, with a new activation (and so a new starter) for the new job.
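
The reuse rule can be sketched as a small Python simulation. This is a toy model under stated assumptions, not how the schedd is implemented: the function name and the (job_id, runtime) tuples are hypothetical, the CLAIM_WORKLIFE check is modeled as "time used on the claim so far is below the limit", and file transfer time and slot-fit checks are ignored.

```python
def run_jobs_on_claim(jobs, claim_worklife):
    """Return the ids of the jobs that get activated on a single claim.

    `jobs` is a list of (job_id, runtime_seconds) pairs, in the order the
    schedd would try them. `claim_worklife` stands in for CLAIM_WORKLIFE.
    """
    elapsed = 0.0      # time used on this claim so far
    activations = []   # one entry per activation, i.e. one per job run
    for job_id, runtime in jobs:
        # The first job always gets its activation; later jobs reuse the
        # claim only while the time used so far is under the limit.
        if activations and elapsed >= claim_worklife:
            break
        activations.append(job_id)
        elapsed += runtime
    return activations
```

For example, with jobs of 100, 300 and 50 seconds and a limit of 350, the first two jobs run on the claim and the third has to wait for a new match.
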

These charts are really nice to show how one can build a robust system from a number of disparately connected parts.

Thanks, and good luck with your talk,