Re: [HTCondor-users] HTCondor diagram of daemons
- Date: Wed, 6 Apr 2022 13:26:20 -0500 (CDT)
- From: Todd L Miller <tlmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] HTCondor diagram of daemons
> Because I will be talking to RSEs who might be skeptical that the extra
> process steps have tangible benefits, I'd like to be able to explain some
> of the robustness features enabled by this design.
The more important property to maintain is that, in the presence
of errors and crashes, we leave the machine in a state where we can
continue to operate after a reboot or restart.
From the perspective of someone submitting jobs, HTCondor takes
the position that robustness means that the submitter doesn't have to take
any particular action because of a hardware or software failure; HTCondor
won't forget the job and will (eventually) run it again.
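
To make "won't forget the job" concrete for a skeptical audience, here is a toy sketch of the general idea: record every state change durably before acting on it, and rebuild state from that record after a restart. This is not HTCondor's code or on-disk format; the class name DurableJobQueue, the file name toy_queue.log, and the state names are all made up for illustration.

```python
import json
import os
import tempfile


class DurableJobQueue:
    """Toy append-only job log. Hypothetical; not HTCondor's implementation."""

    def __init__(self, path):
        self.path = path
        self.jobs = {}  # job id -> "idle" | "running" | "done"
        self._replay()

    def _replay(self):
        """Rebuild in-memory state from the log after a start or restart."""
        if not os.path.exists(self.path):
            return
        with open(self.path, "rb") as log:
            data = log.read()
        pos = 0
        while True:
            end = data.find(b"\n", pos)
            if end == -1:
                break  # no complete record left
            try:
                record = json.loads(data[pos:end])
            except ValueError:
                break  # corrupt record: stop trusting the log here
            self.jobs[record["job"]] = record["state"]
            pos = end + 1
        if pos != len(data):
            # Drop a torn final record left behind by a crash mid-write.
            with open(self.path, "r+b") as log:
                log.truncate(pos)
        # A job that was running when we died did not finish: queue it again.
        for job_id, state in list(self.jobs.items()):
            if state == "running":
                self._record(job_id, "idle")

    def _record(self, job_id, state):
        """Force the change to disk before updating in-memory state."""
        line = json.dumps({"job": job_id, "state": state}) + "\n"
        with open(self.path, "ab") as log:
            log.write(line.encode("utf-8"))
            log.flush()
            os.fsync(log.fileno())
        self.jobs[job_id] = state

    def submit(self, job_id):
        self._record(job_id, "idle")

    def start(self, job_id):
        self._record(job_id, "running")

    def finish(self, job_id):
        self._record(job_id, "done")


# Simulate a crash: one job is running and a write is cut off halfway.
path = os.path.join(tempfile.mkdtemp(), "toy_queue.log")
queue = DurableJobQueue(path)
queue.submit("job-1")
queue.submit("job-2")
queue.start("job-1")
with open(path, "ab") as log:
    log.write(b'{"job": "job-2", "sta')  # torn record, never completed
del queue

recovered = DurableJobQueue(path)
print(recovered.jobs)  # {'job-1': 'idle', 'job-2': 'idle'}
```

The submitter does nothing in this sketch: after the restart both jobs are still known, and the one that was interrupted is back in the idle state waiting to run again.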