
Re: [HTCondor-users] HTCondor diagram of daemons



Because I will be talking to RSEs who might be skeptical that the extra process steps have tangible benefits, I'd like to be able to explain some of the robustness features enabled by this design.

...

The more important property to maintain is that, in the presence
of errors and crashes, we leave the machine in a state where we can
continue to operate after a reboot or restart.

From the perspective of someone submitting jobs, HTCondor takes the position that robustness means that the submitter doesn't have to take any particular action because of a hardware or software failure; HTCondor won't forget the job and will (eventually) run it again.
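
To make that concrete, here is a toy sketch of the general idea. It is not HTCondor's actual implementation: the class name DurableJobQueue, the JSON file layout, and the state names "idle", "running", and "done" are all made up for illustration. The point is only that every state change is written to disk before it is acknowledged, and that anything found "running" at startup is assumed to have been interrupted and is put back in the queue.

```python
import json
import os
import tempfile


class DurableJobQueue:
    """Hypothetical toy queue: every state change is persisted before returning."""

    def __init__(self, path):
        self.path = path
        self.jobs = {}
        if os.path.exists(path):
            with open(path) as f:
                self.jobs = json.load(f)
        # A job still marked "running" at startup was interrupted by a crash
        # or reboot, so it goes back to "idle" to be run again.
        for job in self.jobs.values():
            if job["state"] == "running":
                job["state"] = "idle"
        self._save()

    def _save(self):
        # Write to a temporary file, flush it to disk, then rename it over the
        # old file, so a crash mid-write never leaves a half-written queue.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.jobs, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)

    def submit(self, job_id, command):
        self.jobs[job_id] = {"command": command, "state": "idle"}
        self._save()

    def start(self, job_id):
        self.jobs[job_id]["state"] = "running"
        self._save()

    def finish(self, job_id):
        self.jobs[job_id]["state"] = "done"
        self._save()

    def idle_jobs(self):
        # Job ids that still need to be run, in a stable order.
        return sorted(j for j, job in self.jobs.items() if job["state"] == "idle")
```

If the process dies after start() but before finish(), constructing a new DurableJobQueue on the same file lists that job as idle again, and the submitter never had to do anything.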

- ToddM