[HTCondor-devel] Delayed Transfer with Signals(self-checkpoint) and checkpoint_exit_code


Date: Mon, 18 Jul 2022 17:58:02 +0900
From: Geonmo Ryu <geonmo@xxxxxxxxxxx>
Subject: [HTCondor-devel] Delayed Transfer with Signals(self-checkpoint) and checkpoint_exit_code

Dear HTCondor Developers,


I am testing HTCondor's support for self-checkpointing applications, with a view to introducing it at our site.


I have a question about it.


According to the HTCondor manual, the transfer_checkpoint_files keyword is not available when using Delayed Transfer with Signals.

(https://htcondor.readthedocs.io/en/latest/users-manual/self-checkpointing-applications.html#delayed-transfer-with-signals)


Looking at the source code, I found that when a job exits with the checkpoint exit code while it is being soft-killed, the exit code is ignored and the job is simply terminated.


I would like to ask whether this behavior is intentional.


Of course, the soft-kill timeout might not leave enough time for the transfer to finish, but I wonder whether it would still be better to attempt the transfer.


I confirmed that the transfer_checkpoint_files function can be activated quite simply in this case (when_to_transfer_output = ON_EXIT, exit code 85, SIGUSR2 signal).


Please let me know if you foresee any problems with this. If this issue could be resolved, it would be very useful at our site.


Regards,


-- Geonmo



