
[HTCondor-users] Socket between submit and execute hosts closed unexpectedly



Hi,

When the socket between the submit and execute hosts is closed and they can't reconnect, which attributes specify how many times the connection is retried and how long the delay is between attempts?
Are they JobLeaseDuration and MAX_CLAIM_ALIVES_MISSED?

Cheers,
Szabolcs


--
022 (18804064.000.000) 05/23 15:20:41 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to ...
...
024 (18804064.000.000) 05/23 15:40:41 Job reconnection failed
    Job disconnected too long: JobLeaseDuration (1200 seconds) expired
    Can not reconnect to ..., rescheduling job
...
--
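
As a sanity check on the excerpt above, the gap between the "Job disconnected" and "Job reconnection failed" events can be compared with the JobLeaseDuration reported in the log. This is only a rough sketch: the helper name seconds_between is made up for illustration, and because the job log timestamps carry no year, a placeholder year is assumed (it does not affect the result when both events fall on the same day).

```python
from datetime import datetime

def seconds_between(start: str, end: str, year: int = 2000) -> int:
    """Return the number of seconds between two 'MM/DD HH:MM:SS' timestamps.

    The year is a placeholder assumption, since the log lines do not include one.
    """
    fmt = "%Y %m/%d %H:%M:%S"
    t0 = datetime.strptime(f"{year} {start}", fmt)
    t1 = datetime.strptime(f"{year} {end}", fmt)
    return int((t1 - t0).total_seconds())

# Timestamps copied from events 022 and 024 in the excerpt.
gap_seconds = seconds_between("05/23 15:20:41", "05/23 15:40:41")
print(gap_seconds)  # 1200
```

The 1200-second gap matches the "JobLeaseDuration (1200 seconds) expired" message, so in this case the lease appears to have been what ended the reconnect attempts.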