
Re: [Condor-users] 32bit checkpoint servers and 64bit nodes?



Thanks, Dan.  I guess it's time to add another checkpoint server.

- dave


Dan Bradley wrote:
There is indeed an incompatibility between checkpoint servers and the condor_shadow when one is 32-bit and the other is 64-bit. We just recently became aware of this. I'll leave it to those working on a solution to report when that might be ready.

--Dan

David A. Kotz wrote:
I've recently converted a part of my Linux Condor pool to 64bit Ubuntu and 64bit Condor. Both of my initial 64bit users have reported that their jobs are failing to checkpoint and continually restarting. All of the checkpoint servers in my pool are running 32bit Ubuntu and 32bit Condor. Is there any known issue with this configuration? I'd assumed that the checkpoint server just received a data stream and dumped it to disk so that it wouldn't matter. The checkpoint servers in question are doing checkpoints for other jobs.

The shadow log for the 64bit submit node has lines like these:

2/18 22:35:10 (765.0) (22524):store request to ckpt server failed, trying again in 320 seconds

- dave
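
For anyone who wants to check whether their own submit nodes are hitting the same failure, here is a rough sketch that scans shadow log lines for the message quoted above. The pattern is modeled only on that single line, so other Condor versions may format it differently, and the function name find_store_failures is made up for illustration; it is not part of Condor.

```python
import re

# Pattern modeled on the one shadow log line quoted in this thread.
# Assumption: other Condor versions may format this line differently.
STORE_FAIL = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\((?P<job>\d+\.\d+)\) \((?P<pid>\d+)\):\s*"
    r"store request to ckpt server failed, "
    r"trying again in (?P<retry>\d+) seconds"
)


def find_store_failures(lines):
    """Return one dict per line that matches the store-failure message.

    Hypothetical helper for illustration; `lines` is any iterable of
    strings, such as an open shadow log file.
    """
    failures = []
    for line in lines:
        match = STORE_FAIL.match(line.strip())
        if match:
            failures.append({
                "date": match.group("date"),
                "time": match.group("time"),
                "job": match.group("job"),
                "pid": int(match.group("pid")),
                "retry_seconds": int(match.group("retry")),
            })
    return failures


# The line from the original report, used as sample input.
sample = [
    "2/18 22:35:10 (765.0) (22524):store request to ckpt server failed, "
    "trying again in 320 seconds",
]

for failure in find_store_failures(sample):
    print(failure["job"], failure["retry_seconds"])
```

Passing an open log file instead of the sample list should work the same way, since the function only iterates over lines. Many hits for the same job ID would match the "continually restarting" behavior described above.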
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at: https://lists.cs.wisc.edu/archive/condor-users/