
[Condor-users] Execution machines stop the job…



Hi,
We're using Condor to run long-running jobs on 15 Macintosh G5 machines. After a few hours, all the execution machines stop the job, and a communication error occurs between the condor_starter and the condor_master (Macintosh Xserve):

Cluster01 crashdump: Unable to determine CPSProcessSerNum pid: 11913 name: condor_starter

and in the Shadow log, we have:
ERROR "Can no longer talk to condor_starter on execute machine (192.168.1.23)" at line 63 in file NTreceivers.C

The problem occurs with both Condor 6.6.6 and Condor 6.6.7.

Thank you for your help

Damien

Damien AUTRET:

Unité INSERM 601
Département de Recherche en ImmunoCancérologie
Equipe 6 Biophysique-Cancérologie
9 Quai Moncousu
44093 Nantes Cedex
Tél: 02.40.41.28.21
Fax: 02.40.35.66.97
Sec: 02.40.08.47.47