
[Condor-users] Execution machines stop the job…

We're using Condor to run long-running jobs on 15 Macintosh G5 machines. After a few hours, all the execute machines stop the job, and a communication error occurs between the condor_starter and the condor_master (Macintosh Xserve):

Cluster01 crashdump: Unable to determine CPSProcessSerNum pid: 11913 name: condor_starter

and in the Shadow log, we have:
ERROR "Can no longer talk to condor_starter on execute machine (" at line 63 in file NTreceivers.C

The problem occurs with both Condor 6.6.6 and Condor 6.6.7.

Thank you for your help.


Damien AUTRET:

Unité INSERM 601
Département de Recherche en ImmunoCancérologie
Equipe 6 Biophysique-Cancérologie
9 Quai Moncousu
44093 Nantes Cedex