
[Condor-users] Assertion ERROR on (result)" at line 655 in file pseudo_ops.cpp



We upgraded Condor from 6.8.6 to 7.4.1, and since then our jobs very often get the following error message written to their log file:

007 (016.022.000) 06/03 16:05:17 Shadow exception!
        Error from slot3@xxxxxxxxxxxxxxx: Assertion ERROR on (result)
        0  -  Run Bytes Sent By Job

The message may even show up several times for the same process.
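To see how often each process is hit, a small script can count the shadow-exception events per job in the user log. This is only a sketch: it assumes the event header layout shown in the excerpt above (event code `007` at the start of the line, followed by `(cluster.proc.subproc)`), and the function name `count_shadow_exceptions` is hypothetical, not part of Condor.

```python
import re
from collections import Counter

# Matches an event header such as
# "007 (016.022.000) 06/03 16:05:17 Shadow exception!"
# Assumption: 007 is the shadow-exception event code and the
# parenthesised triple is cluster.proc.subproc, as in the log excerpt.
# Indented body lines never match because of the "^" anchor.
HEADER = re.compile(r"^007 \((\d+)\.(\d+)\.(\d+)\)")


def count_shadow_exceptions(lines):
    """Return a Counter mapping (cluster, proc) -> number of 007 events."""
    counts = Counter()
    for line in lines:
        match = HEADER.match(line)
        if match:
            cluster, proc = int(match.group(1)), int(match.group(2))
            counts[(cluster, proc)] += 1
    return counts
```

For example, `count_shadow_exceptions(open("job.log"))` on a (hypothetical) user log `job.log` returns a `Counter` keyed by `(cluster, proc)`; any entry greater than 1 is a process that hit the exception more than once.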

The ShadowLog shows the following error:
06/03 16:26:17 (16.39) (18909): ERROR "Error from slot4@xxxxxxxxxxxxxxx: Assertion ERROR on (result)" at line 655 in file pseudo_ops.cpp

The affected processes are rescheduled, and since a process may be rescheduled several times, a job cluster now takes more than twice as long to execute.

Version 6.8.6 was running on Red Hat Enterprise Linux 4 (64-bit AMD hosts), and 7.4.1 is running on Red Hat Enterprise Linux 5 (64-bit AMD hosts).
I also tested version 7.4.2, but it did not help.

We run only vanilla jobs.
Most of our jobs are very I/O-intensive, so we use NFS to avoid network traffic (USE_NFS = True).


Thanks for your help
   Gabriele
--
----------------------------------------------------------------------O
        Gabriele Förstner   email: forstner@xxxxxxx
      European Synchrotron
	Radiation Facility   Tel: +33 - (0)4.76.88.24.52
                    BP 220   FAX: +33 - (0)4.76.88.24.27
    F-38043 Grenoble CEDEX   "Der Weg ist das Ziel"