
Re: [Condor-users] Assertion ERROR on (result)" at line 655 in file pseudo_ops.cpp



On 06/03/2010 12:27 PM, Gabriele Foerstner wrote:
> We changed our condor version from 6.8.6 to 7.4.1 and now our condor
> jobs get very often the following error messages written in the log file:
> 
> 007 (016.022.000) 06/03 16:05:17 Shadow exception!
>         Error from slot3@xxxxxxxxxxxxxxx: Assertion ERROR on (result)
>         0  -  Run Bytes Sent By Job
> 
> The message may even show up several times for the same process.
> 
> The ShadowLog shows the following Error:
> 06/03 16:26:17 (16.39) (18909): ERROR "Error from slot4@xxxxxxxxxxxxxxx:
> Assertion ERROR on (result)" at line 655 in file pseudo_ops.cpp
> 
> The processes get rescheduled, and as they might even get rescheduled
> several times, the execution time of a job cluster is more than twice as
> long.
> 
> Version 6.8.6 was running on RedhatE4 (64-bit AMD hosts), and 7.4.1 is
> running on RedhatE5 (64-bit AMD hosts).
> I tested as well version 7.4.2, but it didn't help.
> 
> We run only vanilla jobs.
> As most of our jobs are very I/O intensive, we use NFS to avoid
> network traffic (USE_NFS = True).
> 
> 
> Thanks for your help
>    Gabriele

Not helpful wrt your error, just a note on USE_NFS because I see so many people referencing it:

 According to the manual, it is primarily for the Standard Universe,

  http://www.cs.wisc.edu/condor/manual/v7.5/3_3Configuration.html#15625

That said, a quick spin through the code suggests that USE_NFS is used by chirp in the Vanilla Universe and probably the Parallel Universe. The manual could probably use some updating, and the param could use a more specific name.
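
To get a feel for how often the assertion fires and on which slots, a rough sketch like the following could tally the ShadowLog entries. The pattern is modeled only on the single ShadowLog line quoted above, so the exact format is an assumption and may differ between Condor versions; the function name count_assertion_errors is made up for illustration.

```python
import re
import sys
from collections import Counter

# Pattern modeled on the ShadowLog line quoted in this thread. The real log
# format may vary by Condor version, so treat this as an assumption.
ASSERT_RE = re.compile(
    r'^\d{2}/\d{2} \d{2}:\d{2}:\d{2} '      # timestamp, e.g. 06/03 16:26:17
    r'\((?P<job>\d+\.\d+)\) \(\d+\): '       # (cluster.proc) (shadow pid)
    r'ERROR "Error from (?P<slot>\S+):\s+'   # slot name, then space or line wrap
    r'Assertion ERROR on \(result\)',
    re.MULTILINE,
)

def count_assertion_errors(log_text):
    """Return (per-job Counter, per-slot Counter) of assertion failures."""
    jobs, slots = Counter(), Counter()
    for m in ASSERT_RE.finditer(log_text):
        jobs[m.group("job")] += 1
        slots[m.group("slot")] += 1
    return jobs, slots

# Only runs when a log path is given on the command line (hypothetical usage).
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        jobs, slots = count_assertion_errors(f.read())
    for job, n in jobs.most_common():
        print("job", job, n)
    for slot, n in slots.most_common():
        print("slot", slot, n)
```

Run it with the path to the ShadowLog as the only argument. If the failures cluster on a few slots, that points at those execute machines rather than at the jobs themselves.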

Best,


matt