
Re: [Condor-users] core.MASTER.WIN32 and core.CRED.WIN32




While trying to figure this out, I have noticed a couple of things. First, the cred service is dying on the central manager, which produces the core.CRED.WIN32 file. If I delete this file, the service will generally restart, but sometimes I have to restart the Condor service to get the cred service to start again.

I have also noticed that a core.STARTD.WIN32 file is created on my submit machine, which might be related to why jobs are remaining idle.

However, I do not know what any of this means. The load on the CM averages about 30%, with spikes as high as 70%. This seems a little high, since we are not running any other services on the server. The collector is usually at about 25%, and the spikes are caused by the other Condor services (mainly the negotiator).

Searching Google for access violations in C:\Windows\system32\ntdll.dll and for memory problems returns plenty of results, but because they vary so much, and because we were not having problems before, I am not making much progress. It does seem that these files are related to the jobs failing to match, even though I know that machines are available.

thanks,
mike



From: "Michael O'Donnell" <odonnellm@xxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Date: 02/09/2011 03:41 PM
Subject: [Condor-users] core.MASTER.WIN32 and core.CRED.WIN32
Sent by: condor-users-bounces@xxxxxxxxxxx






I have noticed that two files are being created on our central manager:

core.MASTER.WIN32 and core.CRED.WIN32



The header content of the files includes:

PID: 660

Exception code: C0000005 ACCESS_VIOLATION

Fault address:  77427F1A 01:00066F1A C:\Windows\system32\ntdll.dll
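For anyone comparing several of these core files, the header fields can be pulled out with a short script. This is only a minimal sketch: the function name parse_core_header is hypothetical, and it assumes the "01:00066F1A" field is a section:offset pair within the named module (a common layout for Win32 crash reports, but not confirmed here for Condor). Under that assumption, subtracting the offset from the fault address gives the address where that section of ntdll.dll was loaded.

```python
import re

# Header lines copied from the core file quoted above.
HEADER = "\n".join([
    r"PID: 660",
    r"Exception code: C0000005 ACCESS_VIOLATION",
    r"Fault address:  77427F1A 01:00066F1A C:\Windows\system32\ntdll.dll",
])

def parse_core_header(text):
    """Extract PID, exception, and fault location from core-file header text.

    Hypothetical helper: field meanings are assumptions based on the
    header shown in this thread, not on Condor documentation.
    """
    info = {}
    m = re.search(r"^PID:\s*(\d+)", text, re.MULTILINE)
    if m:
        info["pid"] = int(m.group(1))
    m = re.search(r"^Exception code:\s*([0-9A-Fa-f]+)\s+(\S+)", text, re.MULTILINE)
    if m:
        info["exception_code"] = int(m.group(1), 16)
        info["exception_name"] = m.group(2)
    m = re.search(
        r"^Fault address:\s*([0-9A-Fa-f]+)\s+([0-9A-Fa-f]+):([0-9A-Fa-f]+)\s+(.+)$",
        text,
        re.MULTILINE,
    )
    if m:
        info["fault_address"] = int(m.group(1), 16)
        info["section"] = int(m.group(2), 16)
        info["offset"] = int(m.group(3), 16)
        info["module"] = m.group(4).strip()
        # Assumption: offset is relative to the start of the section,
        # so the section's load address is fault address minus offset.
        info["section_start"] = info["fault_address"] - info["offset"]
    return info

info = parse_core_header(HEADER)
print(info["exception_name"], "in", info["module"])
print("section start: 0x%08X" % info["section_start"])
```

If the MASTER, CRED, and STARTD core files all report the same module and offset, that would suggest a single common cause rather than three separate faults.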



If I delete the files, they are re-created, and I do not recall seeing them in the past. Does anyone know what this access violation is about? Could there be a problem with antivirus software or something similar? Our pool is functioning, except that all jobs remain idle. This started after we expanded our pool from 100 cores to 200 cores (posted earlier today as "[Condor-users] Job remains in idle (worked until I increased pool size)"). I don't think the two are related, but I am trying to troubleshoot this.


Thank you for your help,

Mike