
Re: [Condor-users] scheduling problem?



How big is your history file, in the condor spool directory?
If that gets past two gigs, the 6.6.x schedd can die with
signal 25.
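To check whether a history file is close to that size, a small Python sketch like the following could be used. It assumes the limit is 2**31 - 1 bytes, the largest offset a 32-bit signed `off_t` can hold, which is what a build without large-file support runs into. The function names are hypothetical, and the real spool directory is whatever `condor_config_val SPOOL` reports on your pool.

```python
import os

# Assumed limit: the largest offset a 32-bit signed off_t can represent.
# A process built without large-file support cannot write past this point.
LIMIT_BYTES = 2**31 - 1

def near_limit(size_bytes: int, margin: float = 0.95) -> bool:
    """Return True if size_bytes is at or above `margin` of LIMIT_BYTES."""
    return size_bytes >= LIMIT_BYTES * margin

def history_near_limit(path: str, margin: float = 0.95) -> bool:
    """Check the on-disk size of the file at `path` (e.g. <SPOOL>/history)."""
    return near_limit(os.path.getsize(path), margin)
```

Calling `history_near_limit("/path/to/spool/history")` on a 2.0G file would return True.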

It was 2.0G. I deleted it and then touched it to recreate it (I didn't know whether the file had to exist), but still no joy.

However, I've since had to shut down the PC, and now it won't boot, so maybe this wasn't a Condor problem but just a symptom of a system error. I'll get back to you if I manage to get the system back and the Condor error persists.

Thanks for your time.


The 6.7.19 schedd rotates the history file to prevent this.
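Size-based rotation of this kind can be sketched as follows: once the file reaches a threshold, it is renamed aside and a fresh empty file takes its place, so no single file grows toward the limit. This is a generic Python illustration with hypothetical names (`rotate_if_needed`, `max_bytes`, `keep`), not the schedd's own implementation.

```python
import os

def rotate_if_needed(path: str, max_bytes: int, keep: int = 2) -> bool:
    """Rotate `path` once it reaches `max_bytes` (hypothetical helper).

    The live file becomes `path.1`, older copies shift up to `path.<keep>`,
    and anything older is discarded. Returns True if a rotation happened.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if not os.path.exists(path) or os.path.getsize(path) < max_bytes:
        return False
    # Discard the oldest copy, then shift path.(n) -> path.(n+1).
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    for n in range(keep - 1, 0, -1):
        src = f"{path}.{n}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{n + 1}")
    os.replace(path, f"{path}.1")
    # Recreate an empty live file for the writer to append to.
    open(path, "w").close()
    return True
```

A writer would call `rotate_if_needed(path, 20 * 1024 * 1024)` before appending; the 20 MB threshold is only an example value.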

If it's not the history file, please send the ScheddLog file.

-Erik

On Thu, May 25, 2006 at 07:31:23AM +0000, John Coulthard wrote:
> Thanks for the replies.
>
> >I'm guessing that you're running on a BSD-derived operating system (e.g.
> >Mac OS X). Signal 25 on BSD 4.2 machines is described as follows:
>
> >SIGXFSZ     25,25,31    Core    File size limit exceeded (4.2 BSD)
>
> It's a Linux OS. Is "Signal 25" on Linux still a file size issue?
> I'm not sure which files the SCHEDD normally uses, but there are no files
> over 1G on the system that are attributable to condor/root.
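On x86 and ARM Linux, signal 25 is SIGXFSZ (the table quoted above shows that MIPS uses a different number). The kernel sends it when a process writes past its RLIMIT_FSIZE soft limit, and also when a process without large-file support writes past the 2 GiB offset. A minimal Python sketch for checking both on the machine in question follows; the helper names are hypothetical, and it reports the limit of the Python process itself, which matches the schedd's only if both are started from the same environment.

```python
import resource
import signal

def describe_signal(signum: int) -> str:
    """Map a raw signal number (as printed in MasterLog) to its name here."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"unknown signal {signum}"

def fsize_limit() -> str:
    """Report this process's soft RLIMIT_FSIZE (the limit `ulimit -f` sets)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return "unlimited" if soft == resource.RLIM_INFINITY else f"{soft} bytes"

print(describe_signal(25))  # SIGXFSZ on x86 Linux
print(fsize_limit())
```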
>
> I notice in the MasterLog that on starting Condor the STARTD is not started.
> Should it be? Could this be part of the problem? I can start it manually,
> but that doesn't solve the issue, so probably not.
>
> MasterLog...
> 5/25 06:10:06 Using config file: /users/condor/condor/etc/condor_config
> 5/25 06:10:06 Using local config files: /users/condor/hosts/galaxy/condor_config.local
> 5/25 06:10:06 DaemonCore: Command Socket at <192.168.0.40:57782>
> 5/25 06:10:06 Started DaemonCore process "/home/condor/condor/sbin/condor_collector", pid and pgroup = 8508
> 5/25 06:10:06 Started DaemonCore process "/home/condor/condor/sbin/condor_negotiator", pid and pgroup = 8509
> 5/25 06:10:06 Started DaemonCore process "/home/condor/condor/sbin/condor_schedd", pid and pgroup = 8510
> 5/25 06:15:22 The SCHEDD (pid 8510) died due to signal 25
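Scanning a MasterLog for lines like the last one above can be scripted. This Python sketch assumes the message format shown in this log ("The <DAEMON> (pid <N>) died due to signal <S>"); other Condor versions may word it differently, and `find_daemon_deaths` is a hypothetical helper.

```python
import re
import signal

# Matches e.g. "5/25 06:15:22 The SCHEDD (pid 8510) died due to signal 25"
DEATH_RE = re.compile(r"The (\w+) \(pid (\d+)\) died due to signal (\d+)")

def find_daemon_deaths(lines):
    """Return (daemon, pid, signum, signal_name) for each death line found."""
    deaths = []
    for line in lines:
        m = DEATH_RE.search(line)
        if not m:
            continue
        daemon, pid, signum = m.group(1), int(m.group(2)), int(m.group(3))
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        deaths.append((daemon, pid, signum, name))
    return deaths
```

Passing `open("MasterLog")` as `lines` would list every daemon death in the file.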
>
>
> Still grateful for any help you can give.
>
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
