
Re: [Condor-users] Scheduler segmentation fault?



Thanks. This is what I have in my SchedLog; I could not find a core file.

1/31 11:15:40 (pid:9405) Called reschedule_negotiator()
1/31 11:15:42 (pid:9405) Inserting new attribute Scheduler into non-active cluster cid=900 acid=-1
1/31 11:15:52 (pid:16391) ******************************************************
1/31 11:15:52 (pid:16391) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
1/31 11:15:52 (pid:16391) ** /usr/local/condor/sbin/condor_schedd
1/31 11:15:52 (pid:16391) ** $CondorVersion: 6.8.2 Oct 12 2006 $
1/31 11:15:52 (pid:16391) ** $CondorPlatform: X86_64-LINUX_RHEL3 $
1/31 11:15:52 (pid:16391) ** PID = 16391
1/31 11:15:52 (pid:16391) ** Log last touched 1/31 11:15:42
1/31 11:15:52 (pid:16391) ******************************************************
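
For longer logs, the restart visible in the excerpt above (pid 9405 stops
logging at 11:15:42, pid 16391 starts at 11:15:52) can be located
mechanically. The following is only a minimal sketch: the function name
find_restarts is hypothetical, and the line format it assumes is inferred
from this excerpt alone, so other Condor versions or debug settings may
need a different pattern.

```python
import re

# Matches the prefix of lines such as
#   "1/31 11:15:52 (pid:16391) ** PID = 16391"
# Assumption: every real log line starts with "M/D HH:MM:SS (pid:N)".
LINE_RE = re.compile(r"^(\d+/\d+ \d+:\d+:\d+) \(pid:(\d+)\)")


def find_restarts(lines):
    """Return (last_old_stamp, first_new_stamp, old_pid, new_pid) tuples.

    A restart is assumed whenever the pid in the line prefix changes.
    """
    restarts = []
    prev_stamp = prev_pid = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            # Continuation lines wrapped by a mail client carry no prefix.
            continue
        stamp, pid = match.group(1), int(match.group(2))
        if prev_pid is not None and pid != prev_pid:
            restarts.append((prev_stamp, stamp, prev_pid, pid))
        prev_stamp, prev_pid = stamp, pid
    return restarts


if __name__ == "__main__":
    sample = [
        "1/31 11:15:40 (pid:9405) Called reschedule_negotiator()",
        "1/31 11:15:42 (pid:9405) Inserting new attribute Scheduler into ",
        "non-active cluster cid=900 acid=-1",
        "1/31 11:15:52 (pid:16391) ",
        "1/31 11:15:52 (pid:16391) ** condor_schedd (CONDOR_SCHEDD) STARTING UP",
    ]
    for old_t, new_t, old_pid, new_pid in find_restarts(sample):
        print(f"pid {old_pid} last logged at {old_t}; "
              f"pid {new_pid} first logged at {new_t}")
```

The last line written by the old pid is usually the most useful one to
read, since it shows what the daemon was doing just before it died.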

Around that time, I ran "condor_userprio -setprio user value" as
superuser, and that same user had submitted job cluster 900. In
addition, condor_userprio did not actually change the priority, even
though its return message said it had.


Junjun 

On Wednesday 31 January 2007 12:24, Steven Timm wrote:
> There should be a core dump in $LOCAL_DIR/log/core (or possibly
> $LOCAL_DIR/spool/core) if you are fast enough to get it out of
> there; condor_preen cleans it up pretty quickly.
> Also, there should be more information in SchedLog as to why the
> core dump happened, especially if you have the right debug
> settings (D_FULLDEBUG).
>
> Steve
>
>
> It's unlikely that a running job fault would kill the schedd.
>
> On Wed, 31 Jan 2007, Junjun Mao wrote:
> > Hi, I got a segmentation fault:
> >
> > 1/31 11:15:42 The SCHEDD (pid 9405) died due to signal 11
> > 1/31 11:15:42 Sending obituary for "/usr/local/condor/sbin/condor_schedd"
> > 1/31 11:15:42 restarting /usr/local/condor/sbin/condor_schedd in 10 seconds
> > 1/31 11:15:52 Started DaemonCore process "/usr/local/condor/sbin/condor_schedd", pid and pgroup = 16391
> >
> >
> > Is there anywhere, such as a core file or a log file, where I can
> > dig out more information?
> >
> > Was this caused by my jobs?
> >
> > Junjun
> >
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
> > with a subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at either
> > https://lists.cs.wisc.edu/archive/condor-users/
> > http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR