
Re: [Condor-users] Scheduler segmentation fault?



Since you are running on X86_64, if you want a core dump image you
need to make sure the kernel's fs.suid_dumpable setting has a value
of 1.

On our FC4 machines we accomplish this with the following line added
to /etc/sysctl.conf to enforce the setting at boot time,

# Allows condor to create core dumps
fs.suid_dumpable = 1

though you can change this on the fly with,

echo 1 > /proc/sys/fs/suid_dumpable

Note, the actual location of this file in /proc is distribution dependent.
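
To confirm what the kernel currently has, a small check along these lines
may help. This is only a sketch: the function name check_suid_dumpable is
made up for illustration, and the default path is simply the one used in
the echo command above, so pass a different path if your distribution
keeps the file elsewhere.

```shell
# Hypothetical helper (not part of Condor): report the suid_dumpable value.
# Argument 1 is the file to read; it defaults to the path used above.
check_suid_dumpable() {
    file="${1:-/proc/sys/fs/suid_dumpable}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file"
        return 2
    fi
    value=$(cat "$file")
    if [ "$value" = "1" ]; then
        echo "suid_dumpable=1 (the value recommended above)"
        return 0
    fi
    echo "suid_dumpable=$value (expected 1)"
    return 1
}
```

Calling check_suid_dumpable with no argument reads the default path; the
return status is 0 when the value is 1, 1 when it is something else, and 2
when the file cannot be read.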



On Wed, Jan 31, 2007 at 11:51:20AM -0600, Steven Timm wrote:
> On Wed, 31 Jan 2007, Junjun Mao wrote:
> 
> > Thanks. This is what I have in my ScheddLog and I didn't have core file.
> >
> > 1/31 11:15:40 (pid:9405) Called reschedule_negotiator()
> > 1/31 11:15:42 (pid:9405) Inserting new attribute Scheduler into
> > non-active cluster cid=900 acid=-1
> > 1/31 11:15:52 (pid:16391)
> > ******************************************************
> > 1/31 11:15:52 (pid:16391) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
> > 1/31 11:15:52 (pid:16391) ** /usr/local/condor/sbin/condor_schedd
> > 1/31 11:15:52 (pid:16391) ** $CondorVersion: 6.8.2 Oct 12 2006 $
> > 1/31 11:15:52 (pid:16391) ** $CondorPlatform: X86_64-LINUX_RHEL3 $
> > 1/31 11:15:52 (pid:16391) ** PID = 16391
> > 1/31 11:15:52 (pid:16391) ** Log last touched 1/31 11:15:42
> > 1/31 11:15:52 (pid:16391)
> > ******************************************************
> >
> > Around that time, I ran "condor_userprio -setprio user value" as super
> > user, and that user submitted job cluster 900. In addition,
> > condor_userprio wasn't able to set the priority although the return
> > message said it did.
> >
> >
> > Junjun
> 
> Change the SCHEDD_DEBUG setting in condor_config
> to D_FULLDEBUG and run condor_reconfig; that way
> you will get more info next time.
> condor_userprio -setprio
> only applies to jobs that the user already has in the queue.
> If you want to permanently change the priority for a user
> you need condor_userprio -setfactor.
> 
> Steve
> 
> 
> 
> >
> > On Wednesday 31 January 2007 12:24, Steven Timm wrote:
> >> There should be a core dump in $LOCAL_DIR/log/core (or possibly
> >> $LOCAL_DIR/spool/core) if you are fast enough to get it out of
> >> there; condor_preen cleans it up pretty quickly.
> >> Also there should be more information in SchedLog as to why the
> >> core dump happened, especially if you have the right debug
> >> setting, D_FULLDEBUG.
> >>
> >> Steve
> >>
> >>
> >> It's unlikely that a running job fault would kill the schedd.
> >>
> >> On Wed, 31 Jan 2007, Junjun Mao wrote:
> >>> Hi, I got a segmentation fault
> >>>
> >>> 1/31 11:15:42 The SCHEDD (pid 9405) died due to signal 11
> >>> 1/31 11:15:42 Sending obituary
> >>> for "/usr/local/condor/sbin/condor_schedd"
> >>> 1/31 11:15:42 restarting /usr/local/condor/sbin/condor_schedd in 10
> >>> seconds
> >>> 1/31 11:15:52 Started DaemonCore
> >>> process "/usr/local/condor/sbin/condor_schedd", pid and pgroup =
> >>> 16391
> >>>
> >>>
> >>> Is there any place like core file or log file that I can dig out
> >>> more information from?
> >>>
> >>> Was this caused by my jobs?
> >>>
> >>> Junjun
> >>>
> >>> _______________________________________________
> >>> Condor-users mailing list
> >>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
> >>> with a subject: Unsubscribe
> >>> You can also unsubscribe by visiting
> >>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >>>
> >>> The archives can be found at either
> >>> https://lists.cs.wisc.edu/archive/condor-users/
> >>> http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
> 
> -- 
> ------------------------------------------------------------------
> Steven C. Timm, Ph.D  (630) 840-8525
> timm@xxxxxxxx  http://home.fnal.gov/~timm/
> Fermilab Computing Division, Scientific Computing Facilities,
> Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

-- 
Stuart Anderson  anderson@xxxxxxxxxxxxxxxx
http://www.ligo.caltech.edu/~anderson