
Re: [Condor-users] Scheduler segmentation fault?



On Wed, 31 Jan 2007, Junjun Mao wrote:

Thanks. This is what I have in my SchedLog, and I didn't find a core file.

1/31 11:15:40 (pid:9405) Called reschedule_negotiator()
1/31 11:15:42 (pid:9405) Inserting new attribute Scheduler into non-active cluster cid=900 acid=-1
1/31 11:15:52 (pid:16391) ******************************************************
1/31 11:15:52 (pid:16391) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
1/31 11:15:52 (pid:16391) ** /usr/local/condor/sbin/condor_schedd
1/31 11:15:52 (pid:16391) ** $CondorVersion: 6.8.2 Oct 12 2006 $
1/31 11:15:52 (pid:16391) ** $CondorPlatform: X86_64-LINUX_RHEL3 $
1/31 11:15:52 (pid:16391) ** PID = 16391
1/31 11:15:52 (pid:16391) ** Log last touched 1/31 11:15:42
1/31 11:15:52 (pid:16391) ******************************************************

Around that time, I ran "condor_userprio -setprio user value" as super
user, and that user submitted job cluster 900. In addition,
condor_userprio wasn't able to set the priority although the return
message said it did.


Junjun

Change the SCHEDD_DEBUG setting in condor_config to D_FULLDEBUG
and run condor_reconfig; that way you will get more information
next time.

condor_userprio -setprio only applies to jobs that the user
already has in the queue. If you want to permanently change the
priority for a user, you need condor_userprio -setfactor.

Steve
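
Before running condor_reconfig it can help to confirm which value the edited file actually assigns. The following is a minimal sketch, assuming a single config file in which a later assignment overrides an earlier one; the function name effective_setting and the sample text are hypothetical, and a real installation may also read local config files that this does not look at.

```python
import re

def effective_setting(config_text, name):
    """Return the last value assigned to `name` in the given text, or None."""
    pattern = re.compile(
        r"^\s*" + re.escape(name) + r"\s*=\s*(.*?)\s*$", re.IGNORECASE
    )
    value = None
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = pattern.match(line)
        if match:
            value = match.group(1)  # a later assignment replaces an earlier one
    return value

# Hypothetical excerpt, for illustration only.
sample = """\
# debug level for the schedd
SCHEDD_DEBUG = D_COMMAND
SCHEDD_DEBUG = D_FULLDEBUG
"""
print(effective_setting(sample, "SCHEDD_DEBUG"))  # prints D_FULLDEBUG
```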




On Wednesday 31 January 2007 12:24, Steven Timm wrote:
There should be a core dump in $LOCAL_DIR/log/core (or possibly
$LOCAL_DIR/spool/core) if you are fast enough to get it out of
there; condor_preen cleans it up pretty quickly.
There should also be more information in SchedLog as to why the
core dump happened, especially if you have the right debug
setting (D_FULLDEBUG).

Steve
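
Since the core file may be cleaned up quickly, a small script can check both locations mentioned above in one go. This is only a sketch: the function name find_core_files is hypothetical, it assumes core files are named "core" or start with "core", and you must pass in the actual value of LOCAL_DIR for your installation.

```python
from pathlib import Path

def find_core_files(local_dir):
    """List files whose names start with 'core' under log/ and spool/."""
    found = []
    for sub in ("log", "spool"):
        directory = Path(local_dir) / sub
        if directory.is_dir():
            # sorted() keeps the result stable between runs
            found.extend(sorted(p for p in directory.glob("core*") if p.is_file()))
    return found
```

Calling find_core_files with the LOCAL_DIR path returns a list of Path objects, which is empty if nothing is there (or if condor_preen got to it first).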


It's unlikely that a fault in a running job would kill the schedd.

On Wed, 31 Jan 2007, Junjun Mao wrote:
Hi, I got a segmentation fault:

1/31 11:15:42 The SCHEDD (pid 9405) died due to signal 11
1/31 11:15:42 Sending obituary for "/usr/local/condor/sbin/condor_schedd"
1/31 11:15:42 restarting /usr/local/condor/sbin/condor_schedd in 10 seconds
1/31 11:15:52 Started DaemonCore process "/usr/local/condor/sbin/condor_schedd", pid and pgroup = 16391


Is there anywhere, such as a core file or log file, where I can
dig out more information?

Was this caused by my jobs?

Junjun
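
To see how often a daemon has died this way, the log can be scanned for lines like the ones quoted above. The following is a rough sketch that assumes every death record has exactly the "died due to signal" wording shown in this message; the names daemon_deaths and signal_name are hypothetical helpers, not part of Condor.

```python
import re
import signal

DEATH_RE = re.compile(r"The (\w+) \(pid (\d+)\) died due to signal (\d+)")

def daemon_deaths(log_text):
    """Return (timestamp, daemon, pid, signal number) for each death record."""
    deaths = []
    for line in log_text.splitlines():
        match = DEATH_RE.search(line)
        if match:
            timestamp = line[:match.start()].strip()  # text before the message
            deaths.append(
                (timestamp, match.group(1), int(match.group(2)), int(match.group(3)))
            )
    return deaths

def signal_name(number):
    """Map a signal number to its name on this platform, if it has one."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "signal %d" % number

sample = (
    "1/31 11:15:42 The SCHEDD (pid 9405) died due to signal 11\n"
    "1/31 11:15:42 restarting /usr/local/condor/sbin/condor_schedd in 10 seconds\n"
)
print(daemon_deaths(sample))  # [('1/31 11:15:42', 'SCHEDD', 9405, 11)]
```

On Linux, signal 11 is SIGSEGV, which is why the death above is reported as a segmentation fault.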

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
with a subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR


--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.