Re: [HTCondor-users] 8.0 scheduler crash



More info. It happened a few minutes ago.

Fortunately, I got this file


$ cat dprintf_failure.SCHEDD
dprintf() had a fatal error in pid 2716
Error writing debug log
errno: 27 (File too large)
euid: 37002, ruid: 0
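errno 27 is EFBIG, which write() returns when a file would grow past either the process's file size limit or the largest size the implementation supports. A minimal sketch for checking how large a log has grown follows. The helper names are hypothetical, and the 2 GiB - 1 threshold is only an assumption (the classic signed 32-bit offset limit); nothing in this thread confirms that it applies to this schedd build.

```shell
# Size of a file in bytes (wc -c is portable; tr strips any padding).
file_size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Assumption: 2147483647 (2 GiB - 1) is the classic signed 32-bit offset limit.
# Whether it applies to this schedd build is not confirmed by the thread.
LIMIT=2147483647

# Report a file's size and whether it has reached the assumed limit.
check_log_size() {
    size=$(file_size_bytes "$1")
    if [ "$size" -ge "$LIMIT" ]; then
        echo "$1: $size bytes (at or above $LIMIT)"
    else
        echo "$1: $size bytes (below $LIMIT)"
    fi
}
```

To use it, pass the path of the schedd log, for example check_log_size "$(condor_config_val SCHEDD_LOG)" on a machine where condor_config_val is available.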


After I shortened the file and restarted the scheduler, I checked its process limits.


They look ok to me. Anyone see anything wrong here?


for i in `pgrep -u condor condor_schedd`; do echo "pid: $i"; cat /proc/$i/limits ; done


pid: 6475
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             137215               256698               processes
Max open files            4096                 131072               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       256698               256698               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

pid: 25338
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             137215               256698               processes
Max open files            4096                 131072               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       256698               256698               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
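
Since the error is "File too large", the only row in the listings above that bears on it directly is "Max file size". A narrower variant of the same loop, as a sketch (the function name max_file_size_row is hypothetical):

```shell
# Print only the "Max file size" row from a limits file such as /proc/<pid>/limits.
max_file_size_row() {
    grep '^Max file size' "$1"
}

# Same loop as the original command, narrowed to the one relevant row.
# If no condor_schedd is running, the loop body simply never executes.
for pid in $(pgrep -u condor condor_schedd 2>/dev/null); do
    echo "pid: $pid"
    max_file_size_row "/proc/$pid/limits"
done
```

If both columns read "unlimited", as they do here, the per-process limit is not what triggered the error.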




On Wed, Feb 5, 2014 at 5:17 PM, Rita <rmorgan466@xxxxxxxxx> wrote:
No, definitely not. There is ample disk space.

BTW, condor_version is 8.1.1 Sep 11 2013 BuildID: 171174.




On Wed, Feb 5, 2014 at 4:56 PM, Greg Thain <gthain@xxxxxxxxxxx> wrote:
On 02/05/2014 03:52 PM, Rita wrote:
It seems the 8.0 scheduler is crashing (condor_schedd dies).

I nailed it down to the SchedLog.

Basically, if I recreate the SchedLog ( > SchedLog) and restart all condor processes, everything resumes again.


Is it possible the disk the log is on is full?  Condor daemons will refuse to run if the log disk partition can't be written to.

-Greg

--
--- Get your facts first, then you can distort them as you please.--