
Re: [Condor-users] Nodes cannot start condor

Hi,

On Tue, Jul 13, 2010 at 11:45 AM, Mag Gam <magawake@xxxxxxxxx> wrote:
> As the condor user, can you run
> cd /var/log/condor && touch hi ?
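The `touch` test above checks whether the log directory accepts new files. A rough Python equivalent is sketched below, under some assumptions: the helper names `probe_log_dir` and `mounted_read_only` are hypothetical, it relies on the Unix-only `os.statvfs` flag `os.ST_RDONLY`, and it is only meaningful when run as the condor user (root can bypass permission bits, but not a read-only mount). It reports the errno name, so a read-only mount (`EROFS`) can be told apart from a permissions problem (`EACCES`).

```python
import errno
import os
import tempfile


def probe_log_dir(path):
    """Try to create and remove a scratch file in `path`, like `touch hi`.

    Returns (True, None) on success, or (False, errno_name) on failure,
    e.g. 'EROFS' for a read-only mount or 'EACCES' for a permission problem.
    """
    try:
        # mkstemp uses O_CREAT | O_EXCL, so it never clobbers an existing log.
        fd, name = tempfile.mkstemp(dir=path, prefix=".write-probe-")
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    os.unlink(name)  # leave the directory as we found it
    return True, None


def mounted_read_only(path):
    """True if the filesystem holding `path` is mounted read-only (Unix only)."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


if __name__ == "__main__":
    log_dir = "/var/log/condor"  # the directory discussed in this thread
    if os.path.isdir(log_dir):
        ok, err = probe_log_dir(log_dir)
        print("writable" if ok else f"not writable: {err}")
        print("read-only mount" if mounted_read_only(log_dir) else "read-write mount")
```

On the node in this thread, the probe would presumably have failed with `EROFS`, matching the `errno: 30 (Read-only file system)` line in the original error.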

Thanks for this. I also got an off-list reply suggesting I look at
dmesg, which gave the following output:
..
condor_exec.467[28638]: segfault at 0000000000000010 rip 0000000000814de7 rsp 00007fffffff3850 error 6
lalapps_BankEff[25683]: segfault at 0000000000000078 rip 0000000000442831 rsp 00007fffbb4e3a10 error 4
lalapps_BankEff[25698]: segfault at 0000000000000078 rip 0000000000442831 rsp 00007fff1253e640 error 4
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks not in datazone - block = 931161345, count = 1
Aborting journal on device sda1.
ext3_abort called.
EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks not in datazone - block = 847167968, count = 1
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks not in datazone - block = 1288987090, count = 1
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks not in datazone - block = 1074310421, count = 1
EXT3-fs error (device sda1): ext3_free_blocks: Freeing blocks not in
..

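This pointed to filesystem corruption on sda1: the kernel aborted the
ext3 journal and remounted the filesystem read-only, which most likely
explains the "errno: 30 (Read-only file system)" error when condor
tried to open MasterLog. I decided to rebuild the node.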
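To spot this failure mode quickly on other nodes, the kernel log can be scanned for the same two signatures seen above: `EXT3-fs error (device ...)` lines and the `Remounting filesystem read-only` message. The sketch below assumes the log text has already been captured (for example from `dmesg` output); the function name `summarize_kernel_log` is hypothetical, and it only recognises the ext3 message format quoted in this thread, not other filesystems.

```python
import re

# Message formats taken from the dmesg excerpt quoted above (ext3 only).
EXT3_ERROR = re.compile(r"EXT3-fs error \(device ([^)]+)\)")
REMOUNT_RO = "Remounting filesystem read-only"


def summarize_kernel_log(text):
    """Return (sorted devices with ext3 errors, whether a read-only remount was logged)."""
    devices = sorted(set(EXT3_ERROR.findall(text)))
    return devices, REMOUNT_RO in text
```

Applied to the excerpt above, this would report `sda1` with a read-only remount, which is the condition that leaves condor unable to write its logs.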

Thanks,
David

> On Tue, Jul 13, 2010 at 3:37 AM, David McKechan
> <david.mckechan@xxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> I've had this incident twice in the last week on separate nodes. The
>> node drops out of the condor pool and then I cannot start condor on
>> it. I get the following error:
>>
>> [node3 ~]$ /etc/init.d/condor start
>> Starting up Condor...Can't open "/var/log/condor/MasterLog"
>> dprintf() had a fatal error in pid 27088
>> Can't open "/var/log/condor/MasterLog"
>> errno: 30 (Read-only file system)
>> euid: 106119, ruid: 106119
>> done.
>>
>> Although the file should be writable by condor:
>> [node3 ~]$ ls -hlrt /var/log/condor/
>> total 36M
>> -rw-r--r-- 1 condor condor 1.8K Jul  5 15:09 StarterLog
>> -rw-r--r-- 1 condor condor 9.6M Jul 11 06:59 StartLog.old
>> -rw-r--r-- 1 condor condor 5.2M Jul 12 00:03 StarterLog.boinc
>> -rw------- 1 condor condor    0 Jul 12 23:09 InstanceLock
>> -rw-r--r-- 1 condor condor 148K Jul 13 05:48 MasterLog
>> -rw-r--r-- 1 condor condor 5.3M Jul 13 06:00 StarterLog.slot2
>> -rw-r--r-- 1 condor condor 8.5M Jul 13 06:03 StarterLog.slot1
>> prw------- 1 condor condor    0 Jul 13 06:03 procd_pipe.STARTD.watchdog
>> -rw-r--r-- 1 condor condor 5.3M Jul 13 06:03 StartLog
>> prw------- 1 condor condor    0 Jul 13 06:03 procd_pipe.STARTD
>> -rw-r--r-- 1 condor condor 1.3M Jul 13 06:03 CkptServerLog
>>
>> Can anyone help me?
>>
>> Thanks,
>> David
>> --
>> Help me raise money for Alzheimer Scotland - http://www.waitup.org.uk
>> _______________________________________________
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/condor-users/
>>
