
Re: [Condor-users] Problem with schedd ad ?



Hi,


The problem with the badly formed schedd ClassAd happened again last
night.

I did the following:
- reconfigured the schedd: this did nothing (last time it worked),
except that there was another unexpected job in the queue
- stopped the schedd
- restarted the schedd: it refused to start, with the following
error:
1/12 15:34:10 ******************************************************
1/12 15:34:10 ** condor_schedd (CONDOR_SCHEDD) STARTING UP
1/12 15:34:10 ** /usr/local/condor/sbin/condor_schedd
1/12 15:34:10 ** $CondorVersion: 6.7.10 Aug  3 2005 $
1/12 15:34:10 ** $CondorPlatform: I386-LINUX_RH9 $
1/12 15:34:10 ** PID = 8770
1/12 15:34:10 ******************************************************
1/12 15:34:10 Using config file: /usr/local/condor/etc/condor_config
1/12 15:34:10 Using local config files: /usr/local/condor/etc/nasca.local
1/12 15:34:10 DaemonCore: Command Socket at <10.5.129.14:8830>
1/12 15:34:10 ERROR "Error: bad record with op=101 in corrupt logfile" at line 723 in file classad_log.C

- removed the bad ClassAd:
101 4286.1 Submitter 
103 4286.1 CondorVersion "$CondorVersion: 6.7.10 Aug  3 2005 $"
103 4286.1 CondorPlatform "$CondorPlatform: I386-LINUX_RH9 $"
103 4286.1 Machine "nasca"
103 4286.1 ScheddIpAddr "<10.5.129.14:8416>"
103 4286.1 MyAddress "<10.5.129.14:8416>"
103 4286.1 MaxJobsRunning 100
103 4286.1 VirtualMemory 1951856
103 4286.1 MonitorSelfTime 1137015964
103 4286.1 MonitorSelfCPUUsage 0.049883
103 4286.1 MonitorSelfImageSize 24024.000000
103 4286.1 MonitorSelfResidentSetSize 17844
103 4286.1 MonitorSelfAge 1853456
103 4286.1 WantResAd TRUE
103 4286.1 ScheddName "nasca"
103 4286.1 HeldJobs 0
103 4286.1 FlockedJobs 0
103 4286.1 Name "laura@nasca"
103 4286.1 DaemonStartTime 1136506854
103 4286.1 UpdateSequenceNumber 2205
103 4286.1 RunningJobs 0
103 4286.1 IdleJobs 0

- restarted the schedd, this time successfully
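
For reference, the removal step could be scripted roughly as follows.
This is only a sketch in Python, assuming the records look like the ones
quoted above (an operation code, then the ad key, then the rest of the
record), and that it is run on a copy of the log file while the schedd is
stopped. The function name and the sample record for key 4285.0 are made
up for illustration.

```python
def drop_records(lines, key):
    """Return the lines whose second whitespace-separated field is not `key`."""
    kept = []
    for line in lines:
        fields = line.split(None, 2)
        if len(fields) >= 2 and fields[1] == key:
            continue  # this record belongs to the ad being removed
        kept.append(line)
    return kept


# Two records copied from the bad ad, plus one made-up record
# for a different key that must survive the filtering.
sample = [
    '101 4286.1 Submitter \n',
    '103 4286.1 Machine "nasca"\n',
    '103 4285.0 Owner "laura"\n',
]
cleaned = drop_records(sample, "4286.1")
```

Matching on the key field rather than on the operation code keeps the
101 record and all of its 103 records together, so no part of the ad is
left behind.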

Is there a better way to handle this when the bug occurs?

This bug is a real problem for me because our pool has only one submit
host. If I can help you track it down, I will.

Jean-Christophe Baccon

-- 
Jean-Christophe Baccon
Service Informatique Recherche
Université de Cergy-Pontoise
01 34 25 70 69
http://www.sir.u-cergy.fr