Re: [Condor-users] win32 collector failures



George:

We have had no other reports of this problem with the v6.6.7 Win32
release.  Perhaps you have a configuration issue.

You might consider enabling D_FULLDEBUG and D_SECURITY log debugging on
both sides (client and server) of this error message, and comparing the
timestamped entries.
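
One way to do that comparison is to interleave the two logs by timestamp.
The following is only a rough sketch, not a Condor tool: it assumes raw log
lines in the "M/D HH:MM:SS message" form seen in the excerpts quoted below,
and the names parse_log and merge_logs, as well as the default year, are
made up for illustration (the log lines carry no year).

```python
import re
from datetime import datetime

# Matches the "M/D HH:MM:SS message" prefix seen in the quoted log excerpts.
STAMP = re.compile(r"^(\d{1,2})/(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) (.*)$")


def parse_log(lines, source, year=2005):
    """Return (timestamp, source, message) tuples.

    The year is an assumption supplied by the caller, since the log
    lines themselves do not include one.
    """
    entries = []
    for line in lines:
        text = line.strip()
        m = STAMP.match(text)
        if m:
            month, day, hh, mm, ss = (int(g) for g in m.groups()[:5])
            stamp = datetime(year, month, day, hh, mm, ss)
            entries.append((stamp, source, m.group(6)))
        elif entries and text:
            # No timestamp: treat the line as a continuation of the previous entry.
            stamp, src, msg = entries[-1]
            entries[-1] = (stamp, src, msg + " " + text)
    return entries


def merge_logs(*parsed):
    """Interleave entries from several logs in timestamp order.

    The sort is stable, so entries with equal timestamps keep their
    original relative order.
    """
    merged = [entry for entries in parsed for entry in entries]
    merged.sort(key=lambda entry: entry[0])
    return merged


if __name__ == "__main__":
    collector = parse_log(
        [
            "1/5 15:24:31 condor_write(): Socket closed when trying to write buffer",
            "1/5 15:24:31 SECMAN: Error sending response classad!",
        ],
        "CollectorLog",
    )
    master = parse_log(
        ["1/5 15:23:55 condor_read(): timeout reading buffer."],
        "MasterLog",
    )
    for stamp, source, message in merge_logs(collector, master):
        print(stamp.strftime("%m/%d %H:%M:%S"), source, message)
```

In practice the line lists would come from reading the client-side and
server-side log files; entries from the two sides that land within the same
second or two of each other are the ones worth reading together.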

If you are still unable to resolve this issue, you can send email to
condor-admin@xxxxxxxxxxxx.  Be sure to make the verbose log files described
above available.

	Jeff
	Condor Team

On Thu, 2005-01-06 at 10:09, George Gensure wrote:
> The error message occurs on every hashent insert.  Schedd, Start, 
> StartdPvt, everything.  I also get
> ERROR: DC_AUTHENTICATE unable to receive auth_info!
> quite frequently.  It seems like ads just plain don't work here.
> 
> All of our machines are 6.6.7, exec and cm, and I can't seem to get an 
> earlier copy to check against.  I think I have a 6.7.1 that runs 
> smoothly over here, and I'll try that next, but let me know if you find 
> something that fixes that buffer error.
> 
> -George
> 
> David Vestal wrote:
> 
> >George,
> >
> >Do these error message clusters in your CollectorLog tend to be at roughly 15-second intervals?  Do your execute machines' MasterLogs contain something like this:
> >"Can't send UPDATE_MASTER_AD to collector <192.168.33.4:9618>: Failed to send UDP update command to collector"
> >
> >If so, then I think we're seeing the same problem.  I upgraded our WinXP CM to 6.6.7 recently, and our Collector has had these problems ever since.  It looks like the CM is refusing connections to the execute machines when they send MasterAds.  I don't know why yet.
> >
> >Are your execute machines all 6.6.7 as well?
> >
> >Regards,
> >-David
> >
> >-----Original Message-----
> >From: George Gensure [mailto:werkt@xxxxxxxxxxx]
> >Sent: Wednesday, January 05, 2005 3:29 PM
> >To: Condor-Users Mail List
> >Subject: [Condor-users] win32 collector failures
> >
> >
> >On my win2k server CM, condor_status returns nothing, and in the 
> >collector log I see a bunch of somewhat disturbing messages.  I'm 
> >running version 6.6.7, freshly installed, and condor has worked on this 
> >machine before.
> >
> >CollectorLog
> >1/5 15:24:31 stats: Inserting new hashent for 
> >'Schedd':'hmel-it1-dc.pghschool.loc':'10.142.1.6'
> >1/5 15:24:31 condor_write(): Socket closed when trying to write buffer
> >1/5 15:24:31 Buf::write(): condor_write() failed
> >1/5 15:24:31 SECMAN: Error sending response classad!
> >
> >MasterLog
> >1/5 15:23:30 Started DaemonCore process 
> >"C:\Condor/bin/condor_collector.exe", pid and pgroup = 1012
> >1/5 15:23:30 Started DaemonCore process 
> >"C:\Condor/bin/condor_negotiator.exe", pid and pgroup = 1664
> >1/5 15:23:30 Started DaemonCore process 
> >"C:\Condor/bin/condor_startd.exe", pid and pgroup = 2688
> >1/5 15:23:30 Started DaemonCore process 
> >"C:\Condor/bin/condor_schedd.exe", pid and pgroup = 900
> >1/5 15:23:55 condor_read(): timeout reading buffer.
> >
> >-George