
Re: [Condor-users] win32 collector failures



The error message occurs on every hashent insert: Schedd, Start, StartdPvt, everything. I also get
ERROR: DC_AUTHENTICATE unable to receive auth_info!
quite frequently. It seems like ads just plain don't work here.


All of our machines are 6.6.7, both execute and CM, and I can't seem to get an earlier version to check against. I think I have a 6.7.1 that runs smoothly over here, and I'll try that next, but let me know if you find something that fixes that buffer error.

-George

David Vestal wrote:

George,

Do these error message clusters in your CollectorLog tend to be at roughly 15-second intervals?  Do your execute machines' MasterLogs contain something like this:
"Can't send UPDATE_MASTER_AD to collector <192.168.33.4:9618>: Failed to send UDP update command to collector"

If so, then I think we're seeing the same problem. I upgraded our WinXP CM to 6.6.7 recently, and our Collector has had these problems ever since. It looks like the CM is refusing connections to the execute machines when they send MasterAds. I don't know why yet.

Are your execute machines all 6.6.7 as well?

Regards,
-David

-----Original Message-----
From: George Gensure [mailto:werkt@xxxxxxxxxxx]
Sent: Wednesday, January 05, 2005 3:29 PM
To: Condor-Users Mail List
Subject: [Condor-users] win32 collector failures


On my win2k server CM, condor_status returns nothing, and in the collector log I see a bunch of somewhat disturbing messages. I'm running version 6.6.7, freshly installed, and Condor has worked on this machine before.


CollectorLog
1/5 15:24:31 stats: Inserting new hashent for 'Schedd':'hmel-it1-dc.pghschool.loc':'10.142.1.6'
1/5 15:24:31 condor_write(): Socket closed when trying to write buffer
1/5 15:24:31 Buf::write(): condor_write() failed
1/5 15:24:31 SECMAN: Error sending response classad!
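
One way to see whether these errors recur at a regular interval is to pull the timestamps of the "Socket closed" lines out of the CollectorLog and print the gaps between them. The following is only a rough Python sketch, assuming every log line starts with the "M/D HH:MM:SS" prefix shown in the excerpt above. The function names, the default search string, and the year default are made up for illustration and are not part of Condor.

```python
import re
from datetime import datetime

# Matches the "M/D HH:MM:SS message" layout seen in the CollectorLog excerpt.
# Assumption: the log carries no year, so the caller supplies one.
STAMP = re.compile(r"^(\d{1,2})/(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) (.*)$")


def error_times(lines, needle="Socket closed when trying to write", year=2005):
    """Return the timestamps of log lines whose message contains `needle`."""
    times = []
    for line in lines:
        m = STAMP.match(line.strip())
        if m and needle in m.group(6):
            month, day, hh, mm, ss = (int(g) for g in m.groups()[:5])
            times.append(datetime(year, month, day, hh, mm, ss))
    return times


def gaps_seconds(times):
    """Return the gap in seconds between each pair of consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

To use it, open the CollectorLog from your own log directory and call gaps_seconds(error_times(f)) on the file object. Gaps that sit close to a constant value would suggest the errors are tied to a periodic update rather than to random network trouble.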


MasterLog
1/5 15:23:30 Started DaemonCore process "C:\Condor/bin/condor_collector.exe", pid and pgroup = 1012
1/5 15:23:30 Started DaemonCore process "C:\Condor/bin/condor_negotiator.exe", pid and pgroup = 1664
1/5 15:23:30 Started DaemonCore process "C:\Condor/bin/condor_startd.exe", pid and pgroup = 2688
1/5 15:23:30 Started DaemonCore process "C:\Condor/bin/condor_schedd.exe", pid and pgroup = 900
1/5 15:23:55 condor_read(): timeout reading buffer.


-George
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
http://lists.cs.wisc.edu/mailman/listinfo/condor-users
