
Re: [Condor-users] Dedicated scheduling broken after auth changes



Dear Steve Huston,

	I have run into the same problem.

	The authentication settings in my condor_config file are:

SEC_CLIENT_AUTHENTICATION = REQUIRED
SEC_DEFAULT_AUTHENTICATION = OPTIONAL
SEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE
SEC_CLIENT_AUTHENTICATION_METHODS = CLAIMTOBE


  The problem happens when the negotiator sends the startd ClassAd to the schedd.

   In the schedd log:
4/28 17:40:32 (fd:13) (pid:22433) condor_read(): fd=12
4/28 17:40:32 (fd:13) (pid:22433) condor_read(): select returned 1
4/28 17:40:32 (fd:13) (pid:22433) In case PERMISSION_AND_AD
4/28 17:40:32 (fd:13) (pid:22433) Failed to parse ClassAd expression: 'Machine'
4/28 17:40:32 (fd:13) (pid:22433) Can't get my match ad from mgr
4/28 17:40:32 (fd:13) (pid:22433) Return from HandleReq <doNegotiate> (handler: 0.016s, sec: 0.019s)
4/28 17:40:32 (fd:13) (pid:22433) CLOSE <172.16.0.3:59466> fd=12
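
Steve's message (quoted further down) suspects the schedd's ClassAd stream reader, AttrList::initFromStream in src/classad.old/attrlist.cpp. The Python toy model that follows is NOT that code. It assumes a simplified framing, which has not been checked against the Condor source: an ad travels as an attribute count followed by one "Name = Value" string per attribute. Under that assumption, a reader that pulls a bare name such as 'Machine' where a full expression is expected produces exactly the schedd error above. A reader that starts at the wrong offset instead fails with "Failed to read ClassAd size", the other variant Steve reports. This pattern is consistent with the two daemons disagreeing about the stream framing after the security change, but the sketch does not prove it. The hostname "node1" is hypothetical.

```python
from typing import Dict, List, Tuple


class ClassAdParseError(Exception):
    """Raised by the toy reader; messages mimic the schedd log lines."""


def parse_expr(expr: str) -> Tuple[str, str]:
    # Simplified rule (an assumption): every attribute arrives as "Name = Value".
    name, sep, value = expr.partition("=")
    name, value = name.strip(), value.strip()
    if not sep or not name or not value:
        raise ClassAdParseError(f"Failed to parse ClassAd expression: '{expr}'")
    return name, value


def init_from_stream(tokens: List[str]) -> Dict[str, str]:
    """Toy stand-in for reading an ad: a count, then that many expression strings."""
    try:
        count = int(tokens[0])
    except (IndexError, ValueError):
        # The reader is not positioned at a size field.
        raise ClassAdParseError("Failed to read ClassAd size") from None
    if count < 0:
        raise ClassAdParseError("Failed to read ClassAd size")
    if len(tokens) - 1 < count:
        # Toy-only message; not taken from the Condor logs.
        raise ClassAdParseError("stream ended before all expressions were read")
    ad: Dict[str, str] = {}
    for expr in tokens[1:1 + count]:
        name, value = parse_expr(expr)
        ad[name] = value
    return ad


if __name__ == "__main__":
    # A well-formed ad decodes cleanly.
    print(init_from_stream(["2", 'Machine = "node1"', 'State = "Unclaimed"']))
    # A bare name where an expression is expected, and a reader that starts
    # on an expression instead of the count, give the two logged messages.
    for bad in (["1", "Machine"], ['Machine = "node1"']):
        try:
            init_from_stream(bad)
        except ClassAdParseError as err:
            print(err)
```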



   In the negotiator log:
4/28 17:40:32 (fd:11) (pid:22432)       Matched 36.0 DedicatedScheduler@xxxxxxxxxxxxxxxxx <172.16.0.3:59466> preempting none <172.16.0.3:48221> slot1@xxxxxxxxxxxxxxxxx
4/28 17:40:32 (fd:11) (pid:22432)       Notifying the accountant
4/28 17:40:32 (fd:11) (pid:22432) (ACCOUNTANT) Added match between customer DedicatedScheduler@xxxxxxxxxxxxxxxxx and resource slot1@xxxxxxxxxxxxxxxxx@<172.16.0.3:48221>
4/28 17:40:32 (fd:11) (pid:22432)       Successfully matched with slot1@xxxxxxxxxxxxxxxxx
4/28 17:40:32 (fd:11) (pid:22432) selector 0x7fffe31343a0 resetting
4/28 17:40:32 (fd:11) (pid:22432) selector 0x7fffe31343a0 adding fd 8 ()
4/28 17:40:32 (fd:11) (pid:22432)     Over submitter resource limit (1) ... only consider startd ranks
4/28 17:40:32 (fd:11) (pid:22432)     Sending SEND_JOB_INFO/eom
4/28 17:40:32 (fd:11) (pid:22432) selector 0x7fffe3134240 resetting
4/28 17:40:32 (fd:11) (pid:22432) condor_write(fd=6 <172.16.0.3:59466>,,size=13,timeout=30,flags=0)
4/28 17:40:32 (fd:11) (pid:22432) selector 0x7fffe3134240 adding fd 6 ()
4/28 17:40:32 (fd:11) (pid:22432) selector 0x7fffe3134240 adding fd 6 ()
4/28 17:40:32 (fd:11) (pid:22432) selector 0x7fffe3134240 adding fd 6 ()
4/28 17:40:32 (fd:11) (pid:22432) selector 0x7fffe3134240 adding fd 6 ()
4/28 17:40:32 (fd:11) (pid:22432) condor_write(): socket 6 is readable
4/28 17:40:32 (fd:11) (pid:22432) condor_write(): Socket closed when trying to write 13 bytes to <172.16.0.3:59466>, fd is 6
4/28 17:40:32 (fd:11) (pid:22432) Buf::write(): condor_write() failed
4/28 17:40:32 (fd:11) (pid:22432)     Failed to send SEND_JOB_INFO/eom

	After this, the state of the first startd machine changed to "Matched".
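
Steve notes that each schedd "Can't get my match ad from mgr" line corresponds to a negotiator "Socket closed" write failure, and the excerpts above show the same pairing at 17:40:32. The schedd appears to close the connection (CLOSE ... fd=12) after it fails to read the match ad, so the negotiator's SEND_JOB_INFO/eom write then lands on a closed socket. The Python sketch below checks that pairing across whole logs. Its line format is inferred from the excerpts. The 2-second window and the placeholder year 2000 (the logs carry no year) are arbitrary choices. Timestamps are only comparable if both daemons share a clock, as they do here, where both run on 172.16.0.3.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Inferred from the excerpts: "M/D HH:MM:SS", then optional "(fd:N)" and
# "(pid:N)" prefixes (absent in the older excerpts), then the message.
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2})\s+(?:\(fd:\d+\)\s+)?(?:\(pid:\d+\)\s+)?(.*)$"
)
SCHEDD_SIG = "Can't get my match ad from mgr"
NEGOTIATOR_SIG = "Failed to send SEND_JOB_INFO/eom"


def find_events(lines: List[str], signature: str) -> List[datetime]:
    """Timestamps of log lines whose message contains `signature`."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and signature in m.group(2):
            # No year in the logs; 2000 is a placeholder (a leap year, so 2/29 parses).
            stamps.append(datetime.strptime("2000/" + m.group(1), "%Y/%m/%d %H:%M:%S"))
    return stamps


def correlate(
    schedd_lines: List[str], negotiator_lines: List[str], window_s: float = 2.0
) -> Tuple[List[Tuple[datetime, datetime]], List[datetime]]:
    """Pair each schedd match-ad failure with a nearby negotiator send failure."""
    negotiator_fails = find_events(negotiator_lines, NEGOTIATOR_SIG)
    pairs: List[Tuple[datetime, datetime]] = []
    unmatched: List[datetime] = []
    for s in find_events(schedd_lines, SCHEDD_SIG):
        close = [n for n in negotiator_fails if abs((n - s).total_seconds()) <= window_s]
        if close:
            pairs.append((s, close[0]))
        else:
            unmatched.append(s)
    return pairs, unmatched


if __name__ == "__main__":
    schedd = ["4/28 17:40:32 (fd:13) (pid:22433) Can't get my match ad from mgr"]
    negotiator = ["4/28 17:40:32 (fd:11) (pid:22432)     Failed to send SEND_JOB_INFO/eom"]
    pairs, unmatched = correlate(schedd, negotiator)
    print(f"{len(pairs)} paired, {len(unmatched)} unmatched")
```

To run it on real logs, pass the lines of the two log files, for example `correlate(open(schedd_log).readlines(), open(negotiator_log).readlines())`, where `schedd_log` and `negotiator_log` are hypothetical path variables. Any schedd failure left unmatched would point to a different cause.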



	When I switched the authentication settings to host-based (host allow) authorization instead, the problem disappeared.

	Any help would be appreciated.

	Thanks.
      	 Zhaokun
			   Beijing Hotsim Technology Co.,Ltd
			   zhaokun@xxxxxxxxxxxxx
          2009-04-28
=======From 2009-04-23 02:12:29 =======

>On 04/21/2009 03:23 PM, Steve Huston wrote:
>> On 04/21/2009 10:14 AM, Steve Huston wrote:
>>>    Can't get my match ad from mgr
>>> One of these corresponds to each time I see this in the negotiator's log:
>>>    condor_write(): Socket closed when trying to write 13 bytes to
>>> <ip:47659>, fd is 11
>> 4/21 15:16:30 Sent request
>> 4/21 15:16:30 In case PERMISSION_AND_AD
>> 4/21 15:16:30 Failed to parse ClassAd expression: 'Machine'
>> 4/21 15:16:30 Can't get my match ad from mgr
>> Sometimes it's "Failed to read ClassAd size" instead, leading me to
>> believe it's something in AttrList::initFromStream(Stream& s) defined in
>> src/classad.old/attrlist.cpp
>
>Bueller? Bueller?
>
>Due to how things are set up here, I can't just recompile with debugging
>symbols and drop the binary in place, since that would replace all the
>schedd's on the network, so unless someone with more intimate knowledge
>of the code steps in (or someone else who can compile from source and
>replicate the problem) I'm at a bit of a standstill - with a group of
>people who keep asking me when the dedicated scheduling will return
>since their jobs are now sitting idle.
>
>-- 
>Steve Huston - W2SRH - Unix Sysadmin, Dept. of Astrophysical Sciences
>  Princeton University  |    ICBM Address: 40.346525   -74.651285
>    206 Peyton Hall     |"On my ship, the Rocinante, wheeling through
>  Princeton, NJ   08544 | the galaxies; headed for the heart of Cygnus,
>    (609) 258-7375      | headlong into mystery."  -Rush, 'Cygnus X-1'
>_______________________________________________
>Condor-users mailing list
>To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>subject: Unsubscribe
>You can also unsubscribe by visiting
>https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
>The archives can be found at: 
>https://lists.cs.wisc.edu/archive/condor-users/
>

= = = = = = = = = = = = = = = = = = = =