
Re: [HTCondor-users] Questions about and problems with condor_kbdd.exe on Windows 7.



Hi!

Thanks for the hint with condor_birdwatcher. However, before adding yet
another tool to the game, I would first like to understand whether kbdd
is really broken in 8.4.3 (at least on Win7 and when the user has
administrator permissions) or whether there is some other effect that I
am missing. In fact, I did not see many recent changes to kbdd when
looking through the release notes. Can anybody please give a bit of
clarification here?

Thanks and best regards,
Jens


On 10.03.16 at 23:04, Ziliang Guo wrote:
> Didn't even see that response to my previous email. R, you might want
> to seriously consider a different email address. Gmail at least seems
> extremely aggressive about treating your mail as spam, and I'd wager
> at least half of your unanswered questions are because people don't
> see your email in the first place.
> 
> The kbdd should not exit if it cannot access the log file; code was
> explicitly added to deal with that situation after the kbdd was moved
> out of the system session. An alternative, and TJ will have to confirm
> whether it made it in or not, is to use birdwatcher, a utility that is
> bundled with the Windows install and was originally meant to show
> users what condor was doing. One of the last things I did before I
> left the project was to add code to birdwatcher that lets it send
> updates about keyboard activity to the startd. Birdwatcher does not
> rely on logs, so there should be none of this permissions nonsense.
> Though again, TJ would need to confirm that the code is in and that
> the Windows config has been modified to take advantage of it.
> 
> On Thu, Mar 10, 2016 at 3:43 PM, Jens Schmaler <jens.schmaler@xxxxxx> wrote:
>> Hi all,
>>
>> I am experiencing the same issue with HTCondor 8.4.3 on Windows 7
>> machines. As in your case, the user account has administrator rights on
>> the machine, and this is also something I cannot change. The effect
>> seems to be a non-functional kbdd, so that jobs are executed on the
>> machine in spite of the user being present.
>>
>> Did you manage to solve this issue?
>> @Condor experts: Is this a known effect with this version of Condor? I
>> never saw it before upgrading from 8.2.6 to 8.4.3, and this is currently
>> a blocker for our cluster with users getting annoyed by undesired jobs
>> and shutting down Condor altogether :(
>>
>> Thanks for any hint!
>>
>> Best regards,
>> Jens
>>
>>
>> On 28.01.16 at 07:00, Stub wrote:
>>> On Wed, 27 Jan 2016 19:28:54 -0600, Ziliang Guo wrote:
>>>
>>>> As far as your question goes, technically the kbdd running under
>>>> your logged-in user account should not be able to access the kbdd
>>>> log, because that log is owned by the system. Code was explicitly
>>>> added to the logging code so that the kbdd won't outright exit when
>>>> it can't write to a log file, for this very reason. I can only think
>>>> of two reasons why the kbdd running under the user account would be
>>>> able to write to that log file. The first is that it was somehow
>>>> able to create that file before the kbdd that runs under the system
>>>> account got started, so the owner of that file is your user account
>>>> and not the system account (though this raises the question of how
>>>> the user-owned kbdd was able to write into a directory that's
>>>> supposed to be owned by the system to begin with). The other
>>>> possibility is that you're running an admin account without UAC,
>>>> which auto-elevates the privileges of all processes and would allow
>>>> the kbdd process to access system-owned files and directories. So, I
>>>> suppose, yes, something does seem to be wrong, and that is: how is
>>>> the kbdd running under the user account accessing a log file that is
>>>> supposed to be owned by the system account?
>>>
>>>
>>>
>>> Thank you for your elaborate explanation.
>>>
>>> You are right; the "logged-in user" is indeed also the Win7 Admin. So
>>> that explains why its condor_kbdd.exe can access the KbdLog file. In
>>> this case there are two condor_kbdd.exe processes running (one as a
>>> regular user with Admin rights, and one as SYSTEM), and both are
>>> writing to the same log file, which can cause havoc, can it not?
>>>
>>>
>>> What I then still do not understand is what starts the first instance
>>> of condor_kbdd.exe. The Condor MSI file on Win 7 installs the
>>> condor service as "automatic", which fires up condor_master.exe
>>> upon boot; by configuration, condor_master.exe will then start
>>> condor_procd.exe, condor_startd.exe, and condor_kbdd.exe. So if a
>>> regular/non-admin user starts condor_kbdd.exe, system security
>>> prevents it from writing to the log file, upon which condor_kbdd.exe
>>> exits. But why is a condor_kbdd.exe already fired up by the regular
>>> user BEFORE the condor service starts, if that instance of
>>> condor_kbdd.exe is supposed to exit anyway? The Win7 startup
>>> folder is empty, so that is not what starts it!
>>>
>>>
>>> Can you help me understand this part of the story?
>>>
>>> Thanks! R.
>>>
>>> -----------
>>> On Sunday, January 24, 2016 3:10 PM, Stub wrote:
>>>
>>> Hi,
>>>
>>> I'm running HTCondor 8.4.3 on a Windows 7 PC, which serves as an
>>> execute machine in the HTCondor pool. The DAEMON_LIST is "MASTER
>>> STARTD KBDD"
>>>
>>> Very quickly after the PC boots up the Task Manager shows a
>>> condor_kbdd.exe running as the logged-in user. The service "Condor"
>>> is then not yet running.
>>>
>>> After some time the Condor service fires up and four new daemons
>>> appear in the Task Manager running as "SYSTEM": condor_kbdd.exe,
>>> condor_master.exe, condor_procd.exe, and condor_startd.exe.
>>>
>>> Note that there are now TWO condor_kbdd.exe daemons running: one as
>>> the logged-in user, and one as SYSTEM.
>>>
>>> It seems that both condor_kbdd.exe processes are using the same KbdLog
>>> file, as I can only delete this log file AFTER killing BOTH
>>> condor_kbdd.exe processes!
>>>
>>> When I stop the condor service with "SC STOP CONDOR", only the SYSTEM
>>> daemons are stopped; the condor_kbdd.exe by the logged-in user keeps
>>> on running.
>>>
>>> The contents of the corresponding condor_kbdd log file are at the end
>>> of this email. Notice the two header entries of the two
>>> condor_kbdd.exe daemons. The first entry complains about not finding
>>> condor_startd.exe, obviously because the startd is not yet running:
>>>
>>> 01/24/16 14:54:03 Can't find address for startd Virtual-KU
>>> 01/24/16 14:54:03 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
>>> 01/24/16 14:54:09 ERROR: SECMAN:2003:TCP connection to  failed.
>>>
>>> The second entry is piling up with an endless flow of error messages,
>>> one every 5 seconds:
>>>
>>> 01/24/16 14:55:57 GetCursorInfo() failed (err=5)
>>>
>>>
>>>
>>> Is all this an expected situation and condition? I cannot follow or
>>> understand the sequence of events, but is something going very wrong
>>> here?
>>>
>>>
>>> Thank you, R.
>>>
>>> The contents of the KbdLog file:
>>>
>>> 01/24/16 14:53:58 ******************************************************
>>> 01/24/16 14:53:58 ** condor_kbdd (CONDOR_KBDD) STARTING UP
>>> 01/24/16 14:53:58 ** C:\condor\bin\condor_kbdd.exe
>>> 01/24/16 14:53:58 ** SubsystemInfo: name=KBDD type=DAEMON(12) class=DAEMON(1)
>>> 01/24/16 14:53:58 ** Configuration: subsystem:KBDD local:<NONE> class:DAEMON
>>> 01/24/16 14:53:58 ** $CondorVersion: 8.4.3 Dec 15 2015 BuildID: 352143 $
>>> 01/24/16 14:53:58 ** $CondorPlatform: x86_64_Windows7 $
>>> 01/24/16 14:53:58 ** PID = 2464
>>> 01/24/16 14:53:58 ** Log last touched time unavailable (No such file or directory)
>>> 01/24/16 14:53:58 ******************************************************
>>> 01/24/16 14:53:58 Using config source: C:\condor\condor_config
>>> 01/24/16 14:53:58 Using local config sources:
>>> 01/24/16 14:53:58 condor_urlfetch -KBDD http://condor.dummy.edu/pool/condor_config_win7_cloud.local C:\condor\condor_config.url_cache |
>>> 01/24/16 14:53:58 config Macros = 56, Sorted = 56, StringBytes = 1821, TablesBytes = 1592
>>> 01/24/16 14:53:58 CLASSAD_CACHING is ENABLED
>>> 01/24/16 14:53:58 Daemon Log is logging: D_ALWAYS D_ERROR
>>> 01/24/16 14:53:58 Daemoncore: Listening at <0.0.0.0:49162> on TCP (ReliSock) and UDP (SafeSock).
>>> 01/24/16 14:53:58 DaemonCore: command socket at <10.0.2.15:49162?addrs=10.0.2.15-49162>
>>> 01/24/16 14:53:58 DaemonCore: private command socket at <10.0.2.15:49162?addrs=10.0.2.15-49162>
>>> 01/24/16 14:54:03 Can't find address for startd Virtual-KU
>>> 01/24/16 14:54:03 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
>>> 01/24/16 14:54:09 ERROR: SECMAN:2003:TCP connection to  failed.
>>> 01/24/16 14:54:09 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
>>> 01/24/16 14:54:14 Can't find address for startd Virtual-KU
>>> 01/24/16 14:54:14 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
>>> 01/24/16 14:54:19 ERROR: SECMAN:2003:TCP connection to  failed.
>>> 01/24/16 14:54:19 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
>>> 01/24/16 14:54:24 Can't find address for startd Virtual-KU
>>> 01/24/16 14:54:24 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
>>> 01/24/16 14:54:29 ERROR: SECMAN:2003:TCP connection to  failed.
>>> 01/24/16 14:54:29 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
>>> 01/24/16 14:54:34 Can't find address for startd Virtual-KU
>>> 01/24/16 14:54:34 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
>>> 01/24/16 14:54:39 ERROR: SECMAN:2003:TCP connection to  failed.
>>> 01/24/16 14:54:39 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
>>> 01/24/16 14:54:44 Can't find address for startd Virtual-KU
>>> 01/24/16 14:54:44 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
>>> 01/24/16 14:54:49 ERROR: SECMAN:2003:TCP connection to  failed.
>>> 01/24/16 14:54:49 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
>>> 01/24/16 14:54:54 Can't find address for startd Virtual-KU
>>> 01/24/16 14:54:54 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
>>> 01/24/16 14:54:59 ERROR: SECMAN:2003:TCP connection to  failed.
>>> 01/24/16 14:54:59 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
>>> 01/24/16 14:55:52 ******************************************************
>>> 01/24/16 14:55:52 ** condor_kbdd (CONDOR_KBDD) STARTING UP
>>> 01/24/16 14:55:52 ** C:\condor\bin\condor_kbdd.exe
>>> 01/24/16 14:55:52 ** SubsystemInfo: name=KBDD type=DAEMON(12) class=DAEMON(1)
>>> 01/24/16 14:55:52 ** Configuration: subsystem:KBDD local:<NONE> class:DAEMON
>>> 01/24/16 14:55:52 ** $CondorVersion: 8.4.3 Dec 15 2015 BuildID: 352143 $
>>> 01/24/16 14:55:52 ** $CondorPlatform: x86_64_Windows7 $
>>> 01/24/16 14:55:52 ** PID = 2368
>>> 01/24/16 14:55:52 ** Log last touched 1/24 14:55:00
>>> 01/24/16 14:55:52 ******************************************************
>>> 01/24/16 14:55:52 Using config source: C:\condor\condor_config
>>> 01/24/16 14:55:52 Using local config sources:
>>> 01/24/16 14:55:52 condor_urlfetch -KBDD http://condor.dummy.edu/pool/condor_config_win7_cloud.local C:\condor\condor_config.url_cache |
>>> 01/24/16 14:55:52 config Macros = 56, Sorted = 56, StringBytes = 1818, TablesBytes = 1592
>>> 01/24/16 14:55:52 CLASSAD_CACHING is ENABLED
>>> 01/24/16 14:55:52 Daemon Log is logging: D_ALWAYS D_ERROR
>>> 01/24/16 14:55:52 Daemoncore: Listening at <0.0.0.0:49179> on TCP (ReliSock) and UDP (SafeSock).
>>> 01/24/16 14:55:52 DaemonCore: command socket at <10.0.2.15:49179?addrs=10.0.2.15-49179>
>>> 01/24/16 14:55:52 DaemonCore: private command socket at <10.0.2.15:49179?addrs=10.0.2.15-49179>
>>> 01/24/16 14:55:57 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:56:02 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:56:07 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:56:12 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:56:17 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:56:23 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:56:28 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:56:33 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:56:38 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:56:43 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:56:48 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:56:53 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:56:58 GetCursorInfo() failed (err=5)
>>> 01/24/16 14:57:03 GetCursorInfo() failed (err=5)
>>>
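
The two-instance pattern in the KbdLog above can be checked mechanically.
The following is a minimal Python sketch, not part of HTCondor: the helper
names split_entries and group_by_instance are hypothetical, and it assumes
only what the log itself shows, namely that every entry starts with an
"MM/DD/YY HH:MM:SS" timestamp, that each daemon start writes a
"STARTING UP" banner line, and that the banner contains a "** PID = N"
line. It tolerates mail quote markers and hard-wrapped text.

```python
import re

# Every KbdLog entry in the thread begins with "MM/DD/YY HH:MM:SS".
TIMESTAMP = re.compile(r"\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}")


def split_entries(text):
    """Split (possibly quoted and hard-wrapped) log text into
    (timestamp, message) pairs."""
    # Drop leading mail quote markers such as ">>> " from each line.
    lines = [re.sub(r"^[> ]+", "", line) for line in text.splitlines()]
    # Undo the wrapping: join everything and normalise whitespace.
    flat = " ".join(" ".join(lines).split())
    starts = [m.start() for m in TIMESTAMP.finditer(flat)]
    entries = []
    for begin, end in zip(starts, starts[1:] + [len(flat)]):
        chunk = flat[begin:end].strip()
        # The timestamp is 17 characters, followed by one space.
        entries.append((chunk[:17], chunk[18:]))
    return entries


def group_by_instance(entries):
    """Group entries by daemon start; each "STARTING UP" banner opens
    a new group, and the banner's PID line labels it."""
    instances = []
    for _stamp, message in entries:
        if "STARTING UP" in message:
            instances.append({"pid": None, "messages": []})
        if not instances:
            continue  # skip anything logged before the first banner
        pid = re.match(r"\*\* PID = (\d+)", message)
        if pid:
            instances[-1]["pid"] = int(pid.group(1))
        instances[-1]["messages"].append(message)
    return instances


# Shortened excerpt of the log quoted in this thread, still wrapped.
SAMPLE = """\
>>> 01/24/16 14:53:58 ** condor_kbdd (CONDOR_KBDD) STARTING UP 01/24/16 14:53:58
>>> ** PID = 2464 01/24/16 14:54:03 Can't find address for startd Virtual-KU
>>> 01/24/16 14:55:52 ** condor_kbdd (CONDOR_KBDD) STARTING UP 01/24/16
>>> 14:55:52 ** PID = 2368 01/24/16 14:55:57 GetCursorInfo() failed (err=5)
"""

if __name__ == "__main__":
    for instance in group_by_instance(split_entries(SAMPLE)):
        print(instance["pid"], instance["messages"][-1])
```

Two groups with different PIDs in a single file, as in the excerpt, would
be consistent with two condor_kbdd.exe processes writing to the same
KbdLog; the sketch does not by itself show which account each ran under.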
>>> _______________________________________________ HTCondor-users
>>> mailing list To unsubscribe, send a message to
>>> htcondor-users-request@xxxxxxxxxxx with a subject: Unsubscribe You
>>> can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>>