
Re: [HTCondor-users] Questions about and problems with condor_kbdd.exe on WIndows 7.



The condor_kbdd running as local system is there because you have KBDD in your DAEMON_LIST.
You can remove it; it serves no purpose on Windows when HTCondor is running as a service, because a condor_kbdd started by the service cannot see the user's keystrokes.

When you use the MSI installer and ask for a desktop policy, the MSI adds the condor_kbdd to the registry so that it runs automatically as the user whenever the user logs in. This instance of the condor_kbdd runs as the user and CAN see the user's keystrokes. It doesn't matter at all whether the user is a member of the Administrators group.

Having two instances of the condor_kbdd writing to the same log will make for a confusing log, but it won't cause any other problems.  If your kbdd isn't working, the problem is likely to be somewhere else.  Either 

1) the kbdd that is running as the user is unable to report to the condor_startd
2) the condor configuration isn't set to pay attention to the keyboard state. 

The second one is easiest to check.

condor_config_val -v START SUSPEND PREEMPT CONTINUE KILL

The START policy should refer in some way to the KeyboardIdle attribute, which is what the condor_kbdd actually sets.   This is usually done indirectly by referring to $(KeyboardBusy) which expands to
     (KeyboardIdle < 60)

The SUSPEND policy should also refer to KeyboardIdle. This, in combination with PREEMPT, is what actually kicks off running jobs.
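
If you want to script that check, here is a minimal Python sketch. It assumes that running condor_config_val with a single knob name prints the expanded value, so that $(KeyboardBusy) shows up as an expression containing KeyboardIdle. The helper names (mentions_keyboard_idle, check_policy, read_policy) are made up for illustration and are not part of HTCondor.

```python
import re
import subprocess

# Policy knobs named in the condor_config_val command above.
POLICY_KNOBS = ["START", "SUSPEND", "PREEMPT", "CONTINUE", "KILL"]


def mentions_keyboard_idle(expression):
    """Return True if an expanded policy expression references KeyboardIdle.

    ClassAd attribute names are case-insensitive, so the match is too.
    """
    return re.search(r"\bkeyboardidle\b", expression, re.IGNORECASE) is not None


def check_policy(values):
    """Map each knob name to whether its expression mentions KeyboardIdle."""
    return {knob: mentions_keyboard_idle(expr) for knob, expr in values.items()}


def read_policy():
    """Collect expanded knob values by running condor_config_val once per knob.

    Assumption: condor_config_val is on the PATH of the machine being checked.
    An undefined knob simply yields an empty string here.
    """
    values = {}
    for knob in POLICY_KNOBS:
        result = subprocess.run(
            ["condor_config_val", knob], capture_output=True, text=True
        )
        values[knob] = result.stdout.strip()
    return values
```

Calling check_policy(read_policy()) on the execute machine gives a dict such as START -> True/False. START and SUSPEND coming back False would point at the policy rather than at the kbdd.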

If the policy looks good, the next thing to check is that the condor_kbdd is actually talking to the condor_startd. The quick check is to run:

condor_status -direct <machine> -af name keyboardidle

If the condor_kbdd is reporting correctly, KeyboardIdle should be a small number when the user is sitting at the keyboard.  

It can take a few seconds (5 or so) for keystrokes to be reflected in KeyboardIdle. A working condor_kbdd will keep the KeyboardIdle value < 15 as long as the keyboard or mouse is in use.
Keep in mind that you have to use condor_status -direct to see current values.  The ads in the Collector are often a few minutes out of date, but it's the ad in the startd that matters here.
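
To repeat that check while someone is typing, a small Python sketch along these lines may help. It assumes the -af output is one line per slot with the name and the KeyboardIdle value separated by whitespace, and it uses the 15 second figure mentioned above as the threshold. The function names and the sample slot names are hypothetical.

```python
import subprocess


def parse_keyboard_idle(output):
    """Parse 'name keyboardidle' lines into a {slot_name: seconds} dict."""
    idle = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, value = parts
        try:
            idle[name] = int(value)
        except ValueError:
            continue  # non-numeric value, e.g. the attribute is undefined
    return idle


def stale_slots(idle, limit=15):
    """Return the slot names whose KeyboardIdle is above `limit` seconds."""
    return sorted(name for name, secs in idle.items() if secs > limit)


def query_keyboard_idle(machine):
    """Run the condor_status -direct command from the text for one machine.

    Assumption: condor_status is on the PATH where this script runs.
    """
    result = subprocess.run(
        ["condor_status", "-direct", machine, "-af", "name", "keyboardidle"],
        capture_output=True, text=True, check=True,
    )
    return parse_keyboard_idle(result.stdout)
```

If stale_slots(query_keyboard_idle("<machine>")) keeps returning slot names while the user is at the keyboard, the kbdd is probably not getting through to the startd.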

If you aren't seeing changes here, try setting 

   STARTD_DEBUG = $(STARTD_DEBUG) D_IDLE D_KEYBOARD 

Then restart the condor_startd and look at the StartLog.

-tj




-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Jens Schmaler
Sent: Thursday, March 10, 2016 3:43 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Questions about and problems with condor_kbdd.exe on WIndows 7.

Hi all,

I am experiencing the same issue with HTCondor 8.4.3 on Windows 7 machines. As in your case, the user account has administrator rights on the machine, and this is also something I cannot change. The effect seems to be a non-functional kbdd, so that jobs are executed on the machine in spite of the user being present.

Did you manage to solve this issue?
@Condor experts: Is this a known effect with this version of Condor? I never saw it before upgrading from 8.2.6 to 8.4.3, and this is currently a blocker for our cluster with users getting annoyed by undesired jobs and shutting down Condor altogether :(

Thanks for any hint!

Best regards,
Jens


On 28.01.16 at 07:00, Stub wrote:
> On Wed, 27 Jan 2016 19:28:54 -0600, Ziliang Guo wrote:
> 
>> As far as your question goes, technically the kbdd running under your 
>> logged in user account should not be able to access the kbdd log, 
>> that log is owned by the system. Code was explicitly added to the 
>> logging code so that it won't outright exit when it can't write to a 
>> log file for this very reason. I can only think of two reasons for 
>> why the kbdd running under the user account to be able to write to 
>> that log file, the first being that it was somehow able to create 
>> that file before the kbdd that runs under the system account got 
>> started, so the owner of that file is your user account and not the 
>> system account (though this begs the question of how the user owned 
>> kbdd was able to write into a directory that's supposed to be owned 
>> by the system to begin with). The other possibility is that you're 
>> running an admin account without UAC that auto-elevates the 
>> privileges of all processes, which would allow the kbdd process to 
>> access system-owned files and directories. So, I suppose, yes,
>> something does seem to be wrong, and that's, how is the kbdd
>> running under the user account accessing a log file that is supposed
>> to be owned by the system account?
> 
> 
> 
> Thank you for your elaborate explanation.
> 
> You are right; the "logged-in user" is indeed also the Win7 Admin. So 
> that explains why its condor_kbdd.exe can access the KbdLog file. In 
> this case there are two condor_kbdd.exe running (one as a regular user 
> with Admin rights, and one as SYSTEM) and both are sending data to the 
> same Log file, which can cause havoc, can it not?
> 
> 
> What I then still do not understand is what causes the first instance 
> of the condor_kbdd.exe. The Condor MSI file on Win 7 installs the 
> condor service as "automatic", which fires up the condor_master.exe 
> upon boot; by configuration, the condor_master.exe will then start the 
> condor_procd.exe, condor_startd.exe, and condor_kbdd.exe. So if a 
> regular/non-admin user starts the condor_kbdd.exe, system security 
> prevents it to write to the Log file, upon which the condor_kbdd.exe 
> exits. But why is a condor_kbdd.exe already fired up by the regular 
> user BEFORE the condor service starts, if that instance of 
> condor_kbdd.exe is supposed to exit anyway?!?! The Win7 startup folder 
> is empty; so that does not do it!
> 
> 
> Can you help me understand this part of the story?
> 
> Thanks! R.
> 
> -----------
> On Sunday, January 24, 2016 3:10 PM, Stub wrote:
> 
> Hi,
> 
> I'm running HTCondor 8.4.3 on a Windows 7 PC, which serves as an 
> execute machine in the HTCondor pool. The DAEMON_LIST is "MASTER 
> STARTD KBDD"
> 
> Very quickly after the PC boots up the Task Manager shows a 
> condor_kbdd.exe running as the logged-in user. The service "Condor"
> is then not yet running.
> 
> After some time the Condor service fires up and a list of four new 
> daemons appears in the Task Manager running as "SYSTEM":
> condor_kbdd.exe condor_master.exe condor_procd.exe condor_startd.exe
> 
> Note that there are now TWO condor_kbdd.exe daemons running: one as 
> the logged-in user, and one as SYSTEM.
> 
> It seems that both condor_kbdd.exe files are using the same KbdLog 
> file, as I only can delete this log file AFTER killing BOTH 
> condor_kbdd.exe processes!
> 
> When I stop the condor service with "SC STOP CONDOR", only the SYSTEM 
> daemons are stopped; the condor_kbdd.exe by the logged-in user keeps 
> on running.
> 
> The contents of the corresponding condor_kbdd Log file is at the end 
> of this email. Notice the two header entries of the two 
> condor_kbdd.exe daemons. The first entry complains about not finding 
> condor_startd.exe, obviously, because startd is not yet running:
> 01/24/16 14:54:03 Can't find address for startd Virtual-KU
> 01/24/16 14:54:03 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:09 ERROR: SECMAN:2003:TCP connection to failed.
> 
> The second entry is piling up with an endless flow of error messages, 
> one every 5 seconds: 01/24/16 14:55:57 GetCursorInfo() failed
> (err=5)
> 
> 
> 
> Is all this an expected situation and condition? I cannot follow or 
> understand the sequence of events, but is something going very wrong 
> here?
> 
> 
> Thank you , R.
> 
> The contents of the KbdLog file:
> 
> 01/24/16 14:53:58 ******************************************************
> 01/24/16 14:53:58 ** condor_kbdd (CONDOR_KBDD) STARTING UP
> 01/24/16 14:53:58 ** C:\condor\bin\condor_kbdd.exe
> 01/24/16 14:53:58 ** SubsystemInfo: name=KBDD type=DAEMON(12) class=DAEMON(1)
> 01/24/16 14:53:58 ** Configuration: subsystem:KBDD local:<NONE> class:DAEMON
> 01/24/16 14:53:58 ** $CondorVersion: 8.4.3 Dec 15 2015 BuildID: 352143 $
> 01/24/16 14:53:58 ** $CondorPlatform: x86_64_Windows7 $
> 01/24/16 14:53:58 ** PID = 2464
> 01/24/16 14:53:58 ** Log last touched time unavailable (No such file or directory)
> 01/24/16 14:53:58 ******************************************************
> 01/24/16 14:53:58 Using config source: C:\condor\condor_config
> 01/24/16 14:53:58 Using local config sources:
> 01/24/16 14:53:58 condor_urlfetch -KBDD http://condor.dummy.edu/pool/condor_config_win7_cloud.local C:\condor\condor_config.url_cache |
> 01/24/16 14:53:58 config Macros = 56, Sorted = 56, StringBytes = 1821, TablesBytes = 1592
> 01/24/16 14:53:58 CLASSAD_CACHING is ENABLED
> 01/24/16 14:53:58 Daemon Log is logging: D_ALWAYS D_ERROR
> 01/24/16 14:53:58 Daemoncore: Listening at <0.0.0.0:49162> on TCP (ReliSock) and UDP (SafeSock).
> 01/24/16 14:53:58 DaemonCore: command socket at <10.0.2.15:49162?addrs=10.0.2.15-49162>
> 01/24/16 14:53:58 DaemonCore: private command socket at <10.0.2.15:49162?addrs=10.0.2.15-49162>
> 01/24/16 14:54:03 Can't find address for startd Virtual-KU
> 01/24/16 14:54:03 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:09 ERROR: SECMAN:2003:TCP connection to failed.
> 01/24/16 14:54:09 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
> 01/24/16 14:54:14 Can't find address for startd Virtual-KU
> 01/24/16 14:54:14 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:19 ERROR: SECMAN:2003:TCP connection to  failed.
> 01/24/16 14:54:19 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
> 01/24/16 14:54:24 Can't find address for startd Virtual-KU
> 01/24/16 14:54:24 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:29 ERROR: SECMAN:2003:TCP connection to failed.
> 01/24/16 14:54:29 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
> 01/24/16 14:54:34 Can't find address for startd Virtual-KU
> 01/24/16 14:54:34 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:39 ERROR: SECMAN:2003:TCP connection to  failed.
> 01/24/16 14:54:39 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
> 01/24/16 14:54:44 Can't find address for startd Virtual-KU
> 01/24/16 14:54:44 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:49 ERROR: SECMAN:2003:TCP connection to failed.
> 01/24/16 14:54:49 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
> 01/24/16 14:54:54 Can't find address for startd Virtual-KU
> 01/24/16 14:54:54 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:59 ERROR: SECMAN:2003:TCP connection to  failed.
> 01/24/16 14:54:59 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
> 01/24/16 14:55:52 ******************************************************
> 01/24/16 14:55:52 ** condor_kbdd (CONDOR_KBDD) STARTING UP
> 01/24/16 14:55:52 ** C:\condor\bin\condor_kbdd.exe
> 01/24/16 14:55:52 ** SubsystemInfo: name=KBDD type=DAEMON(12) class=DAEMON(1)
> 01/24/16 14:55:52 ** Configuration: subsystem:KBDD local:<NONE> class:DAEMON
> 01/24/16 14:55:52 ** $CondorVersion: 8.4.3 Dec 15 2015 BuildID: 352143 $
> 01/24/16 14:55:52 ** $CondorPlatform: x86_64_Windows7 $
> 01/24/16 14:55:52 ** PID = 2368
> 01/24/16 14:55:52 ** Log last touched 1/24 14:55:00
> 01/24/16 14:55:52 ******************************************************
> 01/24/16 14:55:52 Using config source: C:\condor\condor_config
> 01/24/16 14:55:52 Using local config sources:
> 01/24/16 14:55:52 condor_urlfetch -KBDD http://condor.dummy.edu/pool/condor_config_win7_cloud.local C:\condor\condor_config.url_cache |
> 01/24/16 14:55:52 config Macros = 56, Sorted = 56, StringBytes = 1818, TablesBytes = 1592
> 01/24/16 14:55:52 CLASSAD_CACHING is ENABLED
> 01/24/16 14:55:52 Daemon Log is logging: D_ALWAYS D_ERROR
> 01/24/16 14:55:52 Daemoncore: Listening at <0.0.0.0:49179> on TCP (ReliSock) and UDP (SafeSock).
> 01/24/16 14:55:52 DaemonCore: command socket at <10.0.2.15:49179?addrs=10.0.2.15-49179>
> 01/24/16 14:55:52 DaemonCore: private command socket at <10.0.2.15:49179?addrs=10.0.2.15-49179>
> 01/24/16 14:55:57 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:02 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:07 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:12 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:17 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:23 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:28 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:33 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:38 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:43 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:48 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:53 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:58 GetCursorInfo() failed (err=5)
> 01/24/16 14:57:03 GetCursorInfo() failed (err=5)
> 
> _______________________________________________ HTCondor-users mailing 
> list To unsubscribe, send a message to 
> htcondor-users-request@xxxxxxxxxxx with a subject: Unsubscribe You can 
> also unsubscribe by visiting 
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at: 
> https://lists.cs.wisc.edu/archive/htcondor-users/
> 