
Re: [HTCondor-users] Questions about and problems with condor_kbdd.exe on Windows 7.



Hi all,

I am experiencing the same issue with HTCondor 8.4.3 on Windows 7
machines. As in your case, the user account has administrator rights on
the machine, and this is also something I cannot change. The effect
seems to be a non-functional kbdd, so that jobs are executed on the
machine even though the user is present.
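
One way to check whether keyboard activity reaches the startd at all is
to look at the KeyboardIdle and ConsoleIdle attributes of the machine
ad. If the kbdd were reporting activity, I would expect KeyboardIdle to
drop back toward zero while someone is typing. A minimal sketch,
assuming condor_status is on the PATH; the helper names parse_idle and
query_idle are made up for illustration:

```python
import subprocess


def parse_idle(output):
    """Parse 'Name KeyboardIdle ConsoleIdle' lines into {name: (kbd, console)}."""
    result = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        name, kbd, console = parts
        try:
            result[name] = (int(kbd), int(console))
        except ValueError:
            # Skip slots where an attribute is reported as 'undefined'.
            continue
    return result


def query_idle(host):
    """Ask the collector for the idle counters of one execute machine."""
    # The host name goes before -af so it is not taken as an attribute name.
    cmd = ["condor_status", host, "-af", "Name", "KeyboardIdle", "ConsoleIdle"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_idle(out)
```

Calling query_idle() twice, a minute apart, while typing on the machine
should show whether the counters are ever reset.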

Did you manage to solve this issue?
@Condor experts: Is this a known effect with this version of Condor? I
never saw it before upgrading from 8.2.6 to 8.4.3, and this is currently
a blocker for our cluster with users getting annoyed by undesired jobs
and shutting down Condor altogether :(

Thanks for any hint!

Best regards,
Jens


On 28.01.16 at 07:00, Stub wrote:
> On Wed, 27 Jan 2016 19:28:54 -0600, Ziliang Guo wrote:
> 
>> As far as your question goes, technically the kbdd running under
>> your logged-in user account should not be able to access the kbdd
>> log; that log is owned by the system. Code was explicitly added to
>> the logging code so that it won't outright exit when it can't write
>> to a log file for this very reason. I can only think of two reasons
>> why the kbdd running under the user account would be able to write
>> to that log file. The first is that it was somehow able to create
>> that file before the kbdd that runs under the system account got
>> started, so the owner of that file is your user account and not the
>> system account (though this raises the question of how the
>> user-owned kbdd was able to write into a directory that's supposed
>> to be owned by the system to begin with). The other possibility is
>> that you're running an admin account without UAC that auto-elevates
>> the privileges of all processes, which would allow the kbdd process
>> to access system-owned files and directories. So, I suppose, yes,
>> something does seem to be wrong, and that is: how is the kbdd
>> running under the user account accessing a log file that is
>> supposed to be owned by the system account?
> 
> 
> 
> Thank you for your elaborate explanation.
> 
> You are right; the "logged-in user" is indeed also the Win7 Admin. So
> that explains why its condor_kbdd.exe can access the KbdLog file. In
> this case there are two condor_kbdd.exe running (one as a regular
> user with Admin rights, and one as SYSTEM) and both are sending data
> to the same Log file, which can cause havoc, can it not?
> 
> 
> What I then still do not understand is what causes the first instance
> of the condor_kbdd.exe. The Condor MSI file on Win 7 installs the
> condor service as "automatic", which fires up the condor_master.exe
> upon boot; by configuration, the condor_master.exe will then start
> the condor_procd.exe, condor_startd.exe, and condor_kbdd.exe. So if a
> regular/non-admin user starts the condor_kbdd.exe, system security
> prevents it from writing to the Log file, upon which the condor_kbdd.exe
> exits. But why is a condor_kbdd.exe already fired up by the regular
> user BEFORE the condor service starts, if that instance of
> condor_kbdd.exe is supposed to exit anyway?!?! The Win7 startup
> folder is empty; so that does not do it!
> 
> 
> Can you help me understand this part of the story?
> 
> Thanks! R.
>
> -----------
> On Sunday, January 24, 2016 3:10 PM, Stub wrote:
> 
> Hi,
> 
> I'm running HTCondor 8.4.3 on a Windows 7 PC, which serves as an
> execute machine in the HTCondor pool. The DAEMON_LIST is "MASTER
> STARTD KBDD"
> 
> Very quickly after the PC boots up the Task Manager shows a
> condor_kbdd.exe running as the logged-in user. The service "Condor"
> is then not yet running.
> 
> After some time the Condor service fires up and four new daemons
> appear in the Task Manager running as "SYSTEM":
>
> condor_kbdd.exe
> condor_master.exe
> condor_procd.exe
> condor_startd.exe
> 
> Note that there are now TWO condor_kbdd.exe daemons running: one as
> the logged-in user, and one as SYSTEM.
> 
> It seems that both condor_kbdd.exe instances are using the same
> KbdLog file, as I can only delete this log file AFTER killing BOTH
> condor_kbdd.exe processes!
> 
> When I stop the condor service with "SC STOP CONDOR", only the SYSTEM
> daemons are stopped; the condor_kbdd.exe by the logged-in user keeps
> on running.
> 
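
The duplicate can also be confirmed from a script rather than from the
Task Manager. A rough sketch that lists every condor_kbdd.exe with its
PID and owning account by parsing the CSV output of "tasklist /V"; it
reads columns by position (PID second, User Name seventh), and the
function names are hypothetical:

```python
import csv
import io
import subprocess


def kbdd_owners(tasklist_csv):
    """Return [(pid, user)] from `tasklist /V /FO CSV` output."""
    rows = list(csv.reader(io.StringIO(tasklist_csv)))
    owners = []
    for row in rows[1:]:  # first row is the column header (or an INFO line)
        if len(row) >= 7 and row[1].isdigit():
            owners.append((int(row[1]), row[6]))
    return owners


def list_kbdd():
    """Run tasklist (Windows only) filtered to condor_kbdd.exe."""
    cmd = ["tasklist", "/V", "/FO", "CSV",
           "/FI", "IMAGENAME eq condor_kbdd.exe"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return kbdd_owners(out)
```

Two entries with different accounts, one of them NT AUTHORITY\SYSTEM,
would match the situation described above.
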
> The contents of the corresponding condor_kbdd Log file are at the end
> of this email. Notice the two header entries of the two
> condor_kbdd.exe daemons. The first entry complains about not finding
> condor_startd.exe, obviously, because startd is not yet running:
>
> 01/24/16 14:54:03 Can't find address for startd Virtual-KU
> 01/24/16 14:54:03 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:09 ERROR: SECMAN:2003:TCP connection to  failed.
> 
> The second entry is piling up with an endless flow of error messages,
> one every 5 seconds:
>
> 01/24/16 14:55:57 GetCursorInfo() failed (err=5)
> 
> 
> 
> Is all this an expected situation and condition? I cannot follow or
> understand the sequence of events, but is something going very wrong
> here?
> 
> 
> Thank you, R.
> 
> The contents of the KbdLog file:
>
> 01/24/16 14:53:58 ******************************************************
> 01/24/16 14:53:58 ** condor_kbdd (CONDOR_KBDD) STARTING UP
> 01/24/16 14:53:58 ** C:\condor\bin\condor_kbdd.exe
> 01/24/16 14:53:58 ** SubsystemInfo: name=KBDD type=DAEMON(12) class=DAEMON(1)
> 01/24/16 14:53:58 ** Configuration: subsystem:KBDD local:<NONE> class:DAEMON
> 01/24/16 14:53:58 ** $CondorVersion: 8.4.3 Dec 15 2015 BuildID: 352143 $
> 01/24/16 14:53:58 ** $CondorPlatform: x86_64_Windows7 $
> 01/24/16 14:53:58 ** PID = 2464
> 01/24/16 14:53:58 ** Log last touched time unavailable (No such file or directory)
> 01/24/16 14:53:58 ******************************************************
> 01/24/16 14:53:58 Using config source: C:\condor\condor_config
> 01/24/16 14:53:58 Using local config sources:
> 01/24/16 14:53:58 condor_urlfetch -KBDD http://condor.dummy.edu/pool/condor_config_win7_cloud.local C:\condor\condor_config.url_cache |
> 01/24/16 14:53:58 config Macros = 56, Sorted = 56, StringBytes = 1821, TablesBytes = 1592
> 01/24/16 14:53:58 CLASSAD_CACHING is ENABLED
> 01/24/16 14:53:58 Daemon Log is logging: D_ALWAYS D_ERROR
> 01/24/16 14:53:58 Daemoncore: Listening at <0.0.0.0:49162> on TCP (ReliSock) and UDP (SafeSock).
> 01/24/16 14:53:58 DaemonCore: command socket at <10.0.2.15:49162?addrs=10.0.2.15-49162>
> 01/24/16 14:53:58 DaemonCore: private command socket at <10.0.2.15:49162?addrs=10.0.2.15-49162>
> 01/24/16 14:54:03 Can't find address for startd Virtual-KU
> 01/24/16 14:54:03 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:09 ERROR: SECMAN:2003:TCP connection to  failed.
> 01/24/16 14:54:09 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
> 01/24/16 14:54:14 Can't find address for startd Virtual-KU
> 01/24/16 14:54:14 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:19 ERROR: SECMAN:2003:TCP connection to  failed.
> 01/24/16 14:54:19 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
> 01/24/16 14:54:24 Can't find address for startd Virtual-KU
> 01/24/16 14:54:24 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:29 ERROR: SECMAN:2003:TCP connection to  failed.
> 01/24/16 14:54:29 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
> 01/24/16 14:54:34 Can't find address for startd Virtual-KU
> 01/24/16 14:54:34 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:39 ERROR: SECMAN:2003:TCP connection to  failed.
> 01/24/16 14:54:39 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
> 01/24/16 14:54:44 Can't find address for startd Virtual-KU
> 01/24/16 14:54:44 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:49 ERROR: SECMAN:2003:TCP connection to  failed.
> 01/24/16 14:54:49 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
> 01/24/16 14:54:54 Can't find address for startd Virtual-KU
> 01/24/16 14:54:54 Can't locate startd, aborting (Can't find address for startd Virtual-KU)
> 01/24/16 14:54:59 ERROR: SECMAN:2003:TCP connection to  failed.
> 01/24/16 14:54:59 Can't send X_EVENT_NOTIFICATION command to startd at: (null), aborting
> 01/24/16 14:55:52 ******************************************************
> 01/24/16 14:55:52 ** condor_kbdd (CONDOR_KBDD) STARTING UP
> 01/24/16 14:55:52 ** C:\condor\bin\condor_kbdd.exe
> 01/24/16 14:55:52 ** SubsystemInfo: name=KBDD type=DAEMON(12) class=DAEMON(1)
> 01/24/16 14:55:52 ** Configuration: subsystem:KBDD local:<NONE> class:DAEMON
> 01/24/16 14:55:52 ** $CondorVersion: 8.4.3 Dec 15 2015 BuildID: 352143 $
> 01/24/16 14:55:52 ** $CondorPlatform: x86_64_Windows7 $
> 01/24/16 14:55:52 ** PID = 2368
> 01/24/16 14:55:52 ** Log last touched 1/24 14:55:00
> 01/24/16 14:55:52 ******************************************************
> 01/24/16 14:55:52 Using config source: C:\condor\condor_config
> 01/24/16 14:55:52 Using local config sources:
> 01/24/16 14:55:52 condor_urlfetch -KBDD http://condor.dummy.edu/pool/condor_config_win7_cloud.local C:\condor\condor_config.url_cache |
> 01/24/16 14:55:52 config Macros = 56, Sorted = 56, StringBytes = 1818, TablesBytes = 1592
> 01/24/16 14:55:52 CLASSAD_CACHING is ENABLED
> 01/24/16 14:55:52 Daemon Log is logging: D_ALWAYS D_ERROR
> 01/24/16 14:55:52 Daemoncore: Listening at <0.0.0.0:49179> on TCP (ReliSock) and UDP (SafeSock).
> 01/24/16 14:55:52 DaemonCore: command socket at <10.0.2.15:49179?addrs=10.0.2.15-49179>
> 01/24/16 14:55:52 DaemonCore: private command socket at <10.0.2.15:49179?addrs=10.0.2.15-49179>
> 01/24/16 14:55:57 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:02 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:07 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:12 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:17 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:23 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:28 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:33 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:38 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:43 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:48 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:53 GetCursorInfo() failed (err=5)
> 01/24/16 14:56:58 GetCursorInfo() failed (err=5)
> 01/24/16 14:57:03 GetCursorInfo() failed (err=5)
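
Since both instances write into the same KbdLog, it can help to split
the file at each "STARTING UP" banner and count the messages per
instance. The sketch below assumes the log is read from disk with one
entry per line, as HTCondor writes it; summarize() is a made-up helper,
not part of HTCondor. For reference, Windows error code 5 is
ERROR_ACCESS_DENIED.

```python
import re
from collections import Counter

BANNER = re.compile(r"\*\* condor_kbdd \(CONDOR_KBDD\) STARTING UP")
PID = re.compile(r"\*\* PID = (\d+)")
STAMP = re.compile(r"^\d\d/\d\d/\d\d \d\d:\d\d:\d\d ")


def summarize(lines):
    """Group log lines by daemon start and count each distinct message."""
    runs = []
    for line in lines:
        if BANNER.search(line):
            # A new daemon instance begins at every startup banner.
            runs.append({"pid": None, "messages": Counter()})
            continue
        if not runs:
            continue
        match = PID.search(line)
        if match:
            runs[-1]["pid"] = int(match.group(1))
            continue
        text = STAMP.sub("", line.strip())
        # Ignore the remaining banner lines, which start with asterisks.
        if text and not text.startswith("*"):
            runs[-1]["messages"][text] += 1
    return runs
```

For example, with the log path from the banner above:

    with open(r"C:\condor\log\KbdLog") as f:   # path is an assumption
        for run in summarize(f):
            print(run["pid"], run["messages"].most_common(3))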