
Re: [HTCondor-users] Adding Windows (7) machine to pool



Good afternoon George,

I haven't seen your config files, but a few of the things you describe stand out.

If you are having domain name problems, try specifying the master by its IP address rather than by name.domain.
Secondly, your config files have a daemons section (DAEMON_LIST), which determines which daemons start on that machine.

Make sure the Windows machines don't have master listed, only startd and schedd. I'm mentioning these off the top of my head, so please check the condor manual.

Sometimes nodes won't appear in the listing because they are in different domains. To check, set the domain explicitly in both configs.
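The quickest way to see which of these settings actually took effect is to run `condor_config_val DAEMON_LIST` (and likewise `condor_config_val CONDOR_HOST`) on each machine. The Python below is only a rough offline sketch for comparing the two machines' files side by side: it merges the global and local config text in order (the local file is read after the global one, so later assignments override earlier ones) and classifies the node from DAEMON_LIST. `parse_config` and `describe_role` are hypothetical helpers, and the parser deliberately ignores $(MACRO) expansion, line continuations and include syntax. A MASTER entry is expected on every node, since condor_master is the process that launches the other daemons; the daemons that make a machine a central manager are COLLECTOR and NEGOTIATOR.

```python
import re

# Simple "NAME = value" assignments; HTCondor config names are case-insensitive.
ASSIGN = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=\s*(.*?)\s*$")


def parse_config(*texts):
    """Merge config texts in the order given; later assignments win.

    Simplified sketch: no $(MACRO) expansion, no line continuations,
    no include/conditional syntax.
    """
    settings = {}
    for text in texts:
        for line in text.splitlines():
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and comments
            match = ASSIGN.match(stripped)
            if match:
                settings[match.group(1).upper()] = match.group(2)
    return settings


def describe_role(settings):
    """Classify a node from its DAEMON_LIST (hypothetical helper)."""
    daemons = {
        name.upper()
        for name in re.split(r"[\s,]+", settings.get("DAEMON_LIST", ""))
        if name
    }
    if {"COLLECTOR", "NEGOTIATOR"} & daemons:
        return "central manager"
    if "STARTD" in daemons:
        return "execute node"
    return "other"


if __name__ == "__main__":
    # Hypothetical file contents; cm.example.edu is a placeholder host name.
    global_cfg = "CONDOR_HOST = cm.example.edu\nDAEMON_LIST = MASTER, STARTD, SCHEDD\n"
    local_cfg = "DAEMON_LIST = MASTER, STARTD, KBDD\n"
    cfg = parse_config(global_cfg, local_cfg)
    print(cfg["DAEMON_LIST"], "->", describe_role(cfg))
```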

Working tip: after each change, restart condor so that the services are started, wait a couple of minutes, and check the log files to see what is blocking you. That way you don't make more changes than necessary.

Good luck, 

   David Lisin


2013/6/5 Dunn, George Jr <dunng@xxxxxxxx>

OK, part of my problem may be name resolution. I changed the first entry in resolv.conf to point to our local WINS servers, which seem to have the correct resolution for the machines I am targeting.
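Since host-based ALLOW_* rules written with host names can depend on reverse lookups of the connecting IP, it may help to check that forward and reverse resolution agree on both machines. The sketch below does that with Python's standard `socket.gethostbyname` / `socket.gethostbyaddr`, which use whatever the OS resolver provides (resolv.conf, hosts file, Windows name services). `check_resolution` is a hypothetical helper, and its "consistent" test is deliberately loose: it accepts a match on the short host name as well as the full name.

```python
import socket


def check_resolution(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return (ip, reverse_name, consistent) for hostname.

    forward/reverse default to the system resolver and can be swapped for
    stubs when testing. A failed reverse lookup gives reverse_name=None.
    """
    ip = forward(hostname)  # raises socket.gaierror if the name does not resolve
    try:
        reverse_name = reverse(ip)[0]  # gethostbyaddr -> (name, aliases, addresses)
    except (socket.herror, socket.gaierror):
        reverse_name = None
    consistent = reverse_name is not None and (
        reverse_name.lower() == hostname.lower()
        or reverse_name.lower().split(".")[0] == hostname.lower().split(".")[0]
    )
    return ip, reverse_name, consistent


if __name__ == "__main__":
    import sys

    # Pass the central manager and the Windows node names on the command line.
    for host in sys.argv[1:]:
        try:
            print(host, check_resolution(host))
        except socket.gaierror as exc:
            print(host, "forward lookup failed:", exc)
```

Running it from both the Linux and the Windows side with the same pair of names shows quickly whether one direction still disagrees.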

 

I ran condor_q -analyze on the Windows node and got the following:

Error: can't find address for <windows machine AD DNS name>
Extra Info: You probably saw this error because the condor_schedd is not
running on the machine you are trying to query. If the condor_schedd is not
running, the Condor system will not be able to find an address and port to
connect to and satisfy this request. Please make sure the Condor daemons are
running and try again.
Extra Info: If the condor_schedd is running on the machine you are trying to
query and you still see the error, the most likely cause is that you have
setup a personal Condor, you have not defined SCHEDD_NAME in your
condor_config file, and something is wrong with your SCHEDD_ADDRESS_FILE
setting. You must define either or both of those settings in your config
file, or you must use the -name option to condor_q. Please see the Condor
manual for details on SCHEDD_NAME and SCHEDD_ADDRESS_FILE.


Furthermore, it appears as if the Windows machine thinks it's a master: it has its own IP address in the .master_address file, even though when I run condor_status on it I get the master's slots listed.

 

I specifically told it during the install to join a pool, and I have done this on a couple of different machines with the same results.


At this point I am at a loss and pretty confused.

 

How can I configure the windows condor to NOT think it is a master? (please see a couple of threads back for the install options I used)

 

Maybe then I can tackle the name resolution problem, if it still exists (I can now resolve the Windows node from Linux).

 

Thanks for the help!


From: Dunn, George Jr
Sent: Tuesday, June 04, 2013 5:29 PM
To: HTCondor-Users Mail List
Subject: RE: [HTCondor-users] Adding Windows (7) machine to pool

 

Thanks!

I restarted the service (I have firewalls turned off on both machines at this point). The node still did not show up. I ran condor_startd and it opened several cmd windows, then closed all but one, which shows no text. I waited the two minutes; still nothing.

 

Here are the logs that seem relevant; if I am missing something, I can provide it:

 

On the new WINDOWS node here is what I have in the MasterLog.

It seems odd that condor_master starts up here, but as I said, it shows the master's execute slots when I run condor_status on the Windows machine.

 

06/04/13 17:22:53 ** condor (CONDOR_MASTER) STARTING UP
06/04/13 17:22:53 ** C:\condor\bin\condor_master.exe
06/04/13 17:22:53 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
06/04/13 17:22:53 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
06/04/13 17:22:53 ** $CondorVersion: 7.8.8 Mar 20 2013 BuildID: 110288 $
06/04/13 17:22:53 ** $CondorPlatform: x86_64_winnt_6.1 $
06/04/13 17:22:53 ** PID = 2756
06/04/13 17:22:53 ** Log last touched 6/4 16:22:52
06/04/13 17:22:53 ******************************************************
06/04/13 17:22:53 Using config source: C:\condor\condor_config
06/04/13 17:22:53 Using local config sources:
06/04/13 17:22:53    C:\condor/condor_config.local
06/04/13 17:22:53 DaemonCore: command socket at <x.x.x.x:51435>
06/04/13 17:22:53 DaemonCore: private command socket at <x.x.x.x:51435>
06/04/13 17:22:53 Setting maximum accepts per cycle 8.
06/04/13 17:22:54 Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 184
06/04/13 17:22:54 Started DaemonCore process "C:\condor/bin/condor_kbdd.exe", pid and pgroup = 5376
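Comparing the two MasterLogs is a quick way to see each machine's role: the Windows master above starts only condor_startd and condor_kbdd, while the Linux master below also starts condor_collector and condor_negotiator, the daemons that run on a central manager. That suggests the Windows node is not acting as a central manager despite having a .master_address file. A minimal sketch for pulling this out of a MasterLog, assuming the `Started DaemonCore process "...", pid and pgroup = N` line format shown in these logs; `started_daemons` and `looks_like_central_manager` are hypothetical helpers. Run against a whole MasterLog, it lists every start across restarts.

```python
import re

# Matches lines like:
#   ... Started DaemonCore process "C:\condor/bin/condor_startd.exe", pid and pgroup = 184
STARTED = re.compile(r'Started DaemonCore process "([^"]+)", pid and pgroup = (\d+)')


def started_daemons(log_text):
    """Return [(daemon_name, pid), ...] in the order condor_master started them."""
    result = []
    for match in STARTED.finditer(log_text):
        path, pid = match.groups()
        # Keep only the file name, whether the path uses \ or / separators.
        name = re.split(r"[\\/]", path)[-1]
        if name.lower().endswith(".exe"):
            name = name[:-4]
        result.append((name, int(pid)))
    return result


def looks_like_central_manager(daemons):
    """True if the master started a collector or negotiator."""
    names = {name for name, _ in daemons}
    return bool({"condor_collector", "condor_negotiator"} & names)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        daemons = started_daemons(fh.read())
    print(daemons)
    print("central manager" if looks_like_central_manager(daemons) else "not a central manager")
```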

 

Here is the MasterLog on the linux master:

 

06/03/13 15:39:13 ******************************************************
06/03/13 15:39:13 ** condor_master (CONDOR_MASTER) STARTING UP
06/03/13 15:39:13 ** /usr/local/condor/sbin/condor_master
06/03/13 15:39:13 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
06/03/13 15:39:13 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
06/03/13 15:39:13 ** $CondorVersion: 7.8.8 Mar 20 2013 BuildID: 110288 $
06/03/13 15:39:13 ** $CondorPlatform: x86_64_rhap_6.3 $
06/03/13 15:39:13 ** PID = 1535
06/03/13 15:39:13 ** Log last touched 6/3 15:36:34
06/03/13 15:39:13 ******************************************************
06/03/13 15:39:13 Using config source: /usr/local/condor/etc/condor_config
06/03/13 15:39:13 Using local config sources:
06/03/13 15:39:13    /home/condor/condor_config.local
06/03/13 15:39:13 DaemonCore: command socket at <152.20.244.221:43548>
06/03/13 15:39:13 DaemonCore: private command socket at <152.20.244.221:43548>
06/03/13 15:39:13 Setting maximum accepts per cycle 8.
06/03/13 15:39:13 Started DaemonCore process "/usr/local/condor/sbin/condor_collector", pid and pgroup = 1536
06/03/13 15:39:13 Waiting for /home/condor/log/.collector_address to appear.
06/03/13 15:39:14 Found /home/condor/log/.collector_address.
06/03/13 15:39:14 Started DaemonCore process "/usr/local/condor/sbin/condor_negotiator", pid and pgroup = 1537
06/03/13 15:39:14 Started DaemonCore process "/usr/local/condor/sbin/condor_schedd", pid and pgroup = 1538
06/03/13 15:39:14 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 1539
06/03/13 16:39:13 Preen pid is 1716
06/04/13 16:39:13 Preen pid is 4958


Back to the WINDOWS machine:

When the service is running, I see the following files listed in the c:\condor\logs directory:

 

06/04/2013  04:33 PM               114 .kbdd_address
06/04/2013  04:33 PM               114 .master_address
06/04/2013  05:12 PM               114 .startd_address
06/04/2013  05:12 PM                78 .startd_claim_id.slot1
06/04/2013  05:12 PM                78 .startd_claim_id.slot2
06/04/2013  05:12 PM                78 .startd_claim_id.slot3
06/04/2013  05:21 PM            10,838 KbdLog
06/04/2013  05:21 PM                 0 list.txt
06/04/2013  05:21 PM            13,371 MasterLog
06/04/2013  05:12 PM             5,895 StarterLog
06/04/2013  05:21 PM            27,976 StartLog
06/03/2013  04:35 PM               600 TOOLLog

 

That seems fine, but when I look in the .startd_address file I see

$CondorPlatform: x86_64_winnt_6.1 $

 

This is a 32-bit OS. Is this a problem?
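Each daemon writes its own address file into the log directory (compare .collector_address in the Linux MasterLog), so a .master_address holding the Windows machine's own IP is what you would normally expect on any machine running condor_master. The $CondorPlatform$ line in these files appears to be the same build banner that the MasterLog prints at startup. The sketch below just splits such a file into its parts so the advertised address can be compared with the IP you expect. It assumes a layout of one `<ip:port...>` contact string followed by the $CondorVersion$ and $CondorPlatform$ lines; `parse_address_file` is a hypothetical helper, and it does not answer the 32-bit question.

```python
import re

# "<ip:port>" contact string, optionally followed by "?params" before the ">".
SINFUL = re.compile(r"^<([^:>]+):(\d+)[^>]*>$")


def parse_address_file(text):
    """Split a daemon address file into ip, port, version and platform.

    Assumed layout: contact string, $CondorVersion: ... $, $CondorPlatform: ... $
    """
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SINFUL.match(line)
        if match:
            info["ip"], info["port"] = match.group(1), int(match.group(2))
        elif line.startswith("$CondorVersion:"):
            info["version"] = line.strip("$").split(":", 1)[1].strip()
        elif line.startswith("$CondorPlatform:"):
            info["platform"] = line.strip("$").split(":", 1)[1].strip()
    return info


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(parse_address_file(fh.read()))
```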

 

Thanks!

Eddie


From: htcondor-users-bounces@xxxxxxxxxxx [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of David Peter Lisin Crespo
Sent: Tuesday, June 04, 2013 5:04 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Adding Windows (7) machine to pool

 

Hi George. Can you restart condor on your Win7 machine (net stop condor, then net start condor)? Wait 2 minutes and run condor_status.

If the slots don't appear, go to the condor binaries directory and run condor_startd.

If this doesn't work, please attach the logs.

Good luck!!

On 04/06/2013 22:56, "Dunn, George Jr" <dunng@xxxxxxxx> wrote:

Maybe the windows machines are not supposed to show up in condor_status?

In any event, I have a follow-up that might be a better option. I came across the coLinux project and am intrigued. My only reservation is that the project seems to be inactive. Is there something more recently updated for running Linux inside Windows that I am missing?

I guess I could use cygwin or mingw along the same lines but I like the idea of having a "real" linux OS to work with.

Can anyone please shed some light on my ignorance?

Thanks!
Eddie


________________________________________
From: htcondor-users-bounces@xxxxxxxxxxx [htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Dunn, George Jr [dunng@xxxxxxxx]
Sent: Monday, June 03, 2013 3:57 PM
To: HTCondor-Users Mail List
Subject: [HTCondor-users] Adding Windows (7) machine to pool

Hi, I have a basic one-node Linux condor pool. I want to add a single Windows machine. I installed the MSI (the same version as the master, 7.8.8) with the following options (from here http://research.cs.wisc.edu/htcondor/manual/v7.8/3_2Installation.html#SECTION00425200000000000000):





STEP 2:

Set existing pool with FQDN of master.



STEP 3:

Always Run Jobs and never suspend



STEP 7:

Host permissions * on all for troubleshooting





I also set



   TRUST_UID_DOMAIN = True

   SOFT_UID_DOMAIN = True

On the windows machine to match my master node.

When I run a condor_status on either the new node or the master I just get the slots listed for the master.

Should windows machines show up in condor_status listing?

Right now I want to use this just to run MATLAB and R jobs on Windows. It seems like a fairly simple undertaking, but apparently not for me!! ☺


I would love to find a more in-depth, step-by-step guide with more gotchas and examples for different types of environments. Does one exist? Perhaps a book that I could purchase?

Thanks!
Eddie





_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

