
Re: [Condor-users] Getting started with 2 nodes



FYI, http://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/ was written using Fedora, but hopefully it isn't very different on Ubuntu (use apt instead of yum).

If you find the instructions don't work, let me know and I'll see if I can update them.

Best,


matt

On 06/11/2012 12:53 PM, Brian Candler wrote:
I'm trying to get started with Condor, following the instructions at
http://research.cs.wisc.edu/condor/manual/v7.8/3_2Installation.html

I have two machines, which have each other's names in /etc/hosts:
10.26.1.224	dev-storage1.example.com
10.26.1.226	dev-storage2.example.com
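
A quick way to confirm that both boxes carry the same name-to-address mappings is to parse the hosts entries and compare them. The following is only a minimal sketch: `parse_hosts` and `SAMPLE` are hypothetical names introduced for illustration, and the sample text simply repeats the two entries quoted above rather than reading the real /etc/hosts.

```python
def parse_hosts(text):
    """Map each hostname or alias in /etc/hosts-style text to its address."""
    table = {}
    for line in text.splitlines():
        # Drop trailing comments and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first mapping seen for a given name.
            table.setdefault(name, address)
    return table


# Hypothetical sample: the two entries from the message above.
SAMPLE = (
    "10.26.1.224\tdev-storage1.example.com\n"
    "10.26.1.226\tdev-storage2.example.com\n"
)

if __name__ == "__main__":
    for name, address in parse_hosts(SAMPLE).items():
        print(name, address)
```

Running the same function over the file contents from each machine and comparing the two dictionaries would show whether either box is missing an entry.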

I have created a 'condor' user on both:
useradd -m condor -s /bin/false

These machines are running Ubuntu 12.04 x86_64 server, so I have downloaded
condor-7.8.0-x86_64_deb_6.0-stripped.tar.gz

I want one machine to be manager+submit+execute and the other just to be
submit+execute, so I installed each like this:

./condor_install --install-dir=/opt/condor-7.8.0 --local-dir=/home/condor --type=submit,execute,manager --central-manager=dev-storage1.example.com

./condor_install --install-dir=/opt/condor-7.8.0 --local-dir=/home/condor --type=submit,execute --central-manager=dev-storage1.example.com

On both:
ln -s condor-7.8.0 /opt/condor
. /opt/condor/condor.sh
/opt/condor/sbin/condor_master

I see what seem to be the right processes on the two boxes(*)

However, condor_status on either box shows only slots on box 1:

root@dev-storage2:~# condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@dev-storage1 LINUX      X86_64 Owner     Idle     0.290  1991  0+00:00:28
slot2@dev-storage1 LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:05
slot3@dev-storage1 LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:30
slot4@dev-storage1 LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:31
                      Total Owner Claimed Unclaimed Matched Preempting Backfill

         X86_64/LINUX     4     1       0         3       0          0        0

                Total     4     1       0         3       0          0        0
root@dev-storage2:~# condor_status dev-storage1.example.com

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@dev-storage1 LINUX      X86_64 Owner     Idle     0.290  1991  0+00:00:28
slot2@dev-storage1 LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:05
slot3@dev-storage1 LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:30
slot4@dev-storage1 LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:31
                      Total Owner Claimed Unclaimed Matched Preempting Backfill

         X86_64/LINUX     4     1       0         3       0          0        0

                Total     4     1       0         3       0          0        0
root@dev-storage2:~# condor_status dev-storage2.example.com
root@dev-storage2:~#

Any suggestions as to what I should be looking at to get dev-storage2
available?
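
The check described above (which machines actually advertise slots) can be scripted by counting the `slotN@host` rows in the condor_status listing. This is a rough sketch, not an official Condor tool: `machines_with_slots` and `STATUS_SAMPLE` are hypothetical names, and the sample is abbreviated from the output pasted above.

```python
def machines_with_slots(status_text):
    """Count slot rows per machine in default condor_status output."""
    counts = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Slot rows start with a name such as "slot1@dev-storage1".
        if fields and fields[0].startswith("slot") and "@" in fields[0]:
            host = fields[0].split("@", 1)[1]
            counts[host] = counts.get(host, 0) + 1
    return counts


# Hypothetical sample, copied from the listing in this message.
STATUS_SAMPLE = """\
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@dev-storage1 LINUX      X86_64 Owner     Idle     0.290  1991  0+00:00:28
slot2@dev-storage1 LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:05
slot3@dev-storage1 LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:30
slot4@dev-storage1 LINUX      X86_64 Unclaimed Idle     0.000  1991  0+00:00:31
                Total     4     1       0         3       0          0        0
"""

if __name__ == "__main__":
    print(machines_with_slots(STATUS_SAMPLE))
```

With the listing above, only dev-storage1 appears in the result, which matches the symptom: no slots from dev-storage2 have reached the collector.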

Thanks,

Brian.

(*)
root@dev-storage1:~# ps auxwww | grep condor_ | grep -v grep
condor   11460  0.1  0.0  98620  5396 ?        Ss   17:47   0:00 /opt/condor/sbin/condor_master
condor   11461  0.1  0.0  99920  7264 ?        Ss   17:47   0:00 condor_collector -f
condor   11462  0.0  0.0  97556  6784 ?        Ss   17:47   0:00 condor_negotiator -f
condor   11463  0.0  0.0  99128  7668 ?        Ss   17:47   0:00 condor_schedd -f
condor   11464  0.1  0.0  98220  7548 ?        Ss   17:47   0:00 condor_startd -f
root     11465  0.0  0.0  23480  2088 ?        S    17:47   0:00 condor_procd -A /tmp/condor-lock.dev-storage10.678420392637239/procd_pipe.SCHEDD -L /home/condor/log/ProcLog.SCHEDD -R 10000000 -S 60 -C 1002

root@dev-storage2:~# ps auxwww | grep condor_ | grep -v grep
condor   10462  0.0  0.0  98572  5312 ?        Ss   17:47   0:00 /opt/condor/sbin/condor_master
condor   10463  0.0  0.0  99128  7668 ?        Ss   17:47   0:00 condor_schedd -f
condor   10464  0.1  0.0  98220  7548 ?        Ss   17:47   0:00 condor_startd -f
root     10465  0.0  0.0  23480  2052 ?        S    17:47   0:00 condor_procd -A /tmp/condor-lock.dev-storage20.222976692793058/procd_pipe.SCHEDD -L /home/condor/log/ProcLog.SCHEDD -R 10000000 -S 60 -C 1002



Here is a diff between /home/condor/condor_config.local on the two boxes:

--- cl1	2012-06-11 17:50:24.000000000 +0100
+++ cl2	2012-06-11 17:50:20.000000000 +0100
@@ -21,7 +21,7 @@
  ##  When something goes wrong with condor at your site, who should get
  ##  the email?

-CONDOR_ADMIN = root@xxxxxxxxxxxxxxxxxxxxxxxx
+CONDOR_ADMIN = root@xxxxxxxxxxxxxxxxxxxxxxxx


  ##  Full path to a mail delivery program that understands that "-s"
@@ -50,9 +50,9 @@
  ##  you've set in the CONDOR_IDS environment variable.  See the Admin
  ##  manual for details on this.

-LOCK = /tmp/condor-lock.$(HOSTNAME)0.678420392637239
+LOCK = /tmp/condor-lock.$(HOSTNAME)0.222976692793058

-DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
+DAEMON_LIST = MASTER, SCHEDD, STARTD


  ##  Network domain parameters:
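
To read the diff above at a glance, the two DAEMON_LIST settings can be compared as sets. The sketch below is illustrative only: `daemon_list`, `BOX1` and `BOX2` are hypothetical names, and the two strings hold just the DAEMON_LIST lines from the diff, not the full condor_config.local files.

```python
def daemon_list(config_text):
    """Return the daemons named by DAEMON_LIST in Condor-style config text."""
    daemons = []
    for line in config_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == "DAEMON_LIST":
            # A later assignment replaces an earlier one.
            daemons = [d.strip() for d in value.split(",") if d.strip()]
    return daemons


# Hypothetical samples: the two DAEMON_LIST lines from the diff.
BOX1 = "DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD\n"
BOX2 = "DAEMON_LIST = MASTER, SCHEDD, STARTD\n"

if __name__ == "__main__":
    only_on_box1 = set(daemon_list(BOX1)) - set(daemon_list(BOX2))
    print(sorted(only_on_box1))
```

The only daemons unique to box 1 are the collector and negotiator, which is what a central manager is expected to run, so the DAEMON_LIST difference by itself is consistent with the intended roles.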