
Re: [Condor-users] Installation problems in a future Grid testbed



First, make sure that crithidia is trying to talk to the right machine. Run 'condor_config_val condor_host'. Make sure that the resulting name is vivax's full hostname and you can ssh to vivax using it.
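
A minimal sketch of that first check, assuming a POSIX shell on crithidia with condor_config_val on the PATH. The helper name same_host is hypothetical and not part of Condor; it only compares the configured value against the name you expect, ignoring case.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor): succeed only when the configured
# central manager name equals the expected full hostname, ignoring case.
#   $1: value reported by `condor_config_val CONDOR_HOST`
#   $2: full hostname of the central manager
same_host() {
    configured=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    expected=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ "$configured" = "$expected" ]
}
```

On crithidia this could be called as: same_host "$(condor_config_val CONDOR_HOST)" vivax.biowebdb.org && echo "CONDOR_HOST looks right". A value holding anything other than that single name fails the comparison, and the ssh test is still worth doing by hand.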

Next, make sure that Condor is binding the right network interface on both machines. All of the daemons write the following line to their log when they start up:
8/26 10:59:43 DaemonCore: Command Socket at <128.105.165.29:47348>

If you see 127.0.0.1, or if the machine has multiple IP addresses, you'll probably have to set NETWORK_INTERFACE in your config file to make Condor use the right address.
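
To spot that case quickly, here is a small sketch that pulls the address out of such log lines and flags a loopback address. It assumes a POSIX shell with sed; the function names are hypothetical, and the location of the logs depends on your LOCAL_DIR.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Condor).

# Read daemon log lines on stdin and print the IP address from each
# "DaemonCore: Command Socket at <ip:port>" line, one per line.
command_socket_ips() {
    sed -n 's/.*DaemonCore: Command Socket at <\([0-9.]*\):[0-9]*>.*/\1/p'
}

# Succeed when the given address is in the loopback range 127.0.0.0/8.
is_loopback() {
    case "$1" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

For example, command_socket_ips < "$(condor_config_val LOG)/MasterLog" should list the addresses the master has bound. Any address that passes is_loopback is a hint that NETWORK_INTERFACE needs to be set on that machine.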

 -- Jaime

On Aug 26, 2005, at 8:41 AM, Fabiano Portella wrote:

Yes, it was my mistake. In fact, when I copied the global condor_config from the manager to the executer, I didn't update the LOCAL_DIR and LOCAL_CONFIG_FILE macros.
After changing them I could start condor_master on the executer. So I checked the process status and got the following:
 
[globus@crithidia sbin]$ ps -ef | egrep condor_
globus   11189     1  0 10:27 ?        00:00:00 condor_master
globus   11190 11189  1 10:27 ?        00:00:04 condor_startd -f
globus   11191 11189  0 10:27 ?        00:00:00 condor_schedd -f
globus   11223 10911  0 10:34 pts/0    00:00:00 egrep condor_
That means I finally solved this problem (the executer now starts its daemons). But... :)
condor_status gives me the following on the manager:
 
[globus@vivax condor-6.7.10]$ condor_status
Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
vivax.biowebd LINUX       INTEL  Owner      Idle       2.840  757   0+01:00:19

                     Machines Owner Claimed Unclaimed Matched Preempting
         INTEL/LINUX        1     1       0         0       0          0
               Total        1     1       0         0       0          0
It seems to me that the manager only sees itself.
When I issued the same command on the executer I got:
 
[globus@crithidia sbin]$ condor_status
Error:  Could not fetch ads --- can't find collector
It seems that they still don't see each other. Is there a way to do this without NFS?
Thanks in advance.
Regards,
Fabiano.

Avi Flamholz <flamholz@xxxxxxxxx> wrote:
Try giving the local user permissions on the local config file. It
sounds like it can't read it.

On 8/26/05, Fabiano Portella wrote:
> Hi Avi.
> I've tried to follow your suggestion.
> Suppose manager is machine 1 and submitter/executer is machine 2.
> I've replaced the condor_config file in machine 2 with the condor_config
> from machine 1 (they are equal).
> I've also replaced condor_config.local in machine 2 with an empty file (touch
> condor_config.local).
> After that, I got this error when I ran condor_master in machine 2 (machine 1
> was already running condor_master):
> =======================================================
> [globus@crithidia sbin]$ sudo condor_master
> ERROR: Can't read config file
> /usr/local/condor-.7.10/local.vivax/condor_config.local
> =======================================================
> So I tried to disable the REQUIRE_LOCAL_CONFIG_FILE macro
> in machine 2 before rerunning condor_master and I got this error:
> =======================================================
> [globus@crithidia sbin]$ sudo condor_master
> Can't open "/usr/local/condor-6.7.10/local.vivax/log/MasterLog"
> 8/26 09:29:47 dprintf() had a fatal error in pid 10984
> Can't open "/usr/local/condor-6.7.10/local.vivax/log/MasterLog"
> errno: 2 (No such file or directory)
> euid: 504, ruid: 0
> =======================================================
> Any suggestions for solving this? I really don't have NFS installed, so I'm
> concerned about how submitters/executers will find the manager if they don't
> have a local configuration file telling them where to find it.
> Thanks for your time and fast help. I think we are close to the solution.
> Regards,
> Fabiano.
>
> Avi Flamholz wrote:
> You should have a global configuration file on all the machines, and
> they should be identical. You must have a local configuration file on
> all the machines if the REQUIRE_LOCAL_CONFIG_FILE macro in the global
> configuration file is set to true, which it is by default. However,
> that local config file can be empty on machines that are not the
> central manager.
>
> (You are not using NFS, right? That seems to be the root of most of
> your installation problems.)
>
> -Avi
>
> On 8/25/05, Fabiano Portella wrote:
> > So, you mean that I must have only one global
> > config file on the central manager, plus a local config file
> > with data in it. All other machines in the pool
> > (submitters/executers) must have ONLY one empty local
> > config file. Is that correct?
> > Please let me know if I'm wrong.
> > Thanks one more time for your help.
> > Regards,
> > Fabiano.
> >
> > --- Avi Flamholz wrote:
> >
> > > Each machine must have a local config file, but it
> > > need not have
> > > anything in it. You should empty out the local
> > > config files for the
> > > machines that you do not want to be the central
> > > manager. You do not
> > > want to have an empty global config file - how else
> > > would you define
> > > global settings for condor? You should undo that if
> > > possible, or
> > > reinstall.
> > >
> > > I believe, also, that the condor_config script will
> > > set up the
> > > appropriate local config files for you if you run it
> > > with the correct
> > > parameters.
> > >
> > > -Avi
> > >
> > > On 8/24/05, Fabiano Portella
> > > wrote:
> > > > Thanks for the fast response Avi!
> > > > I'm sorry about the confusion! You're right: I'm
> > > having 2 managers instead
> > > > of 1.
> > > > But I couldn't understand your point about the
> > > configuration files: which
> > > > > must be an empty file (condor_config or
> > > condor_config.local)?
> > > > > I've tried applying this tip to condor_config on
> > > the non-manager machine:
> > > > 1. Replace condor_config with an empty file in the
> > > non-manager machine of
> > > > the pool
> > > > 2. Start condor_master in pool manager
> > > > 3. Start condor_master in pool non-manager (just
> > > submitter/executer)
> > > > > But it seems that nothing happened (no daemons turned
> > > on).
> > > > > =============================================
> > > > > [globus@crithidia sbin]$ condor_master
> > > > [globus@crithidia sbin]$ ps -ef | egrep condor_
> > > > globus 6639 6585 0 23:47 pts/0 00:00:00
> > > egrep condor_
> > > > [globus@crithidia sbin]$
> > > > =============================================
> > > > I assume that an empty condor_config.local is not
> > > the case, according to the
> > > > condor documentation (each machine must have a
> > > local configuration file).
> > > > Regards,
> > > > Fabiano.
> > > >
> > > >
> > > > > Avi Flamholz wrote:
> > > > >
> > > > > Do you mean that both machines are executing as a
> > > central manager?
> > > > (You wrote "execute both machines as masters," but
> > > the condor_master
> > > > daemon is supposed to run on all machines, so I
> > > assume you meant
> > > > central manager.)
> > > >
> > > > If this is the case, you should look at the local
> > > configuration files
> > > > for the individual machines. It will probably have
> > > a comment on the
> > > > top saying something like "this is the config file
> > > for the central
> > > > manager." It will also probably have the macro
> > > DAEMON_LIST set to
> > > > MASTER, NEGOTIATOR, COLLECTOR, STARTD, SCHEDD.
> > > This means that all
> > > > your machines are running all the daemons, which
> > > is not what you want.
> > > > On the machines that you do not want to be central
> > > managers, you
> > > > should replace this file with an empty file. The
> > > default, given an
> > > > empty local config file, is that the condor_master
> > > daemon will start
> > > > > the condor_startd and condor_schedd daemons,
> > > making the local machine
> > > > a submit/execute node.
> > > >
> > > > -Avi
> > > >
> > > > On 8/24/05, Fabiano Portella wrote:
> > > > >
> > > > >
> > > > > Hi Condor community!
> > > > > I'm trying to create a Grid with machines from 2
> > > research labs. First one
> > > > > contains 3 Linux FC3 machines (one is the
> > > central manager) and the other
> > > > lab
> > > > > contains 2 Linux FC1 machines.
> > > > > I've just tried to install condor-6.7.10.
> > > Following the condor
> > > > > documentation, I've issued the following
> > > commands (as globus user and sudo
> > > > > permission):
> > > > >
> > > > >
> > > > > > $tar xzf condor-6.7.10-linux-x86-glibc23-dynamic.tar.gz
> > > > > > $cd condor-6.7.10
> > > > > > $sudo ./condor_configure --type=manager,submit,execute
> > > > > > --install-dir=/usr/local/condor-6.7.10/ --owner=globus
> > > > > > --install
> > > > >
> > > > > WARNING: Unable to determine local IP address.
> > > Condor might not work
> > > > > propertly until you set NETWORK_INTERFACE=
> > > > >
> > > > > Use of uninitialized value in concatenation (.)
> > > or string at
> > > > > ./condor_configure line 908.
> > > > >
> > > > > Condor has been installed into:
> > > > > /usr/local/condor-6.7.10
> > > > >
> > > > > It seems strange to me, since NETWORK_INTERFACE
> > > was set to the IP address
> > > > of
> > > > > the specified machine.
> > > > > Anyway, I continued the process, updating
> > > condor_config and
> > > > > > condor_config.local properly:
> > > > >
> > > > > #######/etc/condor/condor_config#############
> > > > > RELEASE_DIR = /usr/local/condor-6.7.10
> > > > > > LOCAL_DIR = /usr/local/condor-6.7.10/local.vivax
> > > > > CONDOR_ADMIN = globus@xxxxxxxxxxxxxxxxxx
> > > > > MAIL = /bin/mail
> > > > > FULL_HOSTNAME = vivax.biowebdb.org
> > > > > UID_DOMAIN = $(FULL_HOSTNAME)
> > > > > FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
> > > > > COLLECTOR_NAME = BioWebDB Pool
> > > > > CONDOR_IDS = 504.504
> > > > > QUEUE_SUPER_USERS = root, condor, globus
> > > > > #############################################
> > > > >
> > > > > > ##/usr/local/condor-6.7.10/local.vivax/condor_config.local###
> > > > > CONDOR_HOST = vivax.biowebdb.org vivax
> > > > > > CONDOR_ADMIN = globus@xxxxxxxxxxxxxxxxxx
> > > > > UID_DOMAIN = $(FULL_HOSTNAME)
> > > > > FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
> > > > > CONDOR_IDS = 504.504
> > > > >
> > > >
> > >
> >
> ##############################################################
> > > > >
> > > > > After that I decided to move forward to
> > > install/configure Condor in other
> > > > > machine. I was aware about the type parameter
> > > for condor_install, so it
> > > > was
> > > > > "--type=submit,execute".
> > > > > > I don't have a shared file system nor a common
> > > UID, so I changed
> > > > > > FULL_HOSTNAME to its name as well.
> > > > > > The problems come now. After updating all
> > > necessary fields in
> > > > > > condor_config.local and condor_config files for
> > > both machines, I tried to
> > > > > > start daemons.
> > > > > But both machines, after the "condor_master"
> > > command issued in each one,
> > > > > execute both machines as masters!
> > > > > > So, how could I deal with that? Is there any
> > > configuration missing?
> > > > What's
> > > > > > wrong? Should I reinstall all the stuff?
> > > > > > Please, any clue will be important! I really
> > > need this feedback to go on!
> > > > > Thanks in advance.
> > > > > Regards,
> > > > > Fabiano.
> > > > >
> > > > >
> > > > >
> > > > > > _______________________________________________
> > > > > > Condor-users mailing list
> > > > > > Condor-users@xxxxxxxxxxx
> > > > > > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> > > > >
> > > > >
> > > >
> > >
> > === message truncated ===

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users



+----------------------------------+---------------------------------+
|            Jaime Frey            |  Public Split on Whether        |
|        jfrey@xxxxxxxxxxx         |  Bush Is a Divider              |
|  http://www.cs.wisc.edu/~jfrey/  |         -- CNN Scrolling Banner |
+----------------------------------+---------------------------------+