
Re: [Condor-users] Installation problems in a future Grid testbed



Try sending a test job - something simple like a shell script - before
concluding that the machines can't communicate. This one should work.

uname.sh
-------------------------
#!/bin/bash

# Print the machine we ran on
uname -n
-------------------------

uname.cmd
-------------------------
universe   = vanilla
executable = uname.sh
output     = uname.out
error      = uname.err
log        = uname.log
queue
-------------------------
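
One caveat: since you said you have no shared filesystem, the vanilla-universe job also needs Condor's file transfer turned on, or the output files may never come back to the submit machine. A sketch of the same submit file with the transfer macros added (these exist in 6.7.x, but I haven't tested this exact file):

uname.cmd (no shared filesystem)
-------------------------
universe                = vanilla
executable              = uname.sh
output                  = uname.out
error                   = uname.err
log                     = uname.log
# no shared FS: have Condor ship the executable and outputs itself
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue
-------------------------

Submit it with "condor_submit uname.cmd" and watch it with "condor_q"; when it finishes, uname.out should contain the execute machine's hostname.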

If that doesn't work, we're getting out of my realm of experience. I
spent the last two weeks setting up a couple of Condor installs for my
company, but I haven't run into the problem you described.
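
That said, one thing worth double-checking first: the "can't find collector" error from condor_status usually means the execute machine doesn't know where the central manager is. Without NFS, that has to be written into each machine's config by hand. A guess at what the execute machine's config should contain, assuming vivax is your central manager (adjust names to your pool; HOSTALLOW_WRITE is the 6.7-era macro name):

-------------------------
# single host name only - the collector runs on the central manager
CONDOR_HOST     = vivax.biowebdb.org
COLLECTOR_HOST  = $(CONDOR_HOST)
# let the pool machines report to the collector
HOSTALLOW_WRITE = *.biowebdb.org
-------------------------

If CONDOR_HOST lists anything other than the single manager host, condor_status on the other machines will keep failing the way you showed.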

-Avi


On 8/26/05, Fabiano Portella <fabiano_portella@xxxxxxxxxxxx> wrote:
> Yes, it was my mistake. In fact, when I copied the global condor_config from
> the manager to the executer, I didn't replace the LOCAL_DIR and
> LOCAL_CONFIG_FILE macros.
> After changing that, I could start condor_master on the executer. So I ran
> ps and got the following:
>  
> [globus@crithidia sbin]$ ps -ef | egrep condor_
> globus   11189     1  0 10:27 ?        00:00:00 condor_master
> globus   11190 11189  1 10:27 ?        00:00:04 condor_startd -f
> globus   11191 11189  0 10:27 ?        00:00:00 condor_schedd -f
> globus   11223 10911  0 10:34 pts/0    00:00:00 egrep condor_
> That means I finally got past this problem (the executer now starts its
> daemons). But... :)
> The condor_status gives me the following in manager:
>  
> [globus@vivax condor-6.7.10]$ condor_status
> Name          OpSys  Arch   State  Activity  LoadAv  Mem  ActvtyTime
> vivax.biowebd LINUX  INTEL  Owner  Idle      2.840   757  0+01:00:19
>
>              Machines Owner Claimed Unclaimed Matched Preempting
> INTEL/LINUX         1     1       0         0       0          0
> Total               1     1       0         0       0          0
> This suggests that the manager only sees itself.
> When I issued the same command in executer I got:
>  
> [globus@crithidia sbin]$ condor_status
> Error:  Could not fetch ads --- can't find collector
> It seems that they still don't see each other. Is there a way to do this
> without NFS?
> Thanks in advance.
> Regards,
> Fabiano.
> 
> Avi Flamholz <flamholz@xxxxxxxxx> wrote:
> Try giving the local user permissions on the local config file. It
> sounds like it can't read it.
> 
> On 8/26/05, Fabiano Portella wrote:
> > Hi Avi. 
> > I've tried to follow your suggestion. 
> > Suppose manager is machine 1 and submitter/executer is machine 2. 
> > I've replaced the condor_config file in machine 2 with the condor_config
> > from machine 1 (they are equal). 
> > I've also replaced condor_config.local in machine 2 with an empty file
> > (touch condor_config.local).
> > After that, I got this error when running condor_master on machine 2
> > (machine 1 was already running condor_master):
> > ======================================================= 
> > [globus@crithidia sbin]$ sudo condor_master 
> > ERROR: Can't read config file
> > /usr/local/condor-6.7.10/local.vivax/condor_config.local
> > ======================================================= 
> > So I tried to disable the REQUIRE_LOCAL_CONFIG_FILE macro
> > on machine 2 before rerunning condor_master, and I got this error:
> > ======================================================= 
> > [globus@crithidia sbin]$ sudo condor_master
> > Can't open "/usr/local/condor-6.7.10/local.vivax/log/MasterLog"
> > 8/26 09:29:47 dprintf() had a fatal error in pid 10984
> > Can't open "/usr/local/condor-6.7.10/local.vivax/log/MasterLog"
> > errno: 2 (No such file or directory)
> > euid: 504, ruid: 0
> > ======================================================= 
> > Any suggestion to solve this? I don't have NFS installed, so I'm
> > concerned about how submitters/executers will find the manager if they don't
> > have a local configuration file telling them where to find it.
> > Thanks for your time and fast help. I think we are close to the
> > solution.
> > Regards, 
> > Fabiano. 
> > 
> > Avi Flamholz wrote:
> > You should have a global configuration file on all the machines, and
> > they should be identical. You must have a local configuration file on
> > all the machines if the REQUIRE_LOCAL_CONFIG_FILE macro in the global
> > configuration file is set to true, which it is by default. However,
> > that local config file can be empty on machines that are not the
> > central manager.
> > 
> > (You are not using NFS, right? That seems to be the root of most of
> > your installation problems.)
> > 
> > -Avi
> > 
> > On 8/25/05, Fabiano Portella wrote:
> > > So you mean that I must have only one global
> > > config file on the central manager, plus a local config
> > > file with data in it. All other machines in the pool
> > > (submitters/executers) must have ONLY an empty local
> > > config file. Is that correct?
> > > Please let me know if I'm wrong.
> > > Thanks one more time for your help.
> > > Regards,
> > > Fabiano.
> > > 
> > > --- Avi Flamholz wrote:
> > > 
> > > > Each machine must have a local config file, but it
> > > > need not have
> > > > anything in it. You should empty out the local
> > > > config files for the
> > > > machines that you do not want to be the central
> > > > manager. You do not
> > > > want to have an empty global config file - how else
> > > > would you define
> > > > global settings for condor? You should undo that if
> > > > possible, or
> > > > reinstall.
> > > >
> > > > I believe, also, that the condor_configure script will
> > > > set up the
> > > > appropriate local config files for you if you run it
> > > > with the correct
> > > > parameters.
> > > >
> > > > -Avi
> > > >
> > > > On 8/24/05, Fabiano Portella wrote:
> > > > > Thanks for the fast response Avi!
> > > > > I'm sorry about the confusion! You're right: I'm
> > > > having 2 managers instead
> > > > > of 1.
> > > > > But I couldn't understand your point about the
> > > > configuration files: which
> > > > > must be an empty file (condor_config or
> > > > condor_config.local)?
> > > > > I've tried to do this tip with condor_config in
> > > > the non-manager machine:
> > > > > 1. Replace condor_config with an empty file in the
> > > > non-manager machine of
> > > > > the pool
> > > > > 2. Start condor_master in pool manager
> > > > > 3. Start condor_master in pool non-manager (just
> > > > submitter/executer)
> > > > > But it seems that nothing happened (no daemons
> > > > > started).
> > > > > =============================================
> > > > > [globus@crithidia sbin]$ condor_master
> > > > > [globus@crithidia sbin]$ ps -ef | egrep condor_
> > > > > globus 6639 6585 0 23:47 pts/0 00:00:00 egrep condor_
> > > > > [globus@crithidia sbin]$
> > > > > =============================================
> > > > > I assume that an empty condor_config.local is not
> > > > > the answer, since according to the
> > > > > Condor documentation each machine must have a
> > > > > local configuration file.
> > > > > Regards,
> > > > > Fabiano.
> > > > >
> > > > >
> > > > > Avi Flamholz wrote:
> > > > >
> > > > > Do you mean that both machines are executing as a
> > > > > central manager?
> > > > > (You wrote "execute both machines as masters," but
> > > > the condor_master
> > > > > daemon is supposed to run on all machines, so I
> > > > assume you meant
> > > > > central manager.)
> > > > >
> > > > > If this is the case, you should look at the local
> > > > configuration files
> > > > > for the individual machines. It will probably have
> > > > a comment on the
> > > > > top saying something like "this is the config file
> > > > for the central
> > > > > manager." It will also probably have the macro
> > > > DAEMONS_LIST set to
> > > > > MASTER, NEGOTIATOR, COLLECTOR, STARTD, SCHEDD.
> > > > This means that all
> > > > > your machines are running all the daemons, which
> > > > > is not what you want.
> > > > > On the machines that you do not want to be central
> > > > managers, you
> > > > > should replace this file with an empty file. The
> > > > default, given an
> > > > > empty local config file, is that the condor_master
> > > > daemon will start
> > > > > the condor_startd and condor_schedd daemons,
> > > > making the local machine
> > > > > a submit/execute node.
> > > > >
> > > > > -Avi
> > > > >
> > > > > On 8/24/05, Fabiano Portella wrote:
> > > > > >
> > > > > >
> > > > > > Hi Condor community!
> > > > > > I'm trying to create a Grid with machines from 2
> > > > research labs. First one
> > > > > > contains 3 Linux FC3 machines (one is the
> > > > central manager) and the other
> > > > > lab
> > > > > > contains 2 Linux FC1 machines.
> > > > > > I've just tried to install condor-6.7.10.
> > > > Following the condor
> > > > > > documentation, I've issued the following
> > > > commands (as globus user and sudo
> > > > > > permission):
> > > > > >
> > > > > >
> > > > > > $ tar xzf condor-6.7.10-linux-x86-glibc23-dynamic.tar.gz
> > > > > > $ cd condor-6.7.10
> > > > > > $ sudo ./condor_configure --type=manager,submit,execute \
> > > > > >     --install-dir=/usr/local/condor-6.7.10/ \
> > > > > >     --owner=globus --install
> > > > > >
> > > > > > WARNING: Unable to determine local IP address.
> > > > > > Condor might not work
> > > > > > properly until you set NETWORK_INTERFACE=
> > > > > >
> > > > > > Use of uninitialized value in concatenation (.)
> > > > or string at
> > > > > > ./condor_configure line 908.
> > > > > >
> > > > > > Condor has been installed into:
> > > > > > /usr/local/condor-6.7.10
> > > > > >
> > > > > > It seems strange to me, since NETWORK_INTERFACE
> > > > > > was set to the IP address of
> > > > > > the specified machine.
> > > > > > Anyway, I continued the process, updating
> > > > condor_config and
> > > > > > condor_config.local properly:
> > > > > >
> > > > > > #######/etc/condor/condor_config#############
> > > > > > RELEASE_DIR = /usr/local/condor-6.7.10
> > > > > > LOCAL_DIR = /usr/local/condor-6.7.10/local.vivax
> > > > > > CONDOR_ADMIN = globus@xxxxxxxxxxxxxxxxxx
> > > > > > MAIL = /bin/mail
> > > > > > FULL_HOSTNAME = vivax.biowebdb.org
> > > > > > UID_DOMAIN = $(FULL_HOSTNAME)
> > > > > > FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
> > > > > > COLLECTOR_NAME = BioWebDB Pool
> > > > > > CONDOR_IDS = 504.504
> > > > > > QUEUE_SUPER_USERS = root, condor, globus
> > > > > > #############################################
> > > > > >
> > > > > >
> > > > > > ##/usr/local/condor-6.7.10/local.vivax/condor_config.local###
> > > > > > CONDOR_HOST = vivax.biowebdb.org vivax
> > > > > > CONDOR_ADMIN = globus@xxxxxxxxxxxxxxxxxx
> > > > > > UID_DOMAIN = $(FULL_HOSTNAME)
> > > > > > FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
> > > > > > CONDOR_IDS = 504.504
> > > > > >
> > > > > > ##############################################################
> > > > > >
> > > > > > After that I decided to move on to
> > > > > > installing/configuring Condor on the other
> > > > > > machine. I was aware of the type parameter
> > > > > > for condor_configure, so it was
> > > > > > "--type=submit,execute".
> > > > > > I don't have a shared file system or a common
> > > > > > UID, so I changed
> > > > > > FULL_HOSTNAME to that machine's name as well.
> > > > > > The problems come now. After updating all
> > > > > > necessary fields in the
> > > > > > condor_config.local and condor_config files on
> > > > > > both machines, I tried to
> > > > > > start the daemons.
> > > > > > But after issuing the "condor_master"
> > > > > > command on each one,
> > > > > > both machines run as masters!
> > > > > > So, how can I deal with that? Is there any
> > > > > > configuration missing? What's
> > > > > > wrong? Should I reinstall everything?
> > > > > > Please, any clue will be important! I really
> > > > > > need this feedback to go on!
> > > > > > Thanks in advance.
> > > > > > Regards,
> > > > > > Fabiano.
> > > > > >
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > Condor-users mailing list
> > > > > > Condor-users@xxxxxxxxxxx
> > > > > > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> > > > > >
> > > > >
> > > === message truncated ===
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > >
> > 
> > 
> > 
> > 
> >
> 
> 
> 
> 
>