Re: [Condor-users] New installation of Condor : questions about configuration files and other general things
- Date: Thu, 19 Apr 2007 19:40:52 +0200
- From: Jan Ploski <Jan.Ploski@xxxxxxxx>
- Subject: Re: [Condor-users] New installation of Condor : questions about configuration files and other general things
Thomas Pegeot wrote:
> I'm doing a work placement related to Condor, but as I'm new to
> Condor, I need your help. I have to set up a desktop
> grid composed of virtual machines running on dual-core computers
> (Windows hosts running VirtualBox or VMware Server virtual machines with
> Debian Etch), and I want to use Condor to manage these virtual machines.
> I set up Condor-6.9.2
Condor-6.9.2 is a development release; maybe you should stick with the stable series.
> as a Central Manager (cm) on a Linux server
> (Debian Etch) running an NFS server, which lets me share the release
> directory (/usr/local/condor in my case) and a unique home directory
> (/home/condor/hosts/$hostname/) with the virtual machines (node1, node2, ...).
> To install the central manager, I used the Older Unix Installation procedure.
> Unfortunately, there are a lot of things I don't know how to deal with,
> even with the Condor Manual.
> First, about file sharing. Do you think my setup is correct?
It is ok. AFAIK, apart from the lock files (which reside in local /tmp
by default), you can share everything over NFS.
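If you want to verify on a node which of these paths really are NFS mounts and which are local, the following Python sketch reads /proc/mounts (so it assumes Linux) and reports the filesystem type of the deepest mount point containing each path. The helper names parse_mounts and fs_type_for are made up for illustration; the paths are the ones from your setup.

```python
import os


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields: device, mount point, filesystem type, options, dump, pass.
            mounts.append((fields[1], fields[2]))
    return mounts


def fs_type_for(path, mounts):
    """Return the filesystem type of the deepest mount point containing path."""
    path = os.path.normpath(path)
    best_point, best_type = "", None
    for point, fstype in mounts:
        # A path is inside a mount if it equals the mount point or lies below it.
        inside = point == "/" or path == point or path.startswith(point.rstrip("/") + "/")
        if inside and len(point) > len(best_point):
            best_point, best_type = point, fstype
    return best_type


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        table = parse_mounts(f.read())
    for p in ("/usr/local/condor", "/home/condor", "/tmp"):
        print(p, "->", fs_type_for(p, table))
```

On a node set up as you describe, you would expect /usr/local/condor and /home/condor to report nfs (or nfs4) and /tmp to report a local filesystem.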
> On each virtual machine, I mount cm:/home/condor/ on /home/condor (I
> have a condor user on each VM), but is that correct? Same thing for the
> release directory: is it good to share this same folder between the
> central manager and the nodes?
> After the installation, I tried to run condor_status, but I only got the
> status of my central manager; there was nothing about my nodes. :s
How long have you waited? (A few minutes should be enough.)
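The daemons only advertise themselves to the collector periodically, so a node can be missing from condor_status for a while after startup. Rather than re-running condor_status by hand, a small polling loop like the Python sketch below can do the waiting. wait_for_nodes is a hypothetical helper; it assumes condor_status is on the PATH and compares against the Machine attribute, which is normally the fully qualified host name, so list your nodes that way.

```python
import subprocess
import time


def machines_via_condor_status():
    """Return the set of Machine attribute values the collector currently knows."""
    # -format prints the named ClassAd attribute once per ad.
    out = subprocess.run(
        ["condor_status", "-format", "%s\n", "Machine"],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(out.split())


def wait_for_nodes(expected, fetch=machines_via_condor_status, timeout=600, interval=30):
    """Poll until every expected name appears; return the names still missing."""
    deadline = time.monotonic() + timeout
    while True:
        missing = set(expected) - fetch()
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(interval)
```

For example, wait_for_nodes({"node1.example.org", "node2.example.org"}) returns an empty set once both nodes show up, or the still-missing names after ten minutes (the host names here are placeholders).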
> When I run "condor_status -direct node1", "condor_status -direct node2",
> ..., I get the following error: "condor_status:can't find address for
> So I checked which processes were running with ps -ef | egrep condor_:
> - on my cm:
> - on my nodes: condor_master, condor_procd, condor_startd, condor_schedd
> So I think my problem must be related to my configuration files.
It could also be your network configuration (firewall, routing, etc.).
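A quick way to rule out a firewall between a node and the cm is to try a TCP connection from the node to the collector. The Python sketch below assumes the collector listens on its usual default port, 9618; adjust it if your COLLECTOR_HOST names a different port. can_reach is a hypothetical helper name. Keep in mind that by default the daemons send their updates to the collector over UDP, so a successful TCP test does not prove that UDP traffic gets through.

```python
import socket


def can_reach(host, port=9618, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

Run on a node, can_reach("cm") should return True; if it returns False, check name resolution and the firewall on the cm before digging into the configuration files.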
> Here is my condor_config: http://pastebin.ca/448390
> My cm condor_config.local: http://pastebin.ca/448397
I don't see any errors at first glance.
> But the nodes' condor_config.local files are empty. I wonder whether it is
> correct to have empty configuration files. :s
It is okay for the local configuration files to be empty. It simply
means that you do not override any option from the global configuration file.
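To make the override semantics concrete: Condor reads the global condor_config first and then the file named by LOCAL_CONFIG_FILE, and a setting defined later replaces the earlier one. The Python sketch below models only that layering with a deliberately simplified parser (plain NAME = value lines, no macro expansion or line continuations), so it is an illustration, not Condor's actual parser; the function names are hypothetical.

```python
def parse_config(text):
    """Parse simplified 'NAME = value' lines into a dict (no macros or continuations)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Condor macro names are case-insensitive, so normalise them here.
        settings[name.strip().upper()] = value.strip()
    return settings


def effective_config(global_text, local_text):
    """Apply the local file on top of the global one; later definitions win."""
    merged = parse_config(global_text)
    merged.update(parse_config(local_text))
    return merged
```

With an empty local file, effective_config returns exactly the global settings, which is why empty condor_config.local files on the nodes are fine.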
> Is there another script to run on my virtual machines?
> I ran condor_init, but it didn't fix my problem.
The next thing to do is to inspect the logs of the cm and of a chosen node:
shut down Condor (e.g. by killing condor_master on the cm and the node),
manually remove the files from the log directory, then start the
condor_master process again and look at or paste the messages that appear in