
Re: [Condor-users] Condor install and set up problems



You're welcome :)

2010/9/8 Seth Bardash <seth@xxxxxxxxxxxxxxxxxxxxxxx>
Thanks for the response. I took another route.

I uninstalled the CentOS 5 RPMs on the master and execute machines.

I loaded the CentOS Condor tar.gz file and ran the install much as the article describes, BUT the master is master and submit only, and the execute node is execute and submit only.

As we build the cluster up we will add more execute only nodes.
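
The role split described above corresponds to Condor's DAEMON_LIST setting in each machine's local configuration. What follows is a minimal sketch, assuming the standard daemon names (MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD). The role labels and the `daemon_list_line` helper are hypothetical, made up for illustration; the exact configuration lines used on this cluster are not shown in the thread.

```python
# Hypothetical helper: map a machine role to a DAEMON_LIST config line.
# The role labels below are illustrative, not Condor terminology.
ROLE_DAEMONS = {
    # central manager that also accepts job submissions
    "master_submit": ["MASTER", "COLLECTOR", "NEGOTIATOR", "SCHEDD"],
    # node that runs jobs and also accepts job submissions
    "execute_submit": ["MASTER", "STARTD", "SCHEDD"],
    # node that only runs jobs
    "execute_only": ["MASTER", "STARTD"],
}


def daemon_list_line(role: str) -> str:
    """Return the DAEMON_LIST line for the given role label."""
    try:
        daemons = ROLE_DAEMONS[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None
    return "DAEMON_LIST = " + ", ".join(daemons)
```

Calling `daemon_list_line("execute_only")` yields the line an execute-only node would carry in its local config file under these assumptions.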

The log file you mentioned showed some permission errors, so I fixed those, then added the code from condor.sh to the user's .bash_profile file to fix the path issues. After I rebooted the execute node and restarted Condor via condor_master, it showed up in condor_status with all the correct info.
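
A quick way to confirm that the path fix took effect is to check whether the Condor commands resolve on the search path. This is a small sketch only; `missing_tools` is a hypothetical helper name, and the tool names passed to it are the two commands mentioned in this message.

```python
import shutil


def missing_tools(names, path=None):
    """Return the tool names that cannot be found on the search path.

    path=None means "use the current PATH environment variable".
    """
    return [name for name in names if shutil.which(name, path=path) is None]
```

Running `missing_tools(["condor_master", "condor_status"])` in a fresh login shell should return an empty list once .bash_profile sets the path correctly.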

So... thanks for pointing me to the right places to look. The rest seems to be a matter of getting permissions, paths and config.local pointers correct.

Hopefully the developer can now build the web based job submission front end we need to allow our scientists to submit jobs on our heterogeneous cluster of compute servers.

Thanks Much!


Seth Bardash

Integrated Solutions and Systems LLC

719-495-5866   Shop Phone
719-337-4779   Cell

seth@xxxxxxxxxxxxxxxxxxxxxxx
Failure cannot survive knowledge and perseverance!



On 9/8/2010 9:51 AM, Edier Alberto Zapata Hernández wrote:
Hi, some questions / suggestions:

  1. Is the master a submit machine too?
     OK, I just looked at the article you mentioned. Some words about it:
       Setting up a Master, Execute, Submit machine (all in one) is the
     worst case for using Condor.
       In my personal experience, when the Master can run jobs it will
     overload itself (because the first-ranked machine to run a task will
     always be itself).
     My preferred configuration is: 1 Master/submit machine and many
     execute or execute/submit nodes.
  2. When you run *condor_status -any*, what output do you get?
     *condor_status -any* shows a list of the "daemons" found in the
     cluster. You want to see whether there are any nodes apart from the
     master in that list.
  3. On the Master and the execute nodes, check the
     CONDOR_HOME/local.HOSTNAME/log/MasterLog file.
     If Condor found something wrong at startup, it will record it there.
     You should take a look at the other log files in that folder too.
  4. If there are no failures in any of the log files, check whether the
     firewall is running. Some firewalls block Condor's ports, so it
     can't connect between the Master and the nodes.
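
Checks 3 and 4 above can be partly automated. The sketch below is only a rough aid under stated assumptions: the keyword list is a guess at what a problem line might contain (Condor's actual log wording may differ), and both function names are hypothetical.

```python
import socket

# Assumed keywords; Condor's actual log wording may differ.
KEYWORDS = ("error", "fail", "denied")


def suspicious_lines(lines, keywords=KEYWORDS):
    """Return (line_number, text) for lines containing any keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False
```

For the log check, pass an open file object for CONDOR_HOME/local.HOSTNAME/log/MasterLog to `suspicious_lines`. For the firewall check, call `can_connect` from an execute node with the master's hostname and the collector port; 9618 is Condor's default collector port, but whether this cluster uses the default is an assumption. Condor daemons also use other ports, so a successful connection here does not rule out a firewall problem.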

I hope this helps.
I'll wait for your answer.

On Tue, Sep 7, 2010 at 10:28 AM, Seth Bardash
<seth@xxxxxxxxxxxxxxxxxxxxxxx> wrote:

   Please be patient as I'm new to condor.

   I am trying to set up a simple Condor 7.5.3 cluster for testing so
   we can write a complete web-based front end and use it to control 24
   12-core machines running a mix of Windows 2008 Server 64-bit and
   CentOS Linux 64-bit. But first I need to just get it working and
   seeing the execute machines:

   1 Master - Centos 5.5 i386 based with dual Xeons and 12 GB of memory

   1 Slave, execute machine, Centos 5.5 x86_64 based Dual core Opteron
   280 with 2 GB memory

   1 Slave execute machine, Windows 2003 server with dual xeons and 4
   GB memory

   All are on the same subnet.

   So far I have the CentOS machines loaded per the article on
   linux.com, http://www.linux.com/archive/articles/56747. I downloaded
   the RHEL 5 tar.gz and ran condor_configure per this article. The
   master sees its own 2 cores but does not see the execute machine. The
   execute machine sees the 2 cores on the master but that's it.

   I have read some of the manual, especially the install Unix and
   install windows parts but clearly I am doing something wrong.

   Most of us Linux people do not speak Condor. Any reading or areas to
   investigate to set up a small condor cluster would be helpful. Our
   only hope is to keep machines busy running native code on a machine
   and have a central submit machine that can monitor the various machines.

   A little direct help or even suggestions on where to look would be
   appreciated.

   Thanks

   --
   Seth Bardash

   Integrated Solutions and Systems LLC

   719-495-5866   Shop Phone
   719-337-4779   Cell

   seth@xxxxxxxxxxxxxxxxxxxxxxx

   Failure cannot survive knowledge and perseverance!



--
----
Edier Alberto Zapata Hernández
Est. Ingeniería de Sistemas
Universidad de Valle



