
Re: [Condor-users] Condor install and set up problems



Thanks for the response. I took another route.

I uninstalled the CentOS 5 RPMs on the master and execute machines.

I loaded the CentOS Condor tar.gz file and ran the install much as the article describes, BUT now the master is master and submit only, and the execute node is execute and submit only.
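For anyone reproducing this split, the role of each machine is normally controlled by the DAEMON_LIST setting in its local config file. Below is a minimal sketch, assuming the standard daemon names (MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD); the role names and the daemon_list helper are hypothetical and only illustrate which daemons each role needs.

```python
# Hypothetical helper: builds a DAEMON_LIST line for condor_config.local from
# the roles a machine should play. The daemon names are standard Condor
# daemons; the role names and this function are illustrative only.

ROLE_DAEMONS = {
    "central_manager": ["COLLECTOR", "NEGOTIATOR"],  # pool bookkeeping and matchmaking
    "submit": ["SCHEDD"],                            # accepts and queues jobs
    "execute": ["STARTD"],                           # advertises slots and runs jobs
}


def daemon_list(roles):
    """Return a DAEMON_LIST setting for the given roles, with MASTER first."""
    daemons = ["MASTER"]
    for role in roles:
        if role not in ROLE_DAEMONS:
            raise ValueError(f"unknown role: {role}")
        for daemon in ROLE_DAEMONS[role]:
            if daemon not in daemons:
                daemons.append(daemon)
    return "DAEMON_LIST = " + ", ".join(daemons)


if __name__ == "__main__":
    # The layout described above: master + submit, and execute + submit.
    print(daemon_list(["central_manager", "submit"]))
    print(daemon_list(["execute", "submit"]))
```

Leaving "execute" off the master's roles keeps STARTD off that machine, so it never advertises slots of its own.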

As we build up the cluster, we will add more execute-only nodes.

After I rebooted the execute node and restarted Condor via condor_master, it showed up in condor_status with all the correct info. I had rebooted because the log file you mentioned showed some permission errors; I fixed those, then added the code from condor.sh to the user's .bash_profile file and rebooted to fix the path issues.
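Since the permission errors only turned up by reading the logs, a quick scan of the log directory can save some time on each new node. This is a minimal sketch, assuming the logs live in one directory (for example CONDOR_HOME/local.HOSTNAME/log); the keyword pattern is a guess, because the exact wording of Condor's messages varies between versions, so treat each match as a lead to read in context. The scan_logs helper is hypothetical.

```python
import os
import re

# Keywords are an assumption, not Condor's documented message format.
SUSPICIOUS = re.compile(r"error|permission denied", re.IGNORECASE)


def scan_logs(log_dir):
    """Yield (file name, line number, line) for lines matching SUSPICIOUS."""
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        with open(path, errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if SUSPICIOUS.search(line):
                    yield name, lineno, line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Pass the node's real log directory as the first argument.
    if len(sys.argv) > 1 and os.path.isdir(sys.argv[1]):
        for name, lineno, line in scan_logs(sys.argv[1]):
            print(f"{name}:{lineno}: {line}")
```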

So... thanks for pointing me to the right places to look. The rest seems to be a matter of getting permissions, paths, and config.local pointers correct.
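To confirm the path fix took effect for a given user, a small check can verify that the Condor commands resolve on PATH and that CONDOR_CONFIG points at a real file. A minimal sketch, assuming the condor.sh snippet adds the Condor bin/sbin directories to PATH and exports CONDOR_CONFIG; the command list and the check_condor_env helper are hypothetical. An unset CONDOR_CONFIG is only a real problem if the config file is not in one of Condor's default locations.

```python
import os
import shutil

# Commands we expect the condor.sh snippet to put on PATH (an assumption).
EXPECTED_COMMANDS = ("condor_status", "condor_submit", "condor_master")


def check_condor_env(env=None):
    """Return a list of problems found; an empty list means the basics look OK."""
    env = os.environ if env is None else env
    problems = []
    config = env.get("CONDOR_CONFIG")
    if not config:
        problems.append("CONDOR_CONFIG is not set")
    elif not os.path.isfile(config):
        problems.append(f"CONDOR_CONFIG points to a missing file: {config}")
    search_path = env.get("PATH", "")
    for command in EXPECTED_COMMANDS:
        # shutil.which returns None when the command is not found on search_path.
        if shutil.which(command, path=search_path) is None:
            problems.append(f"{command} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_condor_env():
        print(problem)
```

Running it from a fresh login shell exercises the same .bash_profile the user gets.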

Hopefully the developer can now build the web-based job submission front end we need to let our scientists submit jobs to our heterogeneous cluster of compute servers.
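One way such a front end could target the right machines in a mixed Linux/Windows pool is to generate a submit description with a requirements expression on the machine's OpSys and Arch. A minimal sketch, assuming vanilla-universe jobs; the submit_description function and its parameters are hypothetical, while the keywords it writes are standard condor_submit commands. "LINUX" and "X86_64" are the usual values for 64-bit Linux; check the OpSys and Arch columns in condor_status output for the values your Windows nodes actually advertise.

```python
# Hypothetical helper a web front end might call to turn a form into a
# Condor submit description. Only the submit keywords are standard.


def submit_description(executable, arguments="", opsys="LINUX", arch="X86_64",
                       tag="job"):
    """Return the text of a vanilla-universe submit description."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        # Match only slots whose machine ClassAd reports this OS and
        # architecture, so native binaries land on compatible hosts.
        f'requirements = (OpSys == "{opsys}") && (Arch == "{arch}")',
        "should_transfer_files = IF_NEEDED",
        "when_to_transfer_output = ON_EXIT",
        f"output = {tag}.out",
        f"error = {tag}.err",
        f"log = {tag}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(submit_description("/opt/sim/run_model", arguments="-n 10", tag="run1"))
```

The front end would write this text to a file on the submit machine and pass that file to condor_submit.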

Thanks Much!

Seth Bardash

Integrated Solutions and Systems LLC

719-495-5866   Shop Phone
719-337-4779   Cell

seth@xxxxxxxxxxxxxxxxxxxxxxx
Failure cannot survive knowledge and perseverance!



On 9/8/2010 9:51 AM, Edier Alberto Zapata Hernández wrote:
Hi, some questions / suggestions:

   1. Is the master a submit machine too?
      OK, I just looked at the article you mentioned; a few words about it:
      Setting up a Master, Execute, Submit machine (all in one) is the
      worst case for using Condor.
      In my personal experience, when the Master can run jobs it will
      overload itself (because the first-ranked machine to run a task will
      always be itself).
      My preferred configuration is: 1 Master/submit machine and many
      execute, or execute+submit, nodes.
   2. What output do you get when you run *condor_status -any*?
      *condor_status -any* shows a list of the "daemons" found in the
      cluster; you want to see whether any nodes apart from the
      master appear in that list.
   3. On the Master and the execute nodes, check the
      CONDOR_HOME/local.HOSTNAME/log/MasterLog file.
      If Condor finds something wrong at startup, it will record it there.
      You should look at the other log files in that folder too.
   4. If there are no failures in any of the log files, check whether a
      firewall is running; some firewalls block Condor's ports, so the
      Master and the nodes can't connect to each other.

I hope this helps.
I'll wait for your answer.

On Tue, Sep 7, 2010 at 10:28 AM, Seth Bardash
<seth@xxxxxxxxxxxxxxxxxxxxxxx> wrote:

    Please be patient, as I'm new to Condor.

    I am trying to set up a simple Condor 7.5.3 cluster for testing so
    we can write a complete web-based front end and use it to control 24
    12-core machines running a mix of Windows Server 2008 64-bit and
    CentOS Linux 64-bit. But first I just need to get it working and
    seeing the execute machines:

    1 Master: CentOS 5.5 i386-based, dual Xeons, 12 GB of memory

    1 Slave execute machine: CentOS 5.5 x86_64-based, dual-core Opteron
    280, 2 GB of memory

    1 Slave execute machine: Windows Server 2003, dual Xeons, 4 GB of
    memory

    All are on the same subnet.

    So far I have the CentOS machines loaded per the article on
    linux.com: http://www.linux.com/archive/articles/56747. I downloaded
    the RHEL 5 tar.gz and ran condor_configure per this article. The
    master sees its own 2 cores but does not see the execute machine.
    The execute machine sees the 2 cores on the master, but that's it.

    I have read some of the manual, especially the Unix and Windows
    installation sections, but clearly I am doing something wrong.

    Most of us Linux people do not speak Condor. Any reading, or areas to
    investigate, for setting up a small Condor cluster would be helpful.
    All we want is to keep the machines busy running native code and to
    have a central submit machine that can monitor them.

    A little direct help or even suggestions on where to look would be
    appreciated.

    Thanks




--
Edier Alberto Zapata Hernández
Systems Engineering student
Universidad de Valle