
Re: [Condor-users] Condor install and set up problems



Hi, some questions and suggestions:
  1. Is the master also a submit machine?
    OK, I just looked at the article you mentioned. Some words about it:
     Setting up a Master, Execute, Submit machine (all in one) is the worst way to use Condor.
     In my personal experience, when the Master can run jobs it will overload itself (because the first-ranked machine to run a task will always be itself).
    My preferred configuration is: 1 Master, submit machine and many execute nodes (or execute and submit nodes).
  2. When you run condor_status -any, what output do you get?
    condor_status -any shows a list of the daemons found in the cluster; you want to see whether any nodes apart from the master appear in that list.
  3. On the Master and the execute nodes, check the CONDOR_HOME/local.HOSTNAME/log/MasterLog file.
    If Condor found something wrong at startup, it will record it there.
    You should take a look at the other log files in that folder too.
  4. If no failure shows up in any of the log files, check whether a firewall is running. Some firewalls block Condor's ports, so the Master and the nodes can't connect to each other.
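
Steps 3 and 4 can be roughly automated with a small script. This is only a sketch, not part of Condor: the function names and the keyword list are my own guesses at what a problem line might contain, and 9618 is the collector's default port (your configuration may use another). Note that it only tests that single port, so a firewall could still be blocking the other ports Condor's daemons use.

```python
import socket
import sys

# Assumption: these keywords are only a guess at what a problem line contains.
SUSPECT_WORDS = ("error", "fail", "denied", "unable")


def suspicious_lines(log_path, words=SUSPECT_WORDS):
    """Return (line_number, text) pairs for log lines containing any keyword."""
    hits = []
    with open(log_path, errors="replace") as log:
        for number, line in enumerate(log, start=1):
            lowered = line.lower()
            if any(word in lowered for word in words):
                hits.append((number, line.rstrip("\n")))
    return hits


def can_connect(host, port=9618, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable or unresolvable host.
        return False


# Hypothetical usage: python check_condor.py <path to MasterLog> <master host>
if __name__ == "__main__" and len(sys.argv) == 3:
    log_file, master_host = sys.argv[1], sys.argv[2]
    for number, text in suspicious_lines(log_file):
        print(f"{log_file}:{number}: {text}")
    print(f"{master_host}:9618 reachable: {can_connect(master_host)}")
```

Run it on an execute node with the Master's hostname as the second argument. If the port is reported as unreachable while Condor is running on the Master, the firewall is a likely suspect.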
I hope this helps.
I'll wait for your answer.

On Tue, Sep 7, 2010 at 10:28 AM, Seth Bardash <seth@xxxxxxxxxxxxxxxxxxxxxxx> wrote:

Please be patient as I'm new to condor.

I am trying to set up a simple condor 7.5.3 cluster for testing so we can write a complete web based front end and use it to control 24 units of 12 core machines running mixed O/S's of Windows 2008 Server 64 bit and Centos Linux 64 bit. But first I need to just get it working and seeing the execute machines:

1 Master - Centos 5.5 i386 based with dual Xeons and 12 GB of memory

1 Slave, execute machine, Centos 5.5 x86_64 based Dual core Opteron 280 with 2 GB memory

1 Slave execute machine, Windows 2003 server with dual xeons and 4 GB memory

All are on the same subnet.

So far I have the Centos machines loaded per the article on linux.com
http://www.linux.com/archive/articles/56747. I downloaded the RHEL 5 tar.gz and ran condor_configure per this article. The master sees its own 2 cores but does not see the execute machine. The execute machine sees the 2 cores on the master but that's it.

I have read some of the manual, especially the install Unix and install windows parts but clearly I am doing something wrong.

Most of us linux people do not speak condor. Any reading or areas to investigate to set up a small condor cluster would be helpful. Our only hope is to keep machines busy running native code on a machine and have a central submit machine that can monitor the various machines.

A little direct help or even suggestions on where to look would be appreciated.

Thanks

--
Seth Bardash

Integrated Solutions and Systems LLC

719-495-5866   Shop Phone
719-337-4779   Cell

seth@xxxxxxxxxxxxxxxxxxxxxxx
Failure cannot survive knowledge and perseverance!



--
----
Edier Alberto Zapata Hernández
Systems Engineering Student
Universidad de Valle