
Re: [Condor-users] What should I do to connect machines?




The following line in your StartLog indicates that the central manager IP address 192.168.0.109 is not reachable from the machine running the startd:

01/11 18:36:36 attempt to connect to <192.168.0.109:9618> failed: No route to host (connect errno = 113).

So either fix things so that this IP address is accessible or use a different IP address for your collector (one that is accessible from all of the other machines).
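
One way to confirm the problem from an execute machine is to try a plain TCP connection to the collector address shown in the log. The following is only a minimal sketch, not part of Condor: the function name can_connect and the 5-second timeout are assumptions made for illustration, and it only tests that a TCP connection can be opened, not that a collector is actually answering there.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns (True, None) on success, or (False, error_text) on failure.
    can_connect is a hypothetical helper name, not a Condor tool.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers "No route to host", "Connection refused", timeouts, etc.
        return False, str(exc)


# Example call, using the address and port from the StartLog line above:
#   ok, err = can_connect("192.168.0.109", 9618)
```

If this fails with the same "No route to host" error when run on one of the machines that does not show up, the cause is in the network setup (routing or a firewall) rather than in the Condor configuration.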

In recent versions of Condor (all 7.x), Condor listens on all network interfaces by default, so unless you want it to _not_ listen for connections on a particular IP, you shouldn't need to set NETWORK_INTERFACE. You just need to set CONDOR_HOST and/or COLLECTOR_HOST to an IP address of the central manager that is reachable from the other machines.

--Dan

Dan Bradley wrote:
Genie,

It takes up to UPDATE_INTERVAL (default 5 minutes) for machines to report to the central manager.

If machines do not show up after that amount of time, the first place to look is in CollectorLog on the central manager. Look for "permission denied" to see if the collector is rejecting connections from the other machines. If that is the problem, then change your authorization policy (ALLOW_WRITE).

If that is not the problem, then look in StartLog on one of the machines that is not showing up in condor_status. See if there are any errors reported when the startd tries to connect to the collector.
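
The two log checks above can be sketched as a small script. This is an illustration only: the function name scan_log and the regular expression are assumptions. The connect-failure pattern is modelled on a single StartLog line of the form "attempt to connect to <ip:port> failed: reason (connect errno = N)", and the "permission denied" match is case-insensitive because the exact wording and capitalization in CollectorLog may differ between versions.

```python
import re

# Pattern modelled on one observed StartLog line; other Condor versions
# may word this message differently (assumption).
CONNECT_FAILED = re.compile(
    r"attempt to connect to <(?P<host>[\d.]+):(?P<port>\d+)> failed: "
    r"(?P<reason>.+?) \(connect errno = (?P<errno>\d+)\)"
)


def scan_log(lines):
    """Return a list of findings from CollectorLog or StartLog lines.

    scan_log is a hypothetical helper, not a Condor tool.
    """
    findings = []
    for line in lines:
        match = CONNECT_FAILED.search(line)
        if match:
            # The startd could not reach the collector at this address.
            findings.append((
                "connect_failed",
                match.group("host"),
                int(match.group("port")),
                match.group("reason"),
            ))
        elif "permission denied" in line.lower():
            # The collector rejected a connection (check ALLOW_WRITE).
            findings.append(("permission_denied", line.strip()))
    return findings


# Example call (the log path depends on your local-dir setting):
#   with open("/path/to/log/StartLog") as f:
#       for finding in scan_log(f):
#           print(finding)
```

A "connect_failed" finding points at a network or address problem, while a "permission_denied" finding points at the authorization policy.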

--Dan

Genie Jhang wrote:
Thanks a lot for all the help you have given me.
But I found another problem at the very beginning of the configuration. Our lab has 9 server computers, and I installed Condor on all of them. Computers 1-8 have IP addresses 192.168.0.101 to 192.168.0.108, respectively. Computer 9 has one public IP and one private IP, 192.168.0.109. On 9, I installed Condor with the options

  --prefix=/condor --local-dir=/home/condor --type=execute,submit,manager --owner=condor

and on the others, 1-8, with the options

  --prefix=/condor --local-dir=/home/condor --type=execute,submit --central-manager=192.168.0.109 --owner=condor

But after I start condor_master on 9 and then type condor_status, none of the other machines appear. It shows only the following.
-------------------------------------
Name           OpSys  Arch   State  Activity  LoadAv  Mem  ActvtyTime

slot1@pheko09  LINUX  INTEL  Owner  Idle      1.000   506  0+00:00:07
slot2@pheko09  LINUX  INTEL  Owner  Idle      0.050   506  0+00:00:08
slot3@pheko09  LINUX  INTEL  Owner  Idle      0.000   506  0+00:00:09
slot4@pheko09  LINUX  INTEL  Owner  Idle      0.000   506  0+00:00:10
slot5@pheko09  LINUX  INTEL  Owner  Idle      0.000   506  0+00:00:11
slot6@pheko09  LINUX  INTEL  Owner  Idle      0.000   506  0+00:00:12
slot7@pheko09  LINUX  INTEL  Owner  Idle      0.000   506  0+00:00:13
slot8@pheko09  LINUX  INTEL  Owner  Idle      0.000   506  0+00:00:06

             Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
INTEL/LINUX      8      8        0          0        0           0         0
      Total      8      8        0          0        0           0         0
-------------------------------------
What should I do to connect the other machines to 9? Please help me. Thank you for reading this.
------------------------------------------------------------------------

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/
_______________________________________________