
Re: [HTCondor-users] help setting up condor



Hi Edier,
The machines have Scientific Linux 6. They already have a hostname assigned. 
I added the IP addresses and hostnames to the /etc/hosts file of both machines as you said, then restarted the network and condor, but no luck yet. Should I reboot?

In case it helps, below is the tail of the CollectorLog in the HOST which shows some interaction with the worker node (137.138.46.173 is the HOST, 137.138.46.175 is the worker).

Jose

02/24/15 16:23:37 Got QUERY_STARTD_PVT_ADS
02/24/15 16:23:37 Number of Active Workers 0
02/24/15 16:23:37 (Sending 12 ads in response to query)
02/24/15 16:23:37 Query info: matched=12; skipped=0; query_time=0.001560; send_time=0.001414; type=MachinePrivate; requirements={true}; peer=<137.138.46.173:56275>; projection={}
02/24/15 16:23:37 Number of Active Workers 0
02/24/15 16:23:37 (Sending 13 ads in response to query)
02/24/15 16:23:37 Query info: matched=13; skipped=3; query_time=0.001705; send_time=0.005531; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<137.138.46.173:40891>; projection={}
02/24/15 16:24:13 Got QUERY_STARTD_ADS
02/24/15 16:24:13 Number of Active Workers 0
02/24/15 16:24:13 (Sending 12 ads in response to query)
02/24/15 16:24:13 Query info: matched=12; skipped=0; query_time=0.002586; send_time=0.003795; type=Machine; requirements={true}; peer=<137.138.46.173:51220>; projection={Name Machine Opsys Arch State Activity LoadAvg Memory ActvtyTime MyCurrentTime EnteredCurrentActivity}
02/24/15 16:24:17 Got QUERY_STARTD_ADS
02/24/15 16:24:17 Number of Active Workers 0
02/24/15 16:24:17 (Sending 12 ads in response to query)
02/24/15 16:24:17 Query info: matched=12; skipped=0; query_time=0.001537; send_time=0.003603; type=Machine; requirements={true}; peer=<137.138.46.175:51775>; projection={Name Machine Opsys Arch State Activity LoadAvg Memory ActvtyTime MyCurrentTime EnteredCurrentActivity}
02/24/15 16:24:37 Got QUERY_STARTD_PVT_ADS
02/24/15 16:24:37 Number of Active Workers 0
02/24/15 16:24:37 (Sending 12 ads in response to query)
02/24/15 16:24:37 Query info: matched=12; skipped=0; query_time=0.001502; send_time=0.001359; type=MachinePrivate; requirements={true}; peer=<137.138.46.173:51743>; projection={}
02/24/15 16:24:37 Number of Active Workers 0
02/24/15 16:24:37 (Sending 13 ads in response to query)
02/24/15 16:24:37 Query info: matched=13; skipped=3; query_time=0.001733; send_time=0.005509; type=Any; requirements={( ( ( MyType == "Scheduler" ) || ( MyType == "Submitter" ) ) || ( ( MyType == "Machine" ) ) )}; peer=<137.138.46.173:48643>; projection={}
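A log tail like the one above is easier to read once the queries are grouped by peer. Here is a minimal sketch, assuming the "Query info" line layout seen in this excerpt; the names `QUERY_RE`, `tally_queries` and `SAMPLE` are made up for illustration and are not part of HTCondor. It only counts who queried the collector. It says nothing about whether the worker's startd is sending its own ads.

```python
import re
from collections import Counter

# Pulls the query type and the peer IP out of a CollectorLog "Query info" line.
# The pattern is based only on the lines in this thread (an assumption).
QUERY_RE = re.compile(
    r"Query info:.*?type=(?P<type>\w+);.*?peer=<(?P<ip>[\d.]+):\d+>"
)


def tally_queries(log_lines):
    """Count collector queries per (peer IP, query type)."""
    counts = Counter()
    for line in log_lines:
        match = QUERY_RE.search(line)
        if match:
            counts[(match.group("ip"), match.group("type"))] += 1
    return counts


# Three lines copied from the log tail in this message.
SAMPLE = [
    "02/24/15 16:23:37 Query info: matched=12; skipped=0; query_time=0.001560; "
    "send_time=0.001414; type=MachinePrivate; requirements={true}; "
    "peer=<137.138.46.173:56275>; projection={}",
    "02/24/15 16:24:13 Query info: matched=12; skipped=0; query_time=0.002586; "
    "send_time=0.003795; type=Machine; requirements={true}; "
    "peer=<137.138.46.173:51220>; projection={Name Machine Opsys Arch State "
    "Activity LoadAvg Memory ActvtyTime MyCurrentTime EnteredCurrentActivity}",
    "02/24/15 16:24:17 Query info: matched=12; skipped=0; query_time=0.001537; "
    "send_time=0.003603; type=Machine; requirements={true}; "
    "peer=<137.138.46.175:51775>; projection={Name Machine Opsys Arch State "
    "Activity LoadAvg Memory ActvtyTime MyCurrentTime EnteredCurrentActivity}",
]

if __name__ == "__main__":
    for (ip, qtype), n in sorted(tally_queries(SAMPLE).items()):
        print(ip, qtype, n)
```

In this excerpt the worker (137.138.46.175) appears only as the peer of a Machine query, and each such query is answered with 12 ads.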


On Feb 24, 2015, at 3:33 PM, Edier Zapata <edalzap@xxxxxxxxx> wrote:

OK, there's a problem: HTCondor needs a "real" hostname to connect the nodes in the pool, so:

1. Which Operating System are you using? Linux Ubuntu, Linux CentOS, Scientific Linux?

2. You need to assign a hostname to each node (for example, testmaster.test.cern.ch and testnode.test.cern.ch).

3. I assume both are on a LAN, so you have to add the LAN IP of each one to /etc/hosts.
Assuming the LAN IPs are 192.168.1.2 (testmaster) and 192.168.1.3 (testnode), /etc/hosts on both machines should look like this:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
192.168.1.2 testmaster.test.cern.ch testmaster
192.168.1.3 testnode.test.cern.ch testnode
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

If you're using Ubuntu, the hostname can be changed by editing the /etc/hostname file; if you're using a Red Hat-like Linux (CentOS, SL, etc.), the hostname is set in /etc/sysconfig/network.
Restart the node after changing it.
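To double-check that an /etc/hosts file really contains both names, something like the following could help. It is only a sketch: `parse_hosts`, `missing_names` and `EXAMPLE_HOSTS` are hypothetical helper names, and the addresses are the assumed example IPs from above, not the real ones.

```python
def parse_hosts(text):
    """Map every hostname and alias in /etc/hosts-style text to its IP address."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first address seen for a name, mirroring a top-down lookup.
            table.setdefault(name, ip)
    return table


def missing_names(text, required):
    """Return the required hostnames that the hosts text does not define."""
    table = parse_hosts(text)
    return [name for name in required if name not in table]


# Example content using the assumed LAN addresses from this message.
EXAMPLE_HOSTS = """\
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
192.168.1.2 testmaster.test.cern.ch testmaster
192.168.1.3 testnode.test.cern.ch testnode
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
"""

if __name__ == "__main__":
    wanted = ["testmaster.test.cern.ch", "testnode.test.cern.ch"]
    print("missing:", missing_names(EXAMPLE_HOSTS, wanted))
```

On a real node you would pass the contents of /etc/hosts instead of `EXAMPLE_HOSTS` and run the check on both machines.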

Bye

On Tue, Feb 24, 2015 at 9:18 AM, Jose Feliciano Benitez <Jose.Benitez@xxxxxxx> wrote:
Hi Edier,
Thanks for those instructions.

I’m not familiar with setting up DNS/FQDNs. Can you tell me exactly how to do step #3?
This is what /etc/hosts has on both machines:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

Jose

On Feb 24, 2015, at 2:59 PM, Edier Zapata <edalzap@xxxxxxxxx> wrote:

As I understand it, you set up two single-node pools, and now you want to turn one of them into an execute node. Am I right?

Assuming I'm right:
1. Change DAEMON_LIST in your condor_config file and remove everything except MASTER and STARTD.
2. Change CONDOR_HOST on the execute node so it points to the master node.
3. If you don't have DNS, add the FQDNs of both nodes to the /etc/hosts file on both nodes (so each one can reach the other by name).
4. Restart the execute node.

I think with this you should get it working.
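Steps 1 and 2 come down to two settings on the execute node. A rough sketch of what they could look like, written as a small Python helper that renders the lines: CONDOR_HOST and DAEMON_LIST are the HTCondor configuration names, while `execute_node_config` and the hostname passed to it are placeholders chosen for illustration. Where the lines go (condor_config or a local config file) depends on your installation.

```python
def execute_node_config(central_manager):
    """Render condor_config lines for a node that should only run jobs.

    central_manager is the hostname of the master node of the pool; the
    value used in the example below is a placeholder, not a real machine.
    """
    lines = [
        "# Point this node at the pool's master node (step 2)",
        f"CONDOR_HOST = {central_manager}",
        "# Run only the master and startd daemons on this node (step 1)",
        "DAEMON_LIST = MASTER, STARTD",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder FQDN; replace it with the real name of the master node.
    print(execute_node_config("testmaster.test.cern.ch"), end="")
```

After changing the settings, restart HTCondor on the execute node as in step 4.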

Good day :)

On Tue, Feb 24, 2015 at 6:20 AM, Jose Feliciano Benitez <Jose.Benitez@xxxxxxx> wrote:
Hi,
I’m new to condor. We have a few machines we want to use as a cluster. I installed condor on two machines so far for testing. I’ve tried running condor on each machine individually and it seems to run fine. When I run condor_status, it gives me a list of the CPUs in the machine.

Then I tried to connect the machines by setting the HOST parameter on one machine to the name of the other machine. In this case, when I run condor_status, it shows only the CPUs of the HOST but not those of the worker. Both the host and the worker give the same output. I’ve not been able to figure out why the host does not seem to detect the CPUs of the worker.

Any help would be appreciated.

Thanks,
Jose



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Edier Alberto Zapata Hernández
Ingeniero de Soporte en Infraestructura
CIER - Sur
