
Re: [HTCondor-users] attach mac host to linux main server



Hi Yngve,

I know very little about HTCondor admin, but I did recently get a multi-node configuration working, albeit by disabling pretty much all security (which I'll add back later if/when it becomes necessary, though it's mostly not an issue because of Docker). The setup is for Kubernetes, but you may get some ideas for what to try from my config.local:

https://github.com/jimwhite/condor-kubernetes

Pretty sure email doesn't work, which is something I would like to have.

I got some inspiration from these blog posts:

http://spinningmatt.wordpress.com/2011/06/21/getting-started-multiple-node-condor-pool-with-firewalls/
http://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/
http://www.isi.edu/~gideon/condor-ec2/
http://sagarg55.wordpress.com/2014/02/09/configure-ht-condor-on-ubuntu/

I recommend reading all of the log files and making sure there aren't any error messages you don't understand. The "VM-gahp server reported an internal error" message is benign (it would be nice if the message made that clear).

Oh, and you don't say whether you've turned off your firewalls, but that would be the first thing to do if not.
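One quick way to test the firewall theory is to check, from the Mac, whether a TCP connection to port 9618 on the server succeeds (and, with the roles reversed, whether the server can reach the Mac at all). The Python sketch below does only that; the hostname passed on the command line is a placeholder, and a successful connect shows nothing beyond "something is listening there and the path is not blocked".

```python
import socket


def port_open(host: str, port: int = 9618, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    9618 is the port mentioned in this thread (the default collector port).
    A False result suggests a firewall, a wrong hostname, or nothing listening.
    """
    try:
        # create_connection resolves the name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and DNS resolution failures
        return False


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python3 check_port.py <server hostname>
    target = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    state = "reachable" if port_open(target) else "NOT reachable"
    print(f"{target}:9618 is {state}")
```

This only exercises TCP. Depending on the `UPDATE_COLLECTOR_WITH_TCP` setting, the startd's updates to the collector may travel over UDP, so a passing check does not rule out a firewall that drops UDP on 9618.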

I hope there's something useful in all that. Please do share your solution when you get there.

Jim

On Mon, Nov 24, 2014 at 7:31 AM, Yngve Levinsen <Yngve.Levinsen@xxxxxxx> wrote:
Hi all,

First of all, a warning: I am quite new to HT Condor, so this might just be me not knowing where to look for information. Related links are most welcome!

I have successfully set up HT Condor on our workstation running Scientific Linux 6 (based on RHEL 6), using the yum repositories. This was a very easy procedure, so thanks to whoever maintains and builds this!

Now I am trying to attach my MacBook (for testing; once I know how, I want to attach a few Mac Pros we have in various offices), but I am not succeeding. I also find the explanations of how to join computers into a cluster slightly confusing, perhaps because most of the time this works very much automagically. For example, I did not quite get which machines need write access where, and I did not quite understand how they communicate (other than that port 9618 is used).

After failing for about half a day, I figured it is better to reach out to the community to see if someone else can suggest what I am doing wrong.

Second warning: I am also not very experienced with how OS X is structured; I am much more accustomed to a Linux environment. This might very well be the root cause of my problems.

The workstation is simply set up using yum; I did not configure anything specifically. This seems to work fine, at least for the one machine. I tried to set a few things in /etc/condor/condor_config.local without much success. What do I need to configure on the host machine in order to allow other machines to be added to the pool? Currently I have only set "ALLOW_WRITE" to everything from inside our domain.

On the MacBook, I installed from the tarball, using a separate user "condor". I first tried a local install with the command:
$ ./condor_configure --install
This seemed to work fine: I could see that my MacBook had four open slots etc. with "condor_status".
I then deleted all of that, and installed again with
$ ./condor_configure --install --central-manager=<server hostname> --type=execute --verbose
This did not really seem to work. With "condor_status" I see the slots available on the server machine, but the slots from my MacBook are never added.
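For reference, the configuration knobs that usually matter when joining an execute node to a pool are CONDOR_HOST (where the central manager lives), DAEMON_LIST (which daemons this machine runs) and ALLOW_WRITE (which hosts may send updates and claims). The Python sketch below renders a minimal condor_config.local body for either role. It assumes both machines resolve to hostnames under one DNS domain, it is not what condor_configure itself writes, and cm.example.org / example.org are placeholders.

```python
def render_local_config(role: str, central_manager: str, domain: str) -> str:
    """Build a minimal condor_config.local body for one machine in the pool.

    role is "central_manager" or "execute"; central_manager is the server's
    fully qualified hostname; domain is the DNS suffix trusted for writes.
    """
    daemons = {
        # the server in this thread runs everything, including its own startd
        "central_manager": "MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD",
        # an execute-only node needs just the master and the startd
        "execute": "MASTER, STARTD",
    }
    if role not in daemons:
        raise ValueError(f"unknown role: {role!r}")
    lines = [
        f"CONDOR_HOST = {central_manager}",
        f"DAEMON_LIST = {daemons[role]}",
        # host-based authorization: any machine under the domain may write
        f"ALLOW_WRITE = *.{domain}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # hypothetical names; substitute the real server and domain
    print(render_local_config("execute", "cm.example.org", "example.org"))
```

After placing the output in the local config file on each machine, running condor_reconfig (or restarting HTCondor) should make the daemons pick it up. If the laptop gets its address via DHCP, it may be worth checking that its IP reverse-resolves to a name under the domain, since a wildcard such as *.example.org is matched against the connecting host's resolved name.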

My thought then was that there should be some errors written to the log files somewhere (on the Mac):
- In MasterLog everything seemed fine (to my eyes). The last line states that condor_startd is started as a daemoncore process.
- I'm not sure what I am expected to see in ProcLog, but I don't see anything suspicious.
- In StartLog it all looks good: I see that the four slots are allocated, benchmarked, and currently idle according to this log.
- StarterLog complains about ~condor/local.$HOST/config not existing, but I think that is just an optional folder for extra configuration files?

Next thought: the server might have some relevant information in its log files:
- I grep for my MacBook's hostname in all files; nothing comes up. Same for my MacBook's IP address.
- I grep for "warning" (ignoring case); nothing comes up. Grepping for "error" only gives:
$ grep -i error *
CollectorLog:11/24/14 13:44:40 Daemon Log is logging: D_ALWAYS D_ERROR
MasterLog:11/24/14 13:44:40 Daemon Log is logging: D_ALWAYS D_ERROR
NegotiatorLog:11/24/14 13:44:41 Daemon Log is logging: D_ALWAYS D_ERROR D_MATCH
SchedLog:11/24/14 13:44:41 (pid:19011) Daemon Log is logging: D_ALWAYS D_ERROR
StartLog:11/24/14 13:44:41 Daemon Log is logging: D_ALWAYS D_ERROR
StartLog:11/24/14 13:44:47 VM-gahp server reported an internal error
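Every hit above is either the startup banner (which lists D_ERROR as a logging category) or the VM-gahp message Jim describes as benign. A small Python sketch that filters out exactly those two patterns, and also reports any line mentioning the laptop's hostname or IP, could look like this. The extra keywords "denied" and "fail" are assumptions about what a real problem might look like, not an exhaustive list.

```python
import re
from pathlib import Path

# Lines that match the keywords below but are harmless, per this thread.
BENIGN = (
    re.compile(r"Daemon Log is logging:"),  # startup banner that lists D_ERROR
    re.compile(r"VM-gahp server reported an internal error"),
)
# Assumed keywords for "possibly a real problem"; extend as needed.
SUSPICIOUS = re.compile(r"error|warning|denied|fail", re.IGNORECASE)


def suspicious_lines(lines, extra_terms=()):
    """Yield lines that look like real problems or mention an extra term.

    extra_terms would typically hold the laptop's hostname and IP address.
    """
    for line in lines:
        if any(term in line for term in extra_terms):
            yield line
        elif SUSPICIOUS.search(line) and not any(p.search(line) for p in BENIGN):
            yield line


def scan_log_dir(log_dir, extra_terms=()):
    """Return {filename: [matching lines]} for every regular file in log_dir."""
    hits = {}
    for path in sorted(Path(log_dir).iterdir()):
        if path.is_file():
            lines = path.read_text(errors="replace").splitlines()
            found = list(suspicious_lines(lines, extra_terms))
            if found:
                hits[path.name] = found
    return hits
```

For example, `scan_log_dir("/var/log/condor", ["<macbook hostname>", "<macbook IP>"])` on the server; `condor_config_val LOG` prints the log directory if it lives somewhere else.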

If I were to guess, I suspect the communication is not working properly (maybe the port is not correctly opened on my Mac). However, I am confused, since I am not seeing any error messages anywhere. Further, "condor_status" correctly gets information about the server slots when executed from my laptop. I have tried various settings and configurations on both machines, but nothing has gotten me closer to a solution.

Any pointers to documentation on related issues, suggestions about what I might be doing wrong, or questions about further details you need to help me are most welcome!

Thanks and cheers,
Yngve

Some potentially useful information:

server$ uname -a
Linux <hostname> 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 13:45:55 CDT 2014 x86_64 x86_64 x86_64 GNU/Linux
macbook% uname -a
Darwin <host> 14.0.0 Darwin Kernel Version 14.0.0: Fri Sep 19 00:26:44 PDT 2014; root:xnu-2782.1.97~2/RELEASE_X86_64 x86_64

The MacBook is running Yosemite. Both machines are running HT Condor version 8.2.4. The server has 16 cores, which gives a total of 32 slots (presumably because hyper-threads are counted); the laptop has two cores, so four slots.

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/