
Re: [HTCondor-users] attach mac host to linux main server



Hi,

Thanks for both your input!

In the end, what I figured out was that the central host had a firewall running. I knew this, but also thought that the ports required were already open (and saw no warning/error messages regarding communication/ports/etc).

A lot of ports were open (e.g. all ports 1000-10 000), which includes e.g. 9618. However, it looks like HTCondor by default also uses ports in higher ranges (I saw 58 000-something in MasterLog, for example). In the end I decided that I don't understand network settings well enough, and since we are inside an internal network anyway, I simply turned off iptables.
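
For anyone who wants to confirm this kind of problem before turning the firewall off, a quick TCP reachability check run from the execute machine can show which ports are actually getting through. The following is only a minimal sketch: the function names are made up for illustration, and it tests plain TCP connects rather than anything HTCondor-specific.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True (reachable) or False (blocked or not listening)."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}
```

A call such as `check_ports("central-manager.example.org", [9618, 58000])` would then report both ports; the host name is a placeholder and 58000 only stands in for whatever high port shows up in MasterLog. A False result means either that the firewall drops the connection or that nothing is listening on that port, so it is a hint rather than proof. The HTCondor manual also describes LOWPORT and HIGHPORT settings for confining the dynamic ports to a fixed range, which may be a less drastic alternative to disabling iptables; that was not tried here.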

The Mac is now connecting fine, so this was not an OS X-specific problem. I am, however, slightly disappointed that there was no information about the communication problem in the logfiles. Is this something that could be improved upon?

Kind Regards,
Yngve

> On 24 Nov 2014, at 22:39, Jim White <jimwhite@xxxxxx> wrote:
> 
> Hi Yngve,
> 
> I know very little about HTCondor admin but I did recently get a multi-node configuration working, albeit by disabling pretty much all security (which will get added back later if/when it becomes necessary, but it is mostly not an issue because of Docker).  The setup is for Kubernetes but you may get some ideas for what to try from my config.local:
> 
> https://github.com/jimwhite/condor-kubernetes
> 
> Pretty sure email doesn't work, which is something I would like to have.
> 
> I got some inspiration from these blog posts:
> 
> http://spinningmatt.wordpress.com/2011/06/21/getting-started-multiple-node-condor-pool-with-firewalls/
> http://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/
> http://www.isi.edu/~gideon/condor-ec2/
> http://sagarg55.wordpress.com/2014/02/09/configure-ht-condor-on-ubuntu/
> 
> I recommend reading all of the log files and making sure there aren't any error messages you don't understand.  The "VM-gahp server reported an internal error" is benign (be nice if the message made that clear).
> 
> Oh, and you don't say whether you've turned off your firewalls, but that would be the first thing to do if not.
> 
> I hope there's something useful in all that.  Please do share your solution when you get there.
> 
> Jim
> 
> On Mon, Nov 24, 2014 at 7:31 AM, Yngve Levinsen <Yngve.Levinsen@xxxxxxx> wrote:
> Hi all,
> 
> First of all a warning, I am quite new to HT Condor, so this might just be me not knowing where to look for information. Related links are most welcome!
> 
> I have successfully set up HT Condor on our workstation running Scientific Linux 6 (based on RHEL 6), using the yum repositories. This was a very easy procedure, so thanks to whoever maintains and builds this!
> 
> Now I am trying to attach my macbook (for testing; once I know how, I want to attach a few Mac Pros we have in various offices), but I am not succeeding. I also find the explanations of how to attach two computers together in a cluster slightly confusing, perhaps because most of the time this works very much automagically. For example, I did not quite get which machines need write access where, and I did not quite understand how they communicate (other than that port 9618 is used).
> 
> After failing for about half a day, I figured it is better to reach out to the community to see if someone else can suggest what I am doing wrong.
> 
> Second warning: I am also not very experienced with how OS X is structured; I am much more accustomed to a Linux environment. This might very well be the root cause of my problems.
> 
> The workstation is simply set up using yum; I did not configure anything specifically. This seems to work fine, at least for the one machine. I tried to set a few things in /etc/condor/condor_config.local without much success. What do I need to configure on the host machine in order to allow other machines to be added to the pool? Currently I have only set "ALLOW_WRITE" to everything from inside our domain.
> 
> On the macbook, I installed from the tarball sources, using a separate user "condor". I first tried to install locally with the command:
> $ ./condor_configure --install
> This seemed to work fine; I could see that my macbook had four open slots etc. with "condor_status".
> I then deleted all of that, and installed again with
> $ ./condor_configure --install --central-manager=<server hostname> --type=execute --verbose
> This did not really seem to work. With "condor_status" I see the slots available on the server machine, but the slots from my macbook are never added.
> 
> My thought then was that there should be some errors written to the logfiles somewhere (on the mac):
> - In MasterLog everything seemed fine (to my eyes). The last line states that condor_startd is started as a daemoncore process.
> - I'm not sure what I am expected to see in ProcLog, but I don't see anything suspicious.
> - In StartLog it all looks good; I see that the four slots are allocated, benchmarked, and currently idle according to this log.
> - StarterLog complains about ~condor/local.$HOST/config not existing, but I think that is just an optional folder for extra configuration files?
> 
> My next thought was that the server might have some relevant information in its logfiles:
> - I grep for my macbook hostname in all files, nothing comes up. Same for my macbook's IP address.
> - I grep for warning (ignorecase), nothing comes up. I grep for error, I only get:
> $ grep -i error *
> CollectorLog:11/24/14 13:44:40 Daemon Log is logging: D_ALWAYS D_ERROR
> MasterLog:11/24/14 13:44:40 Daemon Log is logging: D_ALWAYS D_ERROR
> NegotiatorLog:11/24/14 13:44:41 Daemon Log is logging: D_ALWAYS D_ERROR D_MATCH
> SchedLog:11/24/14 13:44:41 (pid:19011) Daemon Log is logging: D_ALWAYS D_ERROR
> StartLog:11/24/14 13:44:41 Daemon Log is logging: D_ALWAYS D_ERROR
> StartLog:11/24/14 13:44:47 VM-gahp server reported an internal error
> 
> If I were to guess, I suspect the communication is not working properly (maybe the port is not correctly opened on my mac). However, I am confused since I am not seeing any error messages anywhere. Further, "condor_status" is correctly getting information about the server slots when executed from my laptop. I have tried various settings and configuration on both machines but nothing has gotten me closer to the solution.
> 
> Any suggestions to documentation on related issues, what I might do wrong, or questions about more details you need to help me are most welcome!
> 
> Thanks and cheers,
> Yngve
> 
> Some potentially useful information:
> 
> server$ uname -a
> Linux <hostname> 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 13:45:55 CDT 2014 x86_64 x86_64 x86_64 GNU/Linux
> macbook% uname -a
> Darwin <host> 14.0.0 Darwin Kernel Version 14.0.0: Fri Sep 19 00:26:44 PDT 2014; root:xnu-2782.1.97~2/RELEASE_X86_64 x86_64
> 
> Macbook is running Yosemite. Both machines are running HT Condor version 8.2.4. The server is 16 core, which gives a total of 32 slots. The laptop has two cores so four slots.
> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
> 