
Re: [Condor-users] setting up condor pool



Hi Salman,
 I just read your mails, hehe; maybe these clues will help with your
pool's problems. (I assume you're using Linux):
1. Do you have a DNS server, and are your machines (the pool) registered
in it? If not, add the FQDN of every node (Master and Execute) to the
/etc/hosts file on each machine. Condor requires name-based access to
the nodes. Try pinging the nodes by name rather than by IP.

2. Verify that your execute nodes have swap enabled (run free; the last
line reads: Swap XXXXX YYYY ZZZZ). If all the values are zero, you don't
have swap. Either enable it or add this line to condor_config.local:
RESERVED_SWAP = 0

 Here is a short read about it:
 http://research.cs.wisc.edu/condor/manual/v7.6/2_6Managing_Job.html#3453

3. Check the output of condor_status. If you get no output, your
execute nodes are not finding the Master and vice versa. You can
use condor_status -any to list the daemons in the pool; the "Machine Job"
lines are execute nodes. Again, if you don't see any, there is a
communication problem between the Master and the nodes.

4. If you're using a firewall, check that it is not blocking Condor's
ports: turn it off temporarily and try condor_status again.

5. Run condor_q -analyze or condor_q -better-analyze and, if you can,
paste the output here so we can give you a better clue about where the
problem may be.

6. Verify that your Execute nodes are not being used by a human. By
default, Condor will not run jobs if the node is in use (mouse or
keyboard activity in the last 15 minutes). You can avoid this by adding:
 START = TRUE to your condor_config.local file.
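
Points 1 and 2 can be checked on each machine with a small script. What
follows is only a rough Python sketch, not anything that ships with
Condor: the function names are made up for illustration, and it assumes
a Linux node where /proc/meminfo has a SwapTotal line.

```python
import socket


def unresolved_hosts(hostnames, resolver=socket.gethostbyname):
    """Return the names in `hostnames` that cannot be resolved to an IP.

    `resolver` is injectable so the check can be exercised without a network.
    """
    failed = []
    for name in hostnames:
        try:
            resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed


def swap_total_kb(meminfo_text):
    """Extract SwapTotal (in kB) from the text of /proc/meminfo, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line looks like: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    return None


# Example use on a node (the host names here are placeholders):
#   print(unresolved_hosts(["master.example.org", "exec1.example.org"]))
#   with open("/proc/meminfo") as f:
#       print(swap_total_kb(f.read()))
```

An empty list from unresolved_hosts means every name resolved from that
machine; a swap_total_kb of 0 means the node has no swap, which is the
case where RESERVED_SWAP = 0 is needed.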

Hope this helps.
Bye.

On 2/15/12, Pervez, Salman <Salman.Pervez@xxxxxxxxxxxxxxxx> wrote:
> Great, I've got a condor pool now. I tried submitting some jobs to it, and
> they don't seem to be executing. Condor_q just shows their status as
> submitted. This was not the case when I had a 1 machine pool. The jobs would
> execute right away. Any idea why there is a delay here? Or where I might
> look?
>
> Salman
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Thomas Luff
> Sent: Wednesday, February 15, 2012 11:53 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] setting up condor pool
>
> You need to set the CONDOR_HOST attribute in your condor_config file.
>
> As this is going to be the same for all machines in your pool you want to
> set this value in the global config file and not the local file.
>
> Once this is done you might find that your machines are not allowed to talk
> to each other because of condor's default host authorization. This is a
> basic authorization used by condor to control access based on hostnames. To
> make sure everything is running just allow read/write to anyone.
>
> ALLOW_READ = *
> ALLOW_WRITE = *
> ALLOW_NEGOTIATOR = $(CONDOR_HOST)
>
> Then you can go back later and fine tune the host authorization or switch to
> a more secure method.
>
> ________________________________________
> From: condor-users-bounces@xxxxxxxxxxx [condor-users-bounces@xxxxxxxxxxx] On
> Behalf Of Pervez, Salman [Salman.Pervez@xxxxxxxxxxxxxxxx]
> Sent: 15 February 2012 17:10
> To: condor-users
> Subject: [Condor-users] setting up condor pool
>
> Hi everyone, thanks for all the help so far (Thomas!). I can now start up
> Condor daemons on 3 of my machines which is great. The problem is they don't
> seem to know about each other. Which made me realize that I never actually
> told my execute machines how to get in touch with the master. Can someone
> tell me how to do this? I have a master set up as a submit machine and two
> others that are just execute machines. Appreciate the input. Thanks!
>
> Salman
>
> -- IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium.  Thank you.
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>


-- 
Edier Alberto Zapata Hernández
Ingeniero de Sistemas
Universidad de Valle