
Re: [HTCondor-users] Jobs do not execute, they sit idle in the queue indefinitely



Brian,

Removing the nodeNN entries from the loopback line in /etc/hosts and restarting networking seems to have done the trick.  Thanks for the second set of eyes; I should have caught that earlier when going through the initial configuration!
Perhaps I need more coffee.

Regards,
Dan Shea

On 05/20/2013 03:11 PM, Dan Shea wrote:
On 05/20/2013 02:53 PM, Brian Candler wrote:
On Mon, May 20, 2013 at 02:38:31PM -0400, Dan Shea wrote:
Adding STARTD to the gatekeeper node caused all jobs queued to be
executed on the gatekeeper.
It seems the gatekeeper machine cannot see the execute-only nodes.
I'm not sure what I have missed in the configuration to cause this
behaviour.  Network-wise they all see each other just fine, with
hostnames resolved via /etc/hosts entries.
Have you set ALLOW_WRITE, if so to what?

Currently I am attempting to limit things to the local network.  Perhaps this is not the correct way to wildcard a subnet?

ALLOW_WRITE = 10.11.114.*
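
To make the intent of that wildcard concrete, here is a rough Python sketch of which addresses a pattern like 10.11.114.* is meant to cover. It uses shell-style matching from the standard library purely as an illustration; it is not HTCondor's own matching code, and the function names and the /24 equivalence are my assumptions.

```python
from fnmatch import fnmatch
import ipaddress

# Pattern taken from the ALLOW_WRITE line above.
PATTERN = "10.11.114.*"
# Assumed equivalent subnet, for comparison only.
SUBNET = ipaddress.ip_network("10.11.114.0/24")


def covered_by_pattern(addr: str) -> bool:
    """Shell-style match of a dotted-quad string against the wildcard."""
    return fnmatch(addr, PATTERN)


def in_subnet(addr: str) -> bool:
    """Proper subnet membership test for the same range."""
    return ipaddress.ip_address(addr) in SUBNET


if __name__ == "__main__":
    for addr in ("10.11.114.220", "10.11.115.1", "127.0.0.1"):
        print(addr, covered_by_pattern(addr), in_subnet(addr))
```

Note that a loopback address such as 127.0.0.1 falls outside the pattern, so anything presenting itself from loopback would not be covered by a rule written this way.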


        
SchedLog:05/17/13 13:41:21 (pid:9037) WARNING: forward resolution of
localhost.localdomain doesn't match 10.11.114.220!
This does look like a problem. What does "hostname" show on all the nodes?
Do you have a "localhost.localdomain" entry in /etc/hosts? Normally it would
be for 127.0.0.1; don't be tempted to set it to the external IP of your
machine.

hostname returns node00 through node09, depending on which node you are on.  The localhost.localdomain entry in /etc/hosts has not been modified; it still points to loopback.  I think I do see the issue, however: node00 is also listed on the 127.0.0.1 line.

127.0.0.1   node00 localhost localhost.localdomain
10.11.114.220 node00
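
A quick way to spot this kind of entry on each node is to scan the hosts file for real hostnames sitting on a loopback line. The following is only a minimal sketch: the function name is made up for illustration, the sample text is the fragment above, and treating any name that starts with "localhost" as legitimate is an assumption that may not fit every distribution.

```python
import ipaddress

# Sample input: the /etc/hosts fragment quoted above.
HOSTS_TEXT = """\
127.0.0.1   node00 localhost localhost.localdomain
10.11.114.220 node00
"""


def names_on_loopback(hosts_text):
    """Return hostnames mapped to a loopback address, ignoring localhost names."""
    suspicious = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address line
        if addr.is_loopback:
            # Assumption: names beginning with "localhost" belong here.
            suspicious.extend(
                name for name in fields[1:] if not name.startswith("localhost")
            )
    return suspicious


if __name__ == "__main__":
    print(names_on_loopback(HOSTS_TEXT))
```

To check a live node, the same function could be given the contents of /etc/hosts read from disk; any name it returns is a candidate for removal from the loopback line.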

Thanks, Brian.  Let me correct the /etc/hosts entries and see if that fixes things.

Regards,
Dan




-- 
Dan Shea - daniel_shea2@xxxxxxxxxxxxxxx
Senior Systems Administrator, West Quad Computing Group
Harvard Medical School
"Charlie was a chemist, But Charlie is no more. For what he thought was H2O, Was H2SO4."

