
Re: [Condor-users] Quick Start Vanilla Condor on Ubuntu 10.04



Hi Matt (and all),

Thanks for the response; it pointed me in the right direction, which was the filesystem. Since it's shared, I had to set the UID_DOMAIN and FILESYSTEM_DOMAIN configuration parameters, and it all worked.
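
For anyone following along, here is a minimal sketch of that kind of change, not my exact settings. The domain name "example.internal" is a made-up placeholder; the assumption is that every machine in the pool mounts the same home directories and shares the same user accounts, and that the same two lines go into the local config file on each machine.

```
# Hypothetical values: "example.internal" is a placeholder, not a real setting.
# Use the SAME values on every machine in the pool.

# Machines with a matching UID_DOMAIN are treated as sharing user accounts,
# so jobs can run as the submitting user rather than as "nobody".
UID_DOMAIN = example.internal

# Machines with a matching FILESYSTEM_DOMAIN are treated as sharing a
# filesystem, so job files can be accessed in place.
FILESYSTEM_DOMAIN = example.internal
```

After editing, the daemons need to re-read their configuration (for example with condor_reconfig, or a restart) before the new values take effect.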

Well, almost. I've three computers in my pool now, one host and two submit/execute machines. If I submit jobs from either of the non-host computers, they get farmed out across all three, and all is dandy.

However, when I submit jobs from the host, they get farmed out, and only those on the NON-host machines actually run. The others get held with this message:

user@HOST:~/condor_test$ condor_q -analyze
-- Submitter: HOST : <127.0.1.1:35783> : HOST
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
---
012.003:  Request is held.

Hold reason: Error from starter on slot1@HOST: Failed to open '/net/home/user/condor_test/simple.3.out' as standard output: Permission denied (errno 13)

I can resolve this by making those files world-writable, but that doesn't seem correct. Thoughts?

Also, I'm using 7.2.4 because it's what came down via apt-get. I'll look into upgrading.

Thanks again,

Dan

On 11/08/2011 10:21 PM, Matthew Farrellee wrote:
On 11/08/2011 06:27 PM, Daniel Grollman wrote:
Hello Condor-users,

Is there a quick start guide for getting Condor up and running on a
small Ubuntu 10.04 pool? I just want to run processes on other machines'
idle processors (vanilla universe).

Here's where I'm at if anyone can help:

2 identical (virtual) machines with fresh installs of Ubuntu 10.04 with
Condor 7.2.4 installed via 'apt-get install condor'

At this point both machines have their own local condors, and I can
queue and run jobs, no problem.

I edited the /etc/condor/condor_config files thusly:

On machine 1:
CONDOR_HOST = [IP address of machine 2]
HOSTALLOW_READ = *
HOSTALLOW_WRITE = *

On machine 2:
HOSTALLOW_READ = *
HOSTALLOW_WRITE = *

After a reboot (?) condor_status on either machine shows me the slots on
both machines and if they're busy/idle/etc (yay!). However, they still
seem to have different queues. I.e., when I submit from machine 1, I only
see it in condor_q on machine 1, and it only runs on the cpu of machine
1 (but I see the usage in condor_status on machine 2).

I imagine there's a configuration parameter I need to set somewhere, but
I don't know what. Help please?

Thanks,

Dan

You probably want ShouldTransferFiles = IF_NEEDED and WhenToTransferOutput
= ON_EXIT in your submit file.
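
As a rough sketch of where those two settings go, a vanilla-universe submit file might look like the following. The executable name "simple" and the job count are placeholders assumed for illustration; only the two file-transfer lines come from the suggestion above.

```
# Sketch of a vanilla-universe submit file.
# "simple" and "queue 4" are hypothetical placeholders.
universe                = vanilla
executable              = simple
output                  = simple.$(Process).out
error                   = simple.$(Process).err
log                     = simple.log

# Transfer files only when the submit and execute machines are in
# different FILESYSTEM_DOMAINs; otherwise use the shared filesystem.
should_transfer_files   = IF_NEEDED
# Copy output back to the submit machine when the job exits.
when_to_transfer_output = ON_EXIT

queue 4
```

With IF_NEEDED, machines that share a FILESYSTEM_DOMAIN read and write the job's files in place, and transfer happens only for machines that do not.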

https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2281

7.2.4 is very old at this point. Can you upgrade?

Here are some instructions you can follow. They're for Fedora, but if
you pretend apt is yum and, with 7.2.4, you throw everything in
~condor/condor_config.local instead of /etc/condor/config.d, everything
should work.

http://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/


http://spinningmatt.wordpress.com/2011/06/21/getting-started-multiple-node-condor-pool-with-firewalls/


http://spinningmatt.wordpress.com/2011/07/04/getting-started-submitting-jobs-to-condor/


Best,


matt


--
Dan Grollman
Robot Doctor
daniel.grollman@xxxxxxxxx
http://www.vecna.com/robotics

Cambridge Research Laboratory
Vecna Technologies, Inc.
36 Cambridge Park Drive
Cambridge, MA 02140
Phone: (617) 864-0636
Fax: (617) 864-0638

Better Technology, Better World (TM)
