Re: [Condor-users] Quick Start Vanilla Condor on Ubuntu 10.04
- Date: Thu, 10 Nov 2011 16:02:52 -0500
- From: Daniel Grollman <daniel.grollman@xxxxxxxxx>
- Subject: Re: [Condor-users] Quick Start Vanilla Condor on Ubuntu 10.04
Yes, you read my problem correctly. I never did figure out what was
going on; after a reboot, the issue went away.
Thanks again for all of your help.
On 11/10/2011 08:15 AM, Matthew Farrellee wrote:
Hopefully I read your problem correctly: jobs submitted from HOST that
run on HOST are held after failing to open a /net/home/user file, while
all other jobs (whether or not they were submitted from HOST, and whether
or not they run on HOST) succeed.
Setting FILESYSTEM_DOMAIN to the same value across nodes tells Condor
that they share a filesystem, which the jobs will then use. Setting
UID_DOMAIN to the same value tells Condor that the same users exist
across machines (same names, UIDs, and GIDs). If either of those things
is not true, you can run into problems like this one.
You should verify that UID_DOMAIN is set correctly. Check the StartLog
and StarterLog.slot1 on HOST to see which user the job is started as
(you may need STARTER_DEBUG = D_FULLDEBUG in the config), and see whether
that user differs on the non-HOST machines.
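A minimal sketch of the settings discussed above, as they might appear in
each machine's local config file (with 7.2.4, ~condor/condor_config.local).
The value example.com is a placeholder assumption; what matters is that
every machine in the pool uses the same string, and that the accounts
really do have the same names, UIDs, and GIDs everywhere.

```
# Hypothetical local config: use the SAME values on every machine in the pool.
# "example.com" is a placeholder, not a required value.

# All machines share the same user accounts (same name, UID, GIDs)
UID_DOMAIN        = example.com

# All machines see the same shared filesystem (e.g. /net/home)
FILESYSTEM_DOMAIN = example.com

# More detail in StarterLog.slot1, e.g. which user the job is started as
STARTER_DEBUG     = D_FULLDEBUG
```

Running condor_config_val UID_DOMAIN (and likewise for FILESYSTEM_DOMAIN)
on each machine is a quick way to confirm that the values in effect
actually match; condor_reconfig asks the running daemons to reread the
configuration after an edit.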
I'd normally suspect root squash on the NFS export, but you said only
HOST-on-HOST jobs fail.
You could also verify that you have the privileges to create and open
that file outside of Condor.
On 11/10/2011 04:24 AM, Lukas Slebodnik wrote:
If you want to upgrade to a newer version of Condor using apt-get, you
can try installing Condor from the Condor Debian repository maintained
by the Condor team. I don't know how compatible it is with Ubuntu, but
you can try it.
On Wed, Nov 09, 2011 at 05:56:47PM -0500, Daniel Grollman wrote:
Hi Matt (and all),
Thanks for the response, it totally pointed me in the right
direction, which was the filesystem. As it's shared, I had to
change the UID_DOMAIN and FILESYSTEM_DOMAIN configuration
parameters, and it all worked.
Well, almost. I have three computers in my pool now: one host and two
submit/execute machines. If I submit jobs from either of the non-host
computers, they get farmed out across all three, and all is well.
However, when I submit jobs from the host, they get farmed out, and
only those on the NON-host machines actually run. The others get
held with this message:
user@HOST:~/condor_test$ condor_q -analyze
-- Submitter: HOST :<127.0.1.1:35783> : HOST
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
012.003: Request is held.
Hold reason: Error from starter on slot1@HOST: Failed to open
'/net/home/user/condor_test/simple.3.out' as standard output:
Permission denied (errno 13)
I can resolve this by making those files world-writable, but that
doesn't seem correct. Thoughts?
Also, I'm using 7.2.4 because it's what came down via apt-get. I'll
look into upgrading.
On 11/08/2011 10:21 PM, Matthew Farrellee wrote:
On 11/08/2011 06:27 PM, Daniel Grollman wrote:
Is there a quick start guide for getting Condor up and running on a
small Ubuntu 10.04 pool? I just want to run processes on other
idle processors (vanilla universe).
Here's where I'm at if anyone can help:
2 identical (virtual) machines with fresh installs of Ubuntu 10.04
Condor 7.2.4 installed via 'apt-get install condor'
At this point both machines have their own local condors, and I can
queue and run jobs, no problem.
I edited the /etc/condor/condor_config files thusly:
On machine 1:
CONDOR_HOST = [IP address of machine 2]
HOSTALLOW_READ = *
HOSTALLOW_WRITE = *
On machine 2:
HOSTALLOW_READ = *
HOSTALLOW_WRITE = *
After a reboot (?), condor_status on either machine shows me both
machines and whether they're busy, idle, etc. (yay!). However, they
still seem to have different queues. That is, when I submit from
machine 1, I see the job in condor_q on machine 1, and it only runs on
the CPU of machine 1 (but I see the usage in condor_status on machine 2).
I imagine there's a configuration parameter I need to set, but I don't
know what. Help please?
You probably want ShouldTransferFiles = IF_NEEDED and
WhenToTransferOutput = ON_EXIT in your submit file.
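For illustration, a vanilla-universe submit description file using these
settings might look like the sketch below. In submit-file syntax the
commands are spelled should_transfer_files and when_to_transfer_output
(ShouldTransferFiles and WhenToTransferOutput are the corresponding job
ClassAd attributes). The executable name simple and the output file names
are assumptions, modeled on the simple.3.out file in the hold message
earlier in the thread.

```
# Hypothetical vanilla-universe submit file; all names are placeholders
universe                = vanilla
executable              = simple
output                  = simple.$(Process).out
error                   = simple.$(Process).err
log                     = simple.log

# Transfer files only if the execute machine's FILESYSTEM_DOMAIN
# differs from the submit machine's (i.e. no shared filesystem)
should_transfer_files   = IF_NEEDED
# Copy the job's output back when it exits
when_to_transfer_output = ON_EXIT

# Queue four jobs: processes 0 through 3
queue 4
```

Saved as, say, simple.submit (a hypothetical name), it would be submitted
with condor_submit simple.submit.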
7.2.4 is very old at this point; can you upgrade?
Here are some instructions you can follow. They're for Fedora, but if
you pretend apt is yum and, with 7.2.4, put everything in
~condor/condor_config.local instead of /etc/condor/config.d, everything
should work.
Cambridge Research Laboratory
Vecna Technologies, Inc.
36 Cambridge Park Drive
Cambridge, MA 02140
Phone: (617) 864-0636
Fax: (617) 864-0638
Better Technology, Better World (TM)