Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[Condor-users] 2 match but reject the job for unknown reasons

Date: Wed, 9 Dec 2009 20:38:38 -0600 (CST)
From: Stephen Pietrowicz <srp@xxxxxxxxxxxxx>
Subject: [Condor-users] 2 match but reject the job for unknown reasons

Hi,

I've run into a problem that I'm trying to debug, but I haven't found any clue as to what is going wrong.

I've set up the Condor binaries on my own cluster and submitted a glide-in request to another system. This works: the nodes show up on my local cluster. I can then send vanilla universe Condor jobs to them, and they execute. I can also send simple (one-job) DAGs, and those jobs execute as well.

What I haven't been able to get working is the same setup under the parallel universe. I've simplified this to the "sleep" example (with "mydomain.org" pointing at my cluster's site):

universe = parallel
executable = /bin/sleep
arguments = 30
machine_count = 2
Requirements = target.Disk == 0 && TARGET.FileSystemDomain == "mydomain.org"
queue

On the nodes where this would execute, I have the following lines added to the generic "glidein_condor_config" file that comes with the distribution (I put these lines at the bottom of the file):

DEDICATEDSCHEDULER = "DedicatedScheduler@myusername@mylocalnode.mydomain.org"
STARTD_ATTRS = $(STARTD_ATTRS), DEDICATEDSCHEDULER
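
For comparison, here is a minimal sketch of dedicated-resource settings along the lines of the example in the Condor manual's parallel universe setup section. The host name full.host.name is a placeholder, and the policy expressions are assumptions about a typical "only run dedicated jobs" setup, not settings taken from this cluster:

# Placeholder host name: the submit machine running the dedicated scheduler
DedicatedScheduler = "DedicatedScheduler@full.host.name"
# Advertise the attribute in the machine ClassAd
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
# Assumed policy: accept and prefer only jobs from the dedicated scheduler
START = Scheduler =?= $(DedicatedScheduler)
RANK = Scheduler =?= $(DedicatedScheduler)
# Assumed policy: never suspend, preempt, or kill a running dedicated job
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
WANT_SUSPEND = False
WANT_VACATE = False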

Everything else is a regular (vanilla - untouched) install, apart from the condor_config.local file changes I had to add to make sure it worked in the first place. I have the DAEMON_LIST set to:

DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, SHADOW

With all this in place, when the job tries to run, I get this message from both "condor_q -analyze" and "condor_q -better-analyze":

2 match but reject the job for unknown reasons

It appears that I'm missing a configuration parameter somewhere, either locally or remotely. I've looked through the log files and haven't found anything that explains why the job is being rejected. I've also tried setting:

DEDICATEDSCHEDULER = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxx"

in the "glidein_condor_config" file on the execute nodes, but that doesn't appear to have made a difference either.

Can someone please point me to a LOG file I should be looking at or let me know a parameter I should be setting?

I would really appreciate the help!

Thanks,

Steve
