Re: [Condor-users] Why does machine reject job for unknown reasons
- Date: Tue, 15 May 2007 15:16:55 +0100
- From: "Kewley, J (John)" <j.kewley@xxxxxxxx>
- Subject: Re: [Condor-users] Why does machine reject job for unknown reasons
I would suggest looking at the log files on the submission and central manager
machines (Condor gurus will be
more specific about exactly where to look).
My (automatic these days) first response is to ensure that there are no firewalls
between the submission node and any of the prospective execute nodes. And if there
are, are the relevant ports and the ephemeral ports open for both UDP and TCP?
The scenario where jobs match to a machine and then never get there
can also be caused by NATs causing similar connection problems.
of above would cause "evidence" to appear in the log files.
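The firewall check described above can be sketched in Python. This is only a rough illustration, run from the submission node: the function name `tcp_port_open`, the host name, and the port number in the usage comment are hypothetical placeholders, not values from this thread. It tests TCP only; an open UDP port cannot be confirmed this way.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example use (hypothetical host and port; substitute the values for your pool):
# print(tcp_port_open("execute-node.example.org", 40000))
```

A `False` result does not say whether a firewall, a NAT, or a daemon that is not listening is to blame, so the log files are still the place to confirm the cause.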
Another problem might be where the job cannot start on the machine
because of file transfer or
filestore issues (although I can't recall whether the symptoms would be the
same). Again, the log files would give useful hints as to what was happening.
Do you have other jobs running OK in the pool? If so, what is different about this
one? If not, then I'd suggest running a more trivial job (like /bin/hostname or
Note that this group is for users, so we don't always have time to respond to queries.
While often it is
the Condor team themselves who reply, quite often it is fellow users.
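The trivial test job suggested above could be set up along the following lines. This is a minimal sketch, not a recommended configuration: the helper name `trivial_submit_description` and the `tag` used for the file names are made up for illustration, and it assumes the vanilla universe is appropriate for the pool. The function only builds the text of a submit description; it does not submit anything.

```python
def trivial_submit_description(executable: str = "/bin/hostname",
                               tag: str = "trivial") -> str:
    """Build the text of a minimal Condor submit description file."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"output = {tag}.out",   # stdout of the job
        f"error = {tag}.err",    # stderr of the job
        f"log = {tag}.log",      # user log: records match, start and eviction events
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Example use: write the file, then run `condor_submit trivial.sub` by hand.
# with open("trivial.sub", "w") as fh:
#     fh.write(trivial_submit_description())
```

If even this job matches and is then rejected, the problem is more likely in the pool or network setup than in the original job.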
Sorry to bother you again
with my question, but this problem still persists. I have received so far no
idea how to find out why Condor jobs are rejected ...
On 5/14/07, Alexander
Thanks for this suggestion, but the output really does not help me further (see
below). It looks like 150 machines are good to run the jobs on, but
still they are rejected for unknown reasons! I need them to start
immediately because of a time-limited online demonstration for the work I
Any other suggestions?
> condor_q -better-analyze
1082109.000:  Run analysis summary.  Of 152 machines,
      2 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
    150 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
The Requirements expression for your job is:

( target.Arch == "X86_64" ) && ( target.OpSys == "LINUX" ) &&
( ( target.CkptArch == target.Arch ) || ( target.CkptArch is undefined ) ) &&
( ( target.CkptOpSys == target.OpSys ) || ( target.CkptOpSys is undefined ) ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >=
    Condition                                                                        Machines Matched    Suggestion
1   ( target.Disk >= 10000 )
2   ( target.Arch == "X86_64" )
3   ( target.OpSys == "LINUX" )                                                      152
4   ( ( target.CkptArch == target.Arch ) || ( target.CkptArch is undefined ) )
5   ( ( target.CkptOpSys == target.OpSys ) || ( target.CkptOpSys is undefined ) )
6   ( ( 1024 * target.Memory ) >= 10000 )
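To make the six conditions above concrete, here is a rough Python sketch that checks them against a single machine description. It is not how Condor evaluates ClassAds: the function `failing_clauses` and the dictionary layout are invented for illustration, a missing attribute is simply treated as a failure, and the two 10000 thresholds are taken from conditions 1 and 6 of the table.

```python
def failing_clauses(machine: dict,
                    disk_needed: int = 10000,
                    memory_needed: int = 10000) -> list:
    """Return the names of the requirement clauses this machine fails."""
    arch = machine.get("Arch")
    opsys = machine.get("OpSys")
    checks = {
        "Arch": arch == "X86_64",
        "OpSys": opsys == "LINUX",
        # CkptArch / CkptOpSys pass if undefined (absent) or equal to Arch / OpSys.
        "CkptArch": machine.get("CkptArch") in (None, arch),
        "CkptOpSys": machine.get("CkptOpSys") in (None, opsys),
        "Disk": machine.get("Disk", 0) >= disk_needed,
        # The expression multiplies target.Memory by 1024 before comparing.
        "Memory": machine.get("Memory", 0) * 1024 >= memory_needed,
    }
    return [name for name, ok in checks.items() if not ok]


# Example use with a made-up machine description:
# failing_clauses({"Arch": "INTEL", "OpSys": "LINUX", "Disk": 50000, "Memory": 2048})
```

A check like this can only explain the 2 machines rejected by the job's requirements. The 150 machines that match and then reject the job pass every clause, so their reason has to be found elsewhere, for example in the log files.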