
Re: [Condor-users] New Cluster - match but reject the job for unknown reasons



Dirk Colbry wrote:
Shahaan,

Thanks for your help.  I did some digging and, as it turns out, it was
a firewall problem.  The CONDOR_HOST could accept incoming connections
but could not make any outbound requests.  It was a very frustrating
bug to find.  I wonder if there is any way that Condor could return a
better error message for this case?
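
One way to check for this kind of one-way firewall is to test outbound TCP connections from the CONDOR_HOST to the addresses the worker daemons advertise. The following is a minimal sketch, not part of Condor: the function names are made up for illustration, and it assumes the addresses are in the plain `<host:port>` form that appears in the condor_q output in this thread.

```python
import re
import socket

def parse_sinful(address):
    """Split a Condor-style address such as '<10.1.1.24:49968>' into (host, port)."""
    match = re.match(r"<([^:>]+):(\d+)", address.strip())
    if match is None:
        raise ValueError(f"not a <host:port> address: {address!r}")
    return match.group(1), int(match.group(2))

def can_connect(host, port, timeout=3.0):
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts alike.
        return False

def check_addresses(addresses, timeout=3.0):
    """Map each '<host:port>' string to whether it could be reached."""
    return {a: can_connect(*parse_sinful(a), timeout=timeout) for a in addresses}

# Example (the address is a placeholder; substitute one a worker advertises):
# print(check_addresses(["<10.1.1.50:1234>"]))
```

Run it on the CONDOR_HOST itself. If the workers can reach the CONDOR_HOST but every entry here comes back False, the outbound direction is the likely culprit.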


FYI, in Condor version 7.5.6 and above, condor_q -analyze returns more
descriptive messages in the situations where Condor version 7.4.x would
have just claimed "for unknown reasons". Development ticket details are at
  https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1670

regards,
Todd


Thanks again,

- Dirk


On Wed, May 25, 2011 at 8:40 PM, Shahaan Ayyub <shahaan@xxxxxxxxx> wrote:
Hi Dirk,
  Have a careful look yourself, and also include the tail of the Schedd
and Negotiator logs in your next mail.
regards,
Shahaan
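
For collecting those log tails, a small helper like the one below may be handy, particularly on the Windows nodes where there is no `tail` command. It is only a sketch: the `tail` function is hypothetical, and the log paths in the comment are assumptions (the usual default file names are SchedLog and NegotiatorLog, and `condor_config_val LOG` reports the actual log directory for an installation).

```python
from collections import deque

def tail(path, n=50):
    """Return the last n lines of a text file as a list of strings."""
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the most recent n lines in memory.
        return list(deque(handle, maxlen=n))

# Example (paths are assumptions; check `condor_config_val LOG` first):
# for name in ("SchedLog", "NegotiatorLog"):
#     print("".join(tail(f"/var/log/condor/{name}", 40)))
```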


On Thu, May 26, 2011 at 1:32 AM, Dirk Colbry <colbrydi@xxxxxxx> wrote:
Hey Everyone,

I am setting up a new Condor cluster. The CONDOR_HOST is running on
RHEL 6.0 with Condor 7.4.4 using a basic yum install.  All of my worker
nodes are on Windows XP, also with Condor 7.4.4.  When I submit jobs to
the Windows machines they are always stuck in Idle, and I seem to be
reproducing the problem described at the following link:

https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1645

I have seen similar problems posted to this email list in the past but
I was unable to determine the proper direction for a solution. My
output of condor_q with the -better-analyze flag is as follows:

==================================
condor_q -better-analyze 36.0
-- Submitter: accumulator.hpcc.msu.edu : <10.1.1.24:49968> : accumulator.hpcc.msu.edu
---
036.000:  Run analysis summary.  Of 6 machines,
     1 are rejected by your job's requirements
     2 reject your job because of their own requirements
     0 match but are serving users with a better priority in the pool
     3 match but reject the job for unknown reasons
     0 match but will not currently preempt their existing job
     0 match but are currently offline
     0 are available to run your job
       Last successful match: Wed May 25 09:20:30 2011

The Requirements expression for your job is:

( ( target.OpSys == "WINNT51" ) && ( target.Arch == "INTEL" ) ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( ( RequestMemory * 1024 ) >= ImageSize ) && ( target.HasFileTransfer )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,0.0)) ) >= 0 )
                                      0                   REMOVE
2   ( target.OpSys == "WINNT51" )     5
3   ( target.Arch == "INTEL" )        5
4   ( target.Disk >= 75 )             6
5   ( ( 1024 * target.Memory ) >= 0 ) 6
6   ( target.HasFileTransfer )        6
==================================
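
To make the per-condition table easier to read, here is a rough sketch of how such counts arise: each clause of the Requirements expression is tested separately against every machine ad. This is plain Python, not Condor's ClassAd evaluator, and it does not model ClassAd semantics such as undefined values. The machine ads and their values are invented for illustration; only the attribute names and the threshold of 75 come from the output above.

```python
# Hypothetical machine ads (values invented for illustration).
MACHINES = [
    {"Name": "xp-1", "OpSys": "WINNT51", "Arch": "INTEL",
     "Disk": 5000, "Memory": 1024, "HasFileTransfer": True},
    {"Name": "xp-2", "OpSys": "WINNT51", "Arch": "INTEL",
     "Disk": 8000, "Memory": 2048, "HasFileTransfer": True},
    {"Name": "head", "OpSys": "LINUX", "Arch": "X86_64",
     "Disk": 90000, "Memory": 4096, "HasFileTransfer": True},
]

# One predicate per clause of the job's Requirements expression.
CONDITIONS = {
    'target.OpSys == "WINNT51"': lambda m: m.get("OpSys") == "WINNT51",
    'target.Arch == "INTEL"': lambda m: m.get("Arch") == "INTEL",
    "target.Disk >= 75": lambda m: m.get("Disk", 0) >= 75,
    "( 1024 * target.Memory ) >= 0": lambda m: 1024 * m.get("Memory", 0) >= 0,
    "target.HasFileTransfer": lambda m: bool(m.get("HasFileTransfer")),
}

def count_matches(machines, conditions):
    """Count, for each clause, how many machines satisfy it on its own."""
    return {label: sum(1 for m in machines if test(m))
            for label, test in conditions.items()}

def full_matches(machines, conditions):
    """Names of machines that satisfy every clause at once."""
    return [m["Name"] for m in machines
            if all(test(m) for test in conditions.values())]

if __name__ == "__main__":
    for label, count in count_matches(MACHINES, CONDITIONS).items():
        print(f"{label:35} {count}")
    print("full matches:", full_matches(MACHINES, CONDITIONS))
```

Note that a machine passing every clause only means the ads match. As the summary above shows, matched machines can still fail to run the job for reasons outside the Requirements expression.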

According to the above link, the REMOVE suggestion from the
-better-analyze flag is a red herring and the problem is more likely in
the firewalls. I put my CONDOR_HOST and one of my Windows XP boxes on a
private network and turned off all of the firewalls, but I still get the
same error, so it looks like I am not dealing with a firewall problem.

Has anyone else seen this problem?  Do you have any suggestions for
things I could try?  Is there any more information I could be looking
for to make this problem easier to diagnose?

Thanks,

- Dirk
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


--
Todd Tannenbaum                       University of Wisconsin-Madison
Center for High Throughput Computing  Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                 Madison, WI 53706-1685