
Re: [Condor-users] Job requirements not satisfied even when Requirements = TRUE



Turn up the debug level for the StartLog, ShadowLog, and SchedLog to
D_FULLDEBUG; it might tell you more.
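
A minimal sketch of that suggestion, assuming the three logs are controlled by the usual STARTD_DEBUG, SHADOW_DEBUG, and SCHEDD_DEBUG configuration macros. The helper name debug_config_fragment is hypothetical and only prints the settings; it does not touch any configuration file.

```shell
# Emit a config fragment that raises the debug level of the daemons
# that write StartLog, ShadowLog, and SchedLog.
# debug_config_fragment is a hypothetical helper name (an assumption).
debug_config_fragment() {
  cat <<'EOF'
STARTD_DEBUG = D_FULLDEBUG
SHADOW_DEBUG = D_FULLDEBUG
SCHEDD_DEBUG = D_FULLDEBUG
EOF
}
```

One way to apply it would be to append the output to the local configuration file on the relevant host (whatever 'condor_config_val LOCAL_CONFIG_FILE' reports there) and then run 'condor_reconfig'. The StartLog lives on the execute node, while the ShadowLog and SchedLog live on the submit machine.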

Steve


On Wed, 31 Aug 2011, Steven Timm wrote:



There must be something in the machine classad about the requirements
of what jobs it will start.  Can you give a dump of condor_status -l
for one of the machines?

Steve Timm


On Wed, 31 Aug 2011, Mark Cafaro wrote:

Hi Garrett,

The job was successfully matched in the central manager's MatchLog (edited to remove ip and port):

08/31/11 20:05:16 Matched 27.0 user@...washington.edu <ip:port> preempting none <ip:port> slot1@...washington.edu

The node's StartLog is where I see it being rejected:

08/31/11 20:05:16 slot1: match_info called
08/31/11 20:05:16 slot1: Received match <ip:port>#1314844613#28#...
08/31/11 20:05:16 slot1: State change: match notification protocol successful
08/31/11 20:05:16 slot1: Changing state: Unclaimed -> Matched
08/31/11 20:05:16 slot1: Job requirements not satisfied.
08/31/11 20:05:16 slot1: Request to claim resource refused.
08/31/11 20:05:16 slot1: State change: claiming protocol failed
08/31/11 20:05:16 slot1: Changing state: Matched -> Owner
08/31/11 20:05:16 slot1: State change: IS_OWNER is false
08/31/11 20:05:16 slot1: Changing state: Owner -> Unclaimed
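
When the StartLog is long, it can help to reduce it to just the state transitions for one slot. The following is a rough sketch under the assumption that the log lines look like the excerpt above; slot_transitions is a hypothetical helper name, not a Condor tool.

```shell
# Read StartLog text on stdin and print only the state transitions
# for one slot. $1 is the slot name as it appears in the log, e.g. slot1.
# slot_transitions is a hypothetical helper name (an assumption).
slot_transitions() {
  grep -F "$1: Changing state: " | sed -e 's/^.*Changing state: //'
}
```

Run as 'slot_transitions slot1 < StartLog'. For the excerpt above it prints "Unclaimed -> Matched", "Matched -> Owner", and "Owner -> Unclaimed", one per line, which makes the refused claim stand out as the Matched -> Owner step.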


condor_q -better-analyze returns:

027.000:  Request has not yet been considered by the matchmaker.

because it was successfully matched.

Unfortunately I have been through all of the logs and there is no indication of a problem anywhere except for the line "Job requirements not satisfied."



On Aug 31, 2011, at 7:45 PM, Koller, Garrett wrote:

Mr. Cafaro,

I'm confused. I thought the problem was that the job kept being rejected with the error "Job requirements not satisfied." If that is so, how could it be matched in the MatchLog? Was it just considered in the MatchLog or was it actually assigned to a specific slot on a specific computer? If the MatchLog says it found a proper match and actually assigned it to that computer, check out http://servo.cs.wlu.edu/dokuwiki/doku.php/condor/submit/troubleshoot for a possible reason and solution to this problem.

Also run 'condor_q -better-analyze' for a more in-depth look at why your job is being rejected. If the job is being rejected because of its requirements, this should tell you specifically which requirement is failing.

Either way, let me know if this helps and what you find out.

Best Regards,
~ Garrett Heath Koller
kollerg14@xxxxxxxxxxxx

Computer Science Major
Member of the  Fraternity
Washington and Lee University
Undergraduate Class of 2014
P.O. Box 970
Lexington, VA  24450
Cell: (918) 246-6374

On Aug 31, 2011, at 10:17 PM, Mark Cafaro wrote:

No luck there either. That should certainly evaluate to true.

I am just about out of ideas. The only thing I can gather from the logs is "Job requirements not satisfied." and condor_q -analyze says "Request has not yet been considered by the matchmaker." apparently because the match was made (I can see it in the MatchLog).

I am desperately hoping this is not a platform specific bug. We're on the often forgotten Macintosh.

On Aug 31, 2011, at 7:00 PM, Koller, Garrett wrote:

Mr. Cafaro,

Sure, that's easy. Just run 'condor_status -long | grep ^IsValidCheckpointPlatform' to see the expression that defines the value for "IsValidCheckpointPlatform". The expression depends a lot on the job being submitted. Because of this, note that in this expression "MY.*" refers to a variable in the machine's ClassAd (will be listed in 'condor_status -long') and "TARGET.*" refers to a variable in the job's ClassAd (will be listed in 'condor_q -long').
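
To compare the two sides of such an expression, it can be handy to pull a single attribute out of either ad. This is a small sketch, assuming the long-form output has one "Name = expression" pair per line; classad_attr is a hypothetical helper name. It compares attribute names case-insensitively, as ClassAds do.

```shell
# Read a long-form ClassAd ("Name = expression" per line) on stdin and
# print the expression for the attribute named in $1.
# Attribute names are compared case-insensitively.
# classad_attr is a hypothetical helper name (an assumption).
classad_attr() {
  awk -v name="$1" '
    BEGIN { want = tolower(name) }
    {
      split($0, kv, " = ")           # kv[1] is the attribute name
      if (tolower(kv[1]) == want) {
        sub(/^[^=]*= /, "")          # drop "Name = ", keep the expression
        print
      }
    }'
}
```

For example, 'condor_status -long | classad_attr IsValidCheckpointPlatform' would show the machine-side expression, and 'condor_q -long | classad_attr <Attribute>' would show whichever job attribute a TARGET.* reference in it points at.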

Best Regards,
~ Garrett K.
Washington and Lee University
condor.cs.wlu.edu

On Aug 31, 2011, at 9:51 PM, Mark Cafaro wrote:

Hi Garrett,

I have investigated this possibility and found it is likely not causing our problem. The Requirements expression is appended to, but I can overwrite the appended requirements with condor_qedit. In either case, I would not expect a match to be made if the manager wasn't able to match the requirements with the node. The manager matches, but the node refuses.

I am wondering if this doesn't have to do with the fact that the node has:

Requirements = ( START ) && ( IsValidCheckpointPlatform )

I can't be sure that IsValidCheckpointPlatform evaluates to true on my platform. Is there any way to determine this?

On Aug 31, 2011, at 6:37 PM, Koller, Garrett wrote:

Mr. Cafaro,

The job's Requirements expression is probably being appended to after it is submitted. Usually, the requirements in the submit file are logically and-ed (&&) with an expression that says what the job needs from its execution machine in terms of file transfer. While the job is in the queue, run something like 'condor_q -long <Job_Cluster_ID> | grep -i ^Requirements', where <Job_Cluster_ID> is the ID of the job you just submitted. There you will see the Requirements expression in its entirety. Most likely, you are asking Condor to use a file transfer mechanism that isn't supported by your environment. See Section 2.5.4, "Submitting Jobs Without a Shared File System: Condor's File Transfer Mechanism," in the Condor manual (7.6.1 for me) for more information, and note where it talks about "FileSystemDomain" and the like, as this is one of the things appended to the job's Requirements expression depending on the type of file transfer desired.
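
Since the appended pieces are joined with &&, splitting the expression into one clause per line can make them easier to spot. This is only a sketch: split_clauses is a hypothetical helper name, and the split is purely textual, so it does not track parentheses and will also break apart any && nested inside a clause.

```shell
# Read a Requirements expression on stdin and print each " && "-joined
# clause on its own line. Naive textual split: parentheses are not
# tracked, so nested "&&" inside a clause is split as well.
# split_clauses is a hypothetical helper name (an assumption).
split_clauses() {
  awk '{ n = split($0, clause, " && "); for (i = 1; i <= n; i++) print clause[i] }'
}
```

It can be chained onto the command above, for example 'condor_q -long <Job_Cluster_ID> | grep -i ^Requirements | split_clauses'. The first output line will still carry the leading "Requirements = " prefix.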

Best Regards,
~ Garrett K.
Washington and Lee University
condor.cs.wlu.edu

On Aug 31, 2011, at 9:18 PM, Mark Cafaro wrote:

I am submitting sh_loop.cmd (from the Condor examples) to my manager. It matches with a node and sends the job off. The node, however, refuses to accept the job, claiming "Job requirements not satisfied." The job is set with Requirements = TRUE. How can the requirements not be satisfied, and how can a match be made if they were not satisfied?
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/




--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Group Leader.
Lead of FermiCloud project.