
Re: [Condor-users] Job requirements not satisfied even when Requirements = TRUE



Hi Garrett,

The job was successfully matched according to the central manager's MatchLog (edited to remove IP addresses and ports):

08/31/11 20:05:16       Matched 27.0 user@...washington.edu <ip:port> preempting none <ip:port> slot1@...washington.edu

The node's StartLog is where I see it being rejected:

08/31/11 20:05:16 slot1: match_info called
08/31/11 20:05:16 slot1: Received match <ip:port>#1314844613#28#...
08/31/11 20:05:16 slot1: State change: match notification protocol successful
08/31/11 20:05:16 slot1: Changing state: Unclaimed -> Matched
08/31/11 20:05:16 slot1: Job requirements not satisfied.
08/31/11 20:05:16 slot1: Request to claim resource refused.
08/31/11 20:05:16 slot1: State change: claiming protocol failed
08/31/11 20:05:16 slot1: Changing state: Matched -> Owner
08/31/11 20:05:16 slot1: State change: IS_OWNER is false
08/31/11 20:05:16 slot1: Changing state: Owner -> Unclaimed
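
For anyone reading this thread later, the StartLog excerpt above can be condensed into its slot state transitions and refusal messages with a short script. This is only a sketch: the regular expressions assume the line layout shown in the excerpt (date, time, slot name, message), and the names START_LOG and summarize are made up for illustration; they are not part of Condor.

```python
import re

# Excerpt copied from the StartLog above (addresses already redacted).
START_LOG = """\
08/31/11 20:05:16 slot1: match_info called
08/31/11 20:05:16 slot1: Received match <ip:port>#1314844613#28#...
08/31/11 20:05:16 slot1: State change: match notification protocol successful
08/31/11 20:05:16 slot1: Changing state: Unclaimed -> Matched
08/31/11 20:05:16 slot1: Job requirements not satisfied.
08/31/11 20:05:16 slot1: Request to claim resource refused.
08/31/11 20:05:16 slot1: State change: claiming protocol failed
08/31/11 20:05:16 slot1: Changing state: Matched -> Owner
08/31/11 20:05:16 slot1: State change: IS_OWNER is false
08/31/11 20:05:16 slot1: Changing state: Owner -> Unclaimed
"""

# Assumed layout: "<date> <time> <slot>: <message>".
TRANSITION = re.compile(r"^\S+ \S+ (\w+): Changing state: (\w+) -> (\w+)$")
MESSAGE = re.compile(r"^\S+ \S+ (\w+): (.*)$")

# Substrings that mark a line as a problem report (a heuristic, not a Condor rule).
PROBLEM_HINTS = ("not satisfied", "refused", "failed")


def summarize(log_text):
    """Return (transitions, problems) found in a StartLog excerpt."""
    transitions = []
    problems = []
    for line in log_text.splitlines():
        m = TRANSITION.match(line)
        if m:
            # (slot, old state, new state)
            transitions.append((m.group(1), m.group(2), m.group(3)))
            continue
        m = MESSAGE.match(line)
        if m and any(hint in m.group(2) for hint in PROBLEM_HINTS):
            problems.append(m.group(2))
    return transitions, problems


if __name__ == "__main__":
    transitions, problems = summarize(START_LOG)
    for slot, old, new in transitions:
        print(f"{slot}: {old} -> {new}")
    for text in problems:
        print(f"problem: {text}")
```

On this excerpt the script shows the slot going from Unclaimed to Matched, then to Owner, and back to Unclaimed, with the refusal messages in between. That is the same picture as above: the negotiator's match reached the node, but the claim itself was refused.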


condor_q -better-analyze returns:

027.000:  Request has not yet been considered by the matchmaker.

apparently because it had already been matched successfully.

Unfortunately, I have been through all of the logs, and the only indication of a problem anywhere is the line "Job requirements not satisfied."



On Aug 31, 2011, at 7:45 PM, Koller, Garrett wrote:

> Mr. Cafaro,
> 
> I'm confused.  I thought the problem was that the job kept being rejected with the error "Job requirements not satisfied."  If that is so, how could it be matched in the MatchLog?  Was it just considered in the MatchLog or was it actually assigned to a specific slot on a specific computer?  If the MatchLog says it found a proper match and actually assigned it to that computer, check out http://servo.cs.wlu.edu/dokuwiki/doku.php/condor/submit/troubleshoot for a possible reason and solution to this problem.
> 
> Also, run 'condor_q -better-analyze' for a more in-depth look at why your job is being rejected.  If the job is being rejected because of its requirements, this should tell you specifically which requirement is failing.
> 
> Either way, let me know if this helps and what you find out.
> 
> Best Regards,
> ~ Garrett Heath Koller
> kollerg14@xxxxxxxxxxxx
> 
> Computer Science Major
> Member of the ΣΦΕ Fraternity
> Washington and Lee University
> Undergraduate Class of 2014
> P.O. Box 970
> Lexington, VA  24450
> Cell: (918) 246-6374
> 
> On Aug 31, 2011, at 10:17 PM, Mark Cafaro wrote:
> 
>> No luck there either. That should certainly evaluate to true.
>> 
>> I am just about out of ideas. The only thing I can gather from the logs is "Job requirements not satisfied." and condor_q -analyze says "Request has not yet been considered by the matchmaker." apparently because the match was made (I can see it in the MatchLog).
>> 
>> I am desperately hoping this is not a platform specific bug. We're on the often forgotten Macintosh.
>> 
>> On Aug 31, 2011, at 7:00 PM, Koller, Garrett wrote:
>> 
>>> Mr. Cafaro,
>>> 
>>> Sure, that's easy.  Just run 'condor_status -long | grep ^IsValidCheckpointPlatform' to see the expression that defines the value of "IsValidCheckpointPlatform".  The expression depends a lot on the job being submitted.  Because of this, note that in this expression "MY.*" refers to an attribute in the machine's ClassAd (listed in 'condor_status -long') and "TARGET.*" refers to an attribute in the job's ClassAd (listed in 'condor_q -long').
>>> 
>>> Best Regards,
>>> ~ Garrett K.
>>> Washington and Lee University
>>> condor.cs.wlu.edu
>>> 
>>> On Aug 31, 2011, at 9:51 PM, Mark Cafaro wrote:
>>> 
>>>> Hi Garrett,
>>>> 
>>>> I have investigated this possibility and found that it is likely not causing our problem. Requirements is appended to, but I can overwrite the appended requirements with condor_qedit. In either case, I would not expect a match to be made if the manager weren't able to match the requirements with the node. The manager matches, but the node refuses.
>>>> 
>>>> I am wondering whether this has to do with the fact that the node has:
>>>> 
>>>> Requirements = ( START ) && ( IsValidCheckpointPlatform )
>>>> 
>>>> I can't be sure that IsValidCheckpointPlatform evaluates to true on my platform. Is there any way to determine this?
>>>> 
>>>> On Aug 31, 2011, at 6:37 PM, Koller, Garrett wrote:
>>>> 
>>>>> Mr. Cafaro,
>>>>> 
>>>>> The job's requirements expression is probably being appended to after it is submitted.  Usually, the requirements in the submission file are logically and-ed (&&) with an expression that says what the job needs from its execution machine in terms of file transfer.  When the job is in the queue, run something like 'condor_q -long <Job_Cluster_ID> | grep -i ^Requirements', where <Job_Cluster_ID> is the ID of the job you just submitted.  There you will see the Requirements expression in its entirety.  Most likely, you are asking Condor to use a file transfer mechanism that isn't supported by your environment.  See Section 2.5.4, "Submitting Jobs Without a Shared File System: Condor’s File Transfer Mechanism," in the Condor manual (7.6.1 for me) for more information, and note where it talks about "FileSystemDomain" and the like, as this is one of the things appended to the job's Requirements expression depending on the type of file transfer desired.
>>>>> 
>>>>> Best Regards,
>>>>> ~ Garrett K.
>>>>> Washington and Lee University
>>>>> condor.cs.wlu.edu
>>>>> 
>>>>> On Aug 31, 2011, at 9:18 PM, Mark Cafaro wrote:
>>>>> 
>>>>>> I am submitting sh_loop.cmd (from the condor examples) to my manager. It matches with a node and sends the job off. The node, however, refuses to accept the job, claiming "Job requirements not satisfied."
>>>>>> 
>>>>>> The job is set with Requirements = TRUE. How can requirements not be satisfied and how can a match be made if the requirements were not satisfied?
>>>>>> _______________________________________________
>>>>>> Condor-users mailing list
>>>>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>>>>> subject: Unsubscribe
>>>>>> You can also unsubscribe by visiting
>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>>>> 
>>>>>> The archives can be found at:
>>>>>> https://lists.cs.wisc.edu/archive/condor-users/