
Re: [Condor-users] Jobs are rejected for unknown reasons although requirements are fulfilled



Dear all,

Thank you very much for your replies.

For Tim: I didn't know that those settings could cause problems; I was not aware of this until you pointed it out. Thank you very much again. I will check those computers and their settings, and I hope that solves the problem. Otherwise, I will need to create a separate Condor pool for the computers in that building :(

For Shahaan: The 25 computers I submit jobs to, which recently started causing these problems, have never run a job. The jobs always stay idle. The SchedLog says:

IO: Failed to read packet header
05/22/12 09:45:20 (pid:5936) Response problem from startd when requesting claim slot1@PCLAB-PC <10.1.115.195:49198> for jlab 4215.0.
05/22/12 09:45:20 (pid:5936) Failed to send REQUEST_CLAIM to startd slot1@PCLAB-PC <10.1.115.195:49198> for jlab: CEDAR:6004:failed reading from socket
05/22/12 09:45:20 (pid:5936) Match record (slot1@PCLAB-PC <10.1.115.195:49198> for jlab, 4215.0) deleted
05/22/12 09:45:20 (pid:5936) Finished negotiating for jlab in local pool: 20 matched, 1 rejected
05/22/12 09:45:20 (pid:5936) condor_read() failed: recv() returned -1, errno = 10053 , reading 5 bytes from startd slot3@PCLAB-PC <10.1.115.195:49198> for jlab.
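
For reference, errno 10053 in the last line is the Windows Sockets code WSAECONNABORTED (connection aborted by software on the local host). To see which slots and addresses are affected across a longer SchedLog, the failures can be tallied with a short script. This is only a minimal sketch: the function name `failed_claims` and the regular expression are illustrative assumptions based on the line format shown above, not part of Condor.

```python
import re
from collections import Counter

# Assumed line format, taken from the SchedLog excerpt above:
# "... Failed to send REQUEST_CLAIM to startd slot1@PCLAB-PC <10.1.115.195:49198> for jlab: ..."
CLAIM_FAILURE = re.compile(
    r"Failed to send REQUEST_CLAIM to startd (?P<slot>\S+) <(?P<addr>[^>]+)>"
)

def failed_claims(log_lines):
    """Count REQUEST_CLAIM failures per (slot, address) pair."""
    counts = Counter()
    for line in log_lines:
        m = CLAIM_FAILURE.search(line)
        if m:
            counts[(m.group("slot"), m.group("addr"))] += 1
    return counts

# The lines quoted above, used as sample input.
sample = [
    "05/22/12 09:45:20 (pid:5936) Response problem from startd when requesting claim slot1@PCLAB-PC <10.1.115.195:49198> for jlab 4215.0.",
    "05/22/12 09:45:20 (pid:5936) Failed to send REQUEST_CLAIM to startd slot1@PCLAB-PC <10.1.115.195:49198> for jlab: CEDAR:6004:failed reading from socket",
    "05/22/12 09:45:20 (pid:5936) Match record (slot1@PCLAB-PC <10.1.115.195:49198> for jlab, 4215.0) deleted",
]

for (slot, addr), n in failed_claims(sample).items():
    print(f"{slot} at {addr}: {n} failed claim request(s)")
```

If the same few addresses account for all of the failures, that would point to those execute machines rather than to the submit file.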



2012/5/22 Shahaan Ayyub <shahaan@xxxxxxxxx>
Hi Canan,
   What does the SchedLog say? I have observed this on Windows when the job failed to run on the execute node, with its state changing from Running to Idle immediately. You should be able to tell this by looking at the SchedLog. Please try to run the job manually on one of the nodes that matched the criteria, and then resubmit.

regards,

Shahaan

On Mon, May 21, 2012 at 5:38 PM, Canan Has <cananhas@xxxxxxxxx> wrote:
Dear all,

I posted this message last week but did not get any reply. Solving this problem is important and urgent for me, so I am posting again.
I have been using Condor for a while.
I added new computers to our pool, which have "WINDOWS" as OpSys, "INTEL" as Arch, and 503 as Memory.
I wrote the requirements in my submit files as:

requirements = OpSys == "WINDOWS" && Arch == "INTEL"

However, what I see when I run condor_q -better-analyze is:

173.000:  Run analysis summary.  Of 84 machines,
     33 are rejected by your job's requirements 
     41 reject your job because of their own requirements 
      0 match but are serving users with a better priority in the pool 
     10 match but reject the job for unknown reasons 
      0 match but will not currently preempt their existing job 
      0 match but are currently offline 
      0 are available to run your job
No successful match recorded.
Last failed match: Mon May 21 09:59:03 2012
Reason for last match failure: no match found

The Requirements expression for your job is:

( target.Memory >= 32 && target.OpSys == "WINDOWS" && target.Arch == "INTEL" ) &&
( TARGET.Disk >= DiskUsage ) && ( ( RequestMemory * 1024 ) >= ImageSize ) &&
( TARGET.HasFileTransfer )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   target.OpSys == "WINDOWS"         53                   
2   target.Arch == "INTEL"            61                   
3   target.Memory >= 32               84                   
4   ( TARGET.Disk >= 12500 )          84                   
5   ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,2.929687500000000E+000)) ) >= 3000 )
                                      84                   
6   ( TARGET.HasFileTransfer )        84                   

The following attributes are missing from the job ClassAd:

CheckpointPlatform
---
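
As a sanity check on the summary above, the per-category counts can be parsed and compared with the machine total. This is a minimal sketch that assumes the text layout shown in the condor_q output; `parse_summary` is a hypothetical helper, not a Condor tool.

```python
import re

# Summary text copied from the condor_q output above.
SUMMARY = """\
173.000:  Run analysis summary.  Of 84 machines,
     33 are rejected by your job's requirements
     41 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
     10 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
"""

def parse_summary(text):
    """Return (total_machines, {category: count}) from an analysis summary."""
    total = int(re.search(r"Of (\d+) machines", text).group(1))
    buckets = {}
    for line in text.splitlines():
        # Category lines look like "<count> <description>".
        m = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if m:
            buckets[m.group(2)] = int(m.group(1))
    return total, buckets

total, buckets = parse_summary(SUMMARY)
print(total, sum(buckets.values()))
for category, count in buckets.items():
    if count:
        print(f"{count:3d}  {category}")
```

The categories add up to the 84 machines in the pool, so the 10 machines in the "unknown reasons" category are counted separately from the 74 that fail one side of the requirements check.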
Can anyone tell me why I get this error? Is it related to the CheckpointPlatform attribute, which as far as I know is used on Unix systems?

Thanks in advance,
Canan Has

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


