
Re: [HTCondor-users] Memory requests increasing



Override request_memory in your submit file, because
( TARGET.Memory >= RequestMemory ) is still in your job's requirements.
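
For example, in the submit file (a minimal sketch; the 900 just mirrors the
value you were already requiring, so set it to whatever peak usage you
actually expect):

    request_memory = 900
    Requirements   = starccm == "yes"

With request_memory set to an explicit number, RequestMemory stays put
instead of defaulting to an expression that tracks the job's measured
memory usage, so the appended ( TARGET.Memory >= RequestMemory ) clause
won't creep up after an eviction.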


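For the job that is already sitting idle in the queue (1534 in the message
below), something like this should drop its request back down without
resubmitting (again a sketch, untested against your pool):

    condor_qedit 1534.0 RequestMemory 900

condor_qedit replaces the RequestMemory expression in the job ad with a
literal value, so the next negotiation cycle should consider the 900MB
machines again.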

----- Original Message -----
> From: "Steve Rochford" <s.rochford@xxxxxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Cc: "condor-users@xxxxxxxxxxx" <condor-users@xxxxxxxxxxx>
> Sent: Thursday, February 28, 2013 6:02:20 AM
> Subject: Re: [HTCondor-users] Memory requests increasing
> 
> Thanks for this, but I'm still not quite sure what's going on, so
> maybe an example will help.
> 
> I've got a job which was submitted yesterday and the requirements
> line in the submit file was:
> 
> Requirements = starccm == "yes" && Memory >=900
> 
> It started running but is now shown as idle; running condor_q gives
> me the info below. What I don't understand is why it's now asking
> for 2686MB of RAM. (At least, I partly understand: the .log file
> shows that the job was using 2686MB when it was evicted yesterday,
> probably because another user started to use the machine. But why
> does Condor assume it needs that much RAM in order to re-run?)
> 
> Steve
> 
> C:\scripts>condor_q 1534 -better-analyze
> 
> 
> -- Submitter: HTCONDOR.cc.ic.ac.uk : <155.198.30.249:51232> : HTCONDOR.cc.ic.ac.uk
> ---
> 1534.000:  Run analysis summary.  Of 2815 machines,
>    2815 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       0 match but are serving users with a better priority in the
>       pool
>       0 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 match but are currently offline
>       0 are available to run your job
>         Last successful match: Wed Feb 27 17:40:13 2013
>         Last failed match: Thu Feb 28 11:55:58 2013
> 
>         Reason for last match failure: no match found
> 
> WARNING:  Be advised:
>    No resources matched request's constraints
> 
> The Requirements expression for your job is:
> 
> ( target.starccm == "yes" && target.Memory >= 900 ) &&
> ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "WINDOWS" ) &&
> ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
> ( TARGET.HasFileTransfer )
> 
>     Condition                         Machines Matched    Suggestion
>     ---------                         ----------------    ----------
> 1   ( TARGET.Memory >= 2686 )         0                   MODIFY TO 1018
> 2   target.starccm == "yes"           1876
> 3   target.Memory >= 900              2532
> 4   ( TARGET.Arch == "X86_64" )       2815
> 5   ( TARGET.OpSys == "WINDOWS" )     2815
> 6   ( TARGET.Disk >= 500000 )         2815
> 7   ( TARGET.HasFileTransfer )        2815
> 
> 
> -----Original Message-----
> From: htcondor-users-bounces@xxxxxxxxxxx
> [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Jaime Frey
> Sent: 26 February 2013 17:22
> To: HTCondor-Users Mail List
> Cc: 'condor-users@xxxxxxxxxxx'
> Subject: Re: [HTCondor-users] Memory requests increasing
> 
> On Feb 21, 2013, at 9:47 AM, "Rochford, Steve"
> <s.rochford@xxxxxxxxxxxxxx> wrote:
> 
> > We're running Condor 7.8.2 and seeing that some jobs never
> > complete. The log file below is from a job using Abaqus. I submit
> > the job via Condor and it gets picked up by a machine. Provided
> > that no one reboots the machine, the file gets processed in
> > about 3 hours on a machine with 4GB of RAM. There's a lot of
> > swapping to disk but it all works.
> > 
> > I'm not sure that I understand what the log below is telling me;
> > the final lines are easy - the user aborted because nothing had
> > happened - but is there anything significant about the increasing
> > "ResidentSetSize"?
> 
> The ResidentSetSize is just reporting the maximum RAM used by the job
> so far. A ResidentSetSize of 3.5GB agrees with your report that the
> job causes swapping on a machine with 4GB of RAM, but can run
> successfully (depending on what else is using memory on the
> machine).
> When the Image size events cease, it means the job's RAM usage has
> plateaued or declined. The job is still running. If it was running
> for longer than expected, maybe additional load on the machine
> slowed down execution (due to contention for CPU or RAM).
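> 
> If you want to watch that number yourself, something along these
> lines (just a sketch; the value is reported in KiB) prints the
> attribute straight from the job ad in the queue:
> 
>     condor_q 1534 -format "%d\n" ResidentSetSize
> 
> The same figure shows up in the memory/image size update events in
> the job's user log.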
> 
> Thanks and regards,
> Jaime Frey
> UW-Madison HTCondor Project
> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
> with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
>