Re: [HTCondor-users] Memory requests increasing
- Date: Thu, 28 Feb 2013 10:13:28 -0500 (EST)
- From: Tim St Clair <tstclair@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] Memory requests increasing
Override request_memory in your submit file,
because ( TARGET.Memory >= RequestMemory ) is still in your requirements.
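For reference, a minimal submit-file sketch (file and attribute names other than request_memory/requirements are illustrative) that pins the memory request, so the auto-inserted ( TARGET.Memory >= RequestMemory ) clause stays at the value you chose instead of tracking the job's observed peak usage:

```
# Submit-file sketch; request_memory is in MB.
universe       = vanilla
executable     = run_starccm.bat    # hypothetical executable name
request_memory = 900                # pin the request; otherwise it can
                                    # default to the job's measured usage
requirements   = starccm == "yes"
queue
```

For a job already in the queue, the attribute can also be reset in place, e.g. condor_qedit 1534.0 RequestMemory 900.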
----- Original Message -----
> From: "Steve Rochford" <s.rochford@xxxxxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Cc: "condor-users@xxxxxxxxxxx" <condor-users@xxxxxxxxxxx>
> Sent: Thursday, February 28, 2013 6:02:20 AM
> Subject: Re: [HTCondor-users] Memory requests increasing
>
> Thanks for this but I'm still not quite sure what's going on so maybe
> an example helps.
>
> I've got a job which was submitted yesterday and the requirements
> line in the submit file was:
>
> Requirements = starccm == "yes" && Memory >=900
>
> It started running but is now shown as idle; running condor_q gives
> me the info below. What I don't understand is why it's now asking
> for 2686MB of RAM (at least, I do understand; the .log file shows
> that the job was using 2686MB when it was evicted yesterday. The
> eviction was probably because another user started to use the
> machine but why does Condor assume it's got to have that much RAM in
> order to re-run?)
>
> Steve
>
> C:\scripts>condor_q 1534 -better-analyze
>
>
> -- Submitter: HTCONDOR.cc.ic.ac.uk : <155.198.30.249:51232> :
> HTCONDOR.cc.ic.ac.uk
> ---
> 1534.000: Run analysis summary. Of 2815 machines,
> 2815 are rejected by your job's requirements
> 0 reject your job because of their own requirements
> 0 match but are serving users with a better priority in the pool
> 0 match but reject the job for unknown reasons
> 0 match but will not currently preempt their existing job
> 0 match but are currently offline
> 0 are available to run your job
> Last successful match: Wed Feb 27 17:40:13 2013
> Last failed match: Thu Feb 28 11:55:58 2013
>
> Reason for last match failure: no match found
>
> WARNING: Be advised:
> No resources matched request's constraints
>
> The Requirements expression for your job is:
>
> ( target.starccm == "yes" && target.Memory >= 900 ) &&
> ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "WINDOWS" ) &&
> ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory )
> &&
> ( TARGET.HasFileTransfer )
>
> Condition Machines Matched Suggestion
> --------- ---------------- ----------
> 1 ( TARGET.Memory >= 2686 ) 0 MODIFY TO 1018
> 2 target.starccm == "yes" 1876
> 3 target.Memory >= 900 2532
> 4 ( TARGET.Arch == "X86_64" ) 2815
> 5 ( TARGET.OpSys == "WINDOWS" ) 2815
> 6 ( TARGET.Disk >= 500000 ) 2815
> 7 ( TARGET.HasFileTransfer ) 2815
>
>
> -----Original Message-----
> From: htcondor-users-bounces@xxxxxxxxxxx
> [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Jaime Frey
> Sent: 26 February 2013 17:22
> To: HTCondor-Users Mail List
> Cc: 'condor-users@xxxxxxxxxxx'
> Subject: Re: [HTCondor-users] Memory requests increasing
>
> On Feb 21, 2013, at 9:47 AM, "Rochford, Steve"
> <s.rochford@xxxxxxxxxxxxxx> wrote:
>
> > We're running Condor 7.8.2 and seeing that some jobs never
> > complete. The log file below is from a job using Abaqus. I submit
> > the job via Condor and it gets picked up by a machine. Provided
> > that no-one reboots the machine then the file gets processed in
> > about 3 hours on a machine with 4GB of RAM. There's a lot of
> > swapping to disk but it all works.
> >
> > I'm not sure that I understand what the log below is telling me;
> > the final lines are easy - the user aborted because nothing had
> > happened but is there anything significant about the increasing
> > "ResidentSetSize"?
>
> The ResidentSetSize is just reporting the maximum RAM used by the job
> so far. A ResidentSetSize of 3.5GB agrees with your report that the
> job causes swapping on a machine with 4GB of RAM, but can run
> successfully (depending on what else is using memory on the
> machine).
> When the Image size events cease, it means the job's RAM usage has
> plateaued or declined. The job is still running. If it was running
> for longer than expected, maybe additional load on the machine
> slowed down execution (due to contention for CPU or RAM).
>
> Thanks and regards,
> Jaime Frey
> UW-Madison HTCondor Project
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
> with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/