Re: [HTCondor-users] Memory requests increasing



On Feb 21, 2013, at 9:47 AM, "Rochford, Steve" <s.rochford@xxxxxxxxxxxxxx> wrote:

> We're running Condor 7.8.2 and seeing that some jobs never complete. The log file below is from a job using Abaqus. I submit the job via Condor and it gets picked up by a machine. Provided that no one reboots the machine, the file gets processed in about 3 hours on a machine with 4GB of RAM. There's a lot of swapping to disk but it all works.
> 
> I'm not sure that I understand what the log below is telling me; the final lines are easy - the user aborted because nothing had happened but is there anything significant about the increasing "ResidentSetSize"? 

The ResidentSetSize is just reporting the maximum RAM used by the job so far. A ResidentSetSize of 3.5GB agrees with your report that the job causes swapping on a machine with 4GB of RAM, but can run successfully (depending on what else is using memory on the machine).
When the Image size events cease, it means the job's RAM usage has plateaued or declined. The job is still running. If it was running for longer than expected, maybe additional load on the machine slowed down execution (due to contention for CPU or RAM).
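
If it helps to see how the reported ResidentSetSize grows over the life of the job, here is a minimal Python sketch that pulls those values out of a job log. It assumes each update appears on a line shaped like "<number>  -  ResidentSetSize of job (KB)"; the sample text and the function names are made up for illustration, so adjust the pattern to match what your log actually contains.

```python
import re

# Assumed line shape (check against your own log):
#   "<number>  -  ResidentSetSize of job (KB)"
RSS_LINE = re.compile(r"^\s*(\d+)\s+-\s+ResidentSetSize of job \(KB\)\s*$")


def rss_history_kb(log_text):
    """Return every ResidentSetSize value (in KB) in the order it was logged."""
    values = []
    for line in log_text.splitlines():
        match = RSS_LINE.match(line)
        if match:
            values.append(int(match.group(1)))
    return values


def peak_rss_gb(log_text):
    """Return the largest logged ResidentSetSize in GB, or None if none found."""
    values = rss_history_kb(log_text)
    if not values:
        return None
    return max(values) / (1024 * 1024)  # KB -> GB


# Hypothetical sample, not taken from the original poster's log.
SAMPLE_LOG = """\
Image size of job updated: 1200000
\t1048576  -  ResidentSetSize of job (KB)
Image size of job updated: 2800000
\t2621440  -  ResidentSetSize of job (KB)
Image size of job updated: 3900000
\t3670016  -  ResidentSetSize of job (KB)
"""

if __name__ == "__main__":
    print(rss_history_kb(SAMPLE_LOG))
    print(peak_rss_gb(SAMPLE_LOG))
```

A peak close to the machine's physical RAM, as in this made-up sample, would be consistent with a job that swaps heavily but still finishes.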

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project