
[HTCondor-users] Memory requests increasing



We're running Condor 7.8.2 and seeing that some jobs never complete. The log file below is from a job using Abaqus. I submit the job via Condor and it gets picked up by a machine. Provided that no one reboots the machine, the file gets processed in about 3 hours on a machine with 4GB of RAM. There's a lot of swapping to disk but it all works.

I'm not sure that I understand what the log below is telling me. The final lines are easy: the user aborted the job because nothing had happened. But is there anything significant about the increasing "ResidentSetSize"?

Steve

000 (1299.000.000) 02/17 13:56:02 Job submitted from host: <155.198.30.249:58189>
...
001 (1299.000.000) 02/17 13:57:14 Job executing on host: <155.198.72.65:50149>
...
006 (1299.000.000) 02/17 13:57:23 Image size of job updated: 1
	1  -  MemoryUsage of job (MB)
	128  -  ResidentSetSize of job (KB)
...
006 (1299.000.000) 02/17 14:02:25 Image size of job updated: 2582400
	2522  -  MemoryUsage of job (MB)
	2582400  -  ResidentSetSize of job (KB)
...
006 (1299.000.000) 02/17 14:07:27 Image size of job updated: 3298708
	3222  -  MemoryUsage of job (MB)
	3298708  -  ResidentSetSize of job (KB)
...
006 (1299.000.000) 02/17 14:12:27 Image size of job updated: 3446956
	3367  -  MemoryUsage of job (MB)
	3446956  -  ResidentSetSize of job (KB)
...
006 (1299.000.000) 02/17 14:17:26 Image size of job updated: 3446972
	3367  -  MemoryUsage of job (MB)
	3446972  -  ResidentSetSize of job (KB)
...
006 (1299.000.000) 02/17 14:37:32 Image size of job updated: 3446980
	3367  -  MemoryUsage of job (MB)
	3446980  -  ResidentSetSize of job (KB)
...
009 (1299.000.000) 02/20 21:55:54 Job was aborted by the user.
	via condor_rm (by user jw508)
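
To make the trend in the 006 events easier to read, a short script can pull out each update and show how much the resident set grew between them. The following is only a minimal sketch in Python, assuming the log format is exactly as pasted above; the names `parse_updates`, `EVENT` and `FIELD` are made up for illustration and are not part of Condor. It also checks that, for these six events, the MemoryUsage figure matches ResidentSetSize in KB divided by 1024 and rounded up. That is an observation about this log, not a claim about how Condor derives the value.

```python
import re

# The 006 (image size update) events copied from the job log above.
LOG = """\
006 (1299.000.000) 02/17 13:57:23 Image size of job updated: 1
\t1  -  MemoryUsage of job (MB)
\t128  -  ResidentSetSize of job (KB)
...
006 (1299.000.000) 02/17 14:02:25 Image size of job updated: 2582400
\t2522  -  MemoryUsage of job (MB)
\t2582400  -  ResidentSetSize of job (KB)
...
006 (1299.000.000) 02/17 14:07:27 Image size of job updated: 3298708
\t3222  -  MemoryUsage of job (MB)
\t3298708  -  ResidentSetSize of job (KB)
...
006 (1299.000.000) 02/17 14:12:27 Image size of job updated: 3446956
\t3367  -  MemoryUsage of job (MB)
\t3446956  -  ResidentSetSize of job (KB)
...
006 (1299.000.000) 02/17 14:17:26 Image size of job updated: 3446972
\t3367  -  MemoryUsage of job (MB)
\t3446972  -  ResidentSetSize of job (KB)
...
006 (1299.000.000) 02/17 14:37:32 Image size of job updated: 3446980
\t3367  -  MemoryUsage of job (MB)
\t3446980  -  ResidentSetSize of job (KB)
"""

# Header line of a 006 event: event code, job id, timestamp, image size.
EVENT = re.compile(
    r"^006 \(\S+\) (\d\d/\d\d \d\d:\d\d:\d\d) Image size of job updated: (\d+)"
)
# Indented detail line: "<number>  -  <attribute> of job (<unit>)".
FIELD = re.compile(r"^\s*(\d+)\s+-\s+(MemoryUsage|ResidentSetSize) of job")


def parse_updates(text):
    """Return one dict per 006 event with its time and reported sizes."""
    updates = []
    for line in text.splitlines():
        event = EVENT.match(line)
        if event:
            updates.append({"time": event.group(1),
                            "image_size": int(event.group(2))})
            continue
        field = FIELD.match(line)
        if field and updates:
            # Attach the detail line to the most recent 006 event.
            updates[-1][field.group(2)] = int(field.group(1))
    return updates


updates = parse_updates(LOG)
previous_rss = None
for u in updates:
    rss_kb = u["ResidentSetSize"]
    growth_kb = 0 if previous_rss is None else rss_kb - previous_rss
    rounded_up_mb = -(-rss_kb // 1024)  # ceiling division, KB to MB
    print(u["time"], rss_kb, "KB",
          "growth", growth_kb, "KB",
          "MB matches:", rounded_up_mb == u["MemoryUsage"])
    previous_rss = rss_kb
```

Run against the excerpt above, this prints one line per update, so it is easy to see whether the resident set is still growing or has levelled off by the later events.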