
[HTCondor-users] Fwd: Re: Docker universe ImageSize/MemoryUsage



So this error seems intermittent (maybe multiple jobs on a single machine? boto3 multi-part upload?). I've tried running the same job as a bare-bones "docker run", and the MEM USAGE reported by "docker stats" stayed reasonably within the LIMIT; the job finished successfully.

Also,
1. In job.log, the huge MemoryUsage is reported immediately after the job starts executing on the worker machine. Could that be related to how condor_starter wraps docker?
2. Shouldn't condor ideally have complained and put the job on hold with a "Docker job has gone over the memory limit" message?
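For context, a minimal docker-universe submit description of the kind discussed in this thread might look like the sketch below; the image name, executable, and resource figures are placeholders, not the actual job:

```
universe       = docker
# Placeholder image and executable names:
docker_image   = my-image:latest
executable     = my-job
# request_memory is in MB; this is the limit the reported
# MemoryUsage is compared against.
request_memory = 2048
request_cpus   = 1
log            = job.log
output         = job.out
error          = job.err
queue
```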


From: Greg Thain <gthain@xxxxxxxxxxx>
Sent: Monday, August 7, 2017 6:30 AM
To: Sitharaman, Harish Mahadevan
Subject: Re: [HTCondor-users] Docker universe ImageSize/MemoryUsage
 
On 08/06/2017 01:36 PM, Sitharaman, Harish Mahadevan wrote:
Hello,
 
We’re using HTCondor’s Docker universe (HTCondor 8.6.1, Docker 1.27) on Amazon EC2 instances, and jobs are being terminated intermittently (the same job sometimes executes successfully) when the reported MemoryUsage exceeds the specified RequestMemory by an unreasonably large margin. This was NOT the case earlier, when we ran the same jobs in HTCondor’s standard universe with our own wrapper to execute docker run. Any suggestions or help would be appreciated:
 

If you have a vanilla job that just executes "docker run ...", then the job proper is a child process of the docker daemon, not of the condor_starter (unless you also started the docker daemon under the starter yourself). In that case, the memory that condor knows about and reports never includes the memory that the job proper uses.
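The pattern Greg describes (a vanilla job whose executable is a wrapper around docker run) might be sketched as follows; the image name, job name, and memory limit are hypothetical:

```shell
# Write out a hypothetical vanilla-universe wrapper of the kind
# described above, then syntax-check it with sh -n.
cat > docker_wrap.sh <<'EOF'
#!/bin/sh
# Everything inside the container runs as a child of the docker daemon,
# not of this script, so condor_starter's MemoryUsage accounting for a
# vanilla job never sees it.
exec docker run --rm --memory 2g my-image my-job "$@"
EOF
sh -n docker_wrap.sh && echo "syntax OK"
```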

If you run the job by hand under docker, can you query the memory used with the docker stats command?
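Concretely, on the execute node that check might look like the sketch below; the container name in the sample line is hypothetical, and the docker invocation is guarded so the snippet is a no-op on machines without docker:

```shell
# Report live usage vs. limit for running containers (no-op if
# docker is not installed on this machine).
command -v docker >/dev/null && docker stats --no-stream \
    --format '{{.Name}}\t{{.MemUsage}}' || true

# A sample output line (illustrative only; the container name is
# assigned by the starter and will differ):
sample=$(printf 'HTCJob123_0\t512MiB / 2GiB')

# Convert the MEM USAGE column to bytes for comparison with RequestMemory:
echo "$sample" | awk -F'\t' '{
    split($2, a, " "); u = a[1]
    if      (u ~ /GiB/) { sub(/GiB/, "", u); printf "%.0f\n", u * 1073741824 }
    else if (u ~ /MiB/) { sub(/MiB/, "", u); printf "%.0f\n", u * 1048576 }
}'
```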

-greg