
Re: [HTCondor-users] HTCondor-CE memory issue



Hello again,

I've been investigating a little further. It seems that the noticeable increase in memory consumption of the condor_schedd process was related to an unusually large number of jobs in the Hold state, caused by an error in a JobRouter entry. After fixing the JobRouter entry and restarting the service, the heap size has grown by only 3 MB in one day. So it is still increasing, but not as fast as the other day. I will keep monitoring it.

Cheers,

Carles

On 11/22/2016 11:50 AM, Carles Acosta wrote:
Hi Iain,

I'm running Scientific Linux 6.7.

I'm going to look carefully at the memory consumption of the condor_schedd process and see how it evolves. Over the last hour, the heap is the only region that is clearly growing, from 480 MB to 496 MB.
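
To compare figures like these across different observation windows, it can help to normalise them to a common rate. The following is only a rough sketch: the function name is made up for illustration, and it assumes the growth is linear over the window, which a single hour of data cannot confirm.

```python
def growth_mb_per_day(start_mb, end_mb, elapsed_seconds):
    """Extrapolate an observed size change to MB per day.

    Assumes linear growth over the window (an assumption, not a measurement).
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    seconds_per_day = 86400
    return (end_mb - start_mb) * seconds_per_day / elapsed_seconds


# Heap sizes quoted above: 480 MB to 496 MB over roughly one hour.
hourly_sample = growth_mb_per_day(480, 496, 3600)
```

With the numbers in this message, the extrapolation comes to about 384 MB per day, if the one-hour trend were to hold.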

Cheers,

Carles

On 11/22/2016 10:00 AM, Iain Steers wrote:
Hi Carlos,

We're running the same versions of htcondor and htcondor-ce as you and don't see an issue.

What OS are you running on?

The only memory-usage issue I'm aware of occurs when the CE deals with re-delegations after a large number of submissions in a short period of time.

This is due to the submission pattern of a particular VO and has been fixed in 8.5.8. The memory usage returns to normal after the re-delegations finish.

However, it doesn't sound like your issue. Have you taken a look at smaps to see where the memory usage is coming from?
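
One way to do the smaps check is to sum the Rss lines per mapping, so that the heap and any large libraries stand out. The following is a minimal sketch, not a tested tool: the function name is hypothetical, and it assumes the usual Linux /proc/<pid>/smaps layout, where each mapping starts with an address-range line and is followed by "Key: value kB" lines.

```python
import re
from collections import defaultdict

# A mapping header looks like:
#   01000000-02000000 rw-p 00000000 00:00 0   [heap]
# i.e. address range, perms, offset, device, inode, optional name.
HEADER = re.compile(
    r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\S+\s*(.*)$"
)


def rss_by_mapping(smaps_text):
    """Return {mapping name: total Rss in kB} for the given smaps text."""
    totals = defaultdict(int)
    current = None
    for line in smaps_text.splitlines():
        match = HEADER.match(line)
        if match:
            # Mappings without a name are grouped under a placeholder label.
            current = match.group(1).strip() or "[anon]"
            continue
        if current is not None and line.startswith("Rss:"):
            totals[current] += int(line.split()[1])
    return dict(totals)
```

To use it, read /proc/<pid>/smaps for the condor_schedd PID (as root or as the process owner), pass the text to rss_by_mapping, and sort the result by value. Running it at intervals shows which mapping is actually growing.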

For reference, all our CEs have 16 GB of RAM, which is comfortable. I've just checked, and the RSS of the schedd process on our busiest schedd doesn't go above 600 MB-1 GB under normal circumstances.

Cheers, Iain
________________________________________
From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Carles Acosta [cacosta@xxxxxx]
Sent: 22 November 2016 09:35
To: HTCondor-Users Mail List
Subject: [HTCondor-users] HTCondor-CE memory issue

Dear all,

Our HTCondor-CE is eating up all the memory of the system; usage increases constantly. When memory usage reaches 65-70% (it's a 32 GB RAM machine), we start to see the following errors in the SchedLog:

Create_Thread: fork() failed: Cannot allocate memory (12)
ForkWorker::Fork: Fork failed

And all the submitted jobs remain in the Hold state (Hold reason: Spooling input data files).
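
To see how many jobs are held and why, the hold reasons can be tallied from the job ads. This is a sketch under assumptions: it expects the job ads as a JSON list of objects (for example from a JSON export of the queue, if your condor_q version provides one), and the function name is made up for illustration. JobStatus 5 is the value HTCondor uses for held jobs, and HoldReason is the standard job attribute.

```python
import json
from collections import Counter

HELD = 5  # JobStatus value for held jobs


def count_hold_reasons(job_ads_json):
    """Count held jobs per HoldReason from a JSON list of job ads."""
    ads = json.loads(job_ads_json)
    return Counter(
        ad.get("HoldReason", "(no HoldReason)")
        for ad in ads
        if ad.get("JobStatus") == HELD
    )
```

Calling most_common() on the result lists the dominant hold reasons first, which makes a sudden pile-up with a single reason easy to spot.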

Reloading the condor-ce services solves the issue, but then the memory usage starts to increase again, slowly and constantly.

I would like to know whether you are facing similar problems with your HTCondor-CEs and, if so, how you solve them.

We are currently running HTCondor 8.5.6 and HTCondor-CE 2.0.7-1.

Thank you in advance.

Best regards,

Carles

--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 22
Fax: +34 93 581 41 10
http://www.pic.es
Avís - Aviso - Legal Notice: http://www.ifae.es/legal.html