
Re: [Condor-users] timer killfamily running every 60 seconds



I discovered that the condor_procd high CPU usage is related to taking a snapshot.  I set PROCD_MAX_SNAPSHOT_INTERVAL = 600 to reduce the CPU load seen by the local user.  However, condor_master is still taking a snapshot every 60 seconds and maxing out its core.

 

Does anyone know how to reduce or fix this problem?  Condor daemons never used to be CPU hogs.  I have also tried 7.6.3, with the same result.


Sam

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Sam Beckler
Sent: Tuesday, September 06, 2011 8:41 AM
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] timer killfamily running every 60 seconds

 

We have a Condor pool running Windows 7 Enterprise x64 and Condor 7.6.1.  We are seeing condor_master and condor_procd each consume one processor core for about 20 seconds out of every 60 while Condor is in the Owner or Unclaimed state.

 

With logging set to D_ALL, I found that Condor is running “Calling Timer handler 8 (KillFamily::takesnapshot)” over and over again.  This log entry lines up with the CPU activity.  Does anyone know how to fix this problem?

 

08/10/11 13:18:04 (fd:3) (pid:150184) Calling Timer handler 8 (KillFamily::takesnapshot)

08/10/11 13:18:04 (fd:3) (pid:150184) PRIV_CONDOR --> PRIV_CONDOR at c:\condor\execute\dir_2156\userdir\src\condor_utils\killfamily.cpp:279

08/10/11 13:18:09 (fd:3) (pid:150184) KillFamily: parent: 43464 family: 43464

08/10/11 13:18:09 (fd:3) (pid:150184) KillFamily: alive_cpu_user = 0, exited_cpu = 0, max_image = 9120k

08/10/11 13:18:09 (fd:3) (pid:150184) PRIV_CONDOR --> PRIV_CONDOR at c:\condor\execute\dir_2156\userdir\src\condor_utils\killfamily.cpp:480

08/10/11 13:18:09 (fd:3) (pid:150184) Return from Timer handler 8 (KillFamily::takesnapshot)

08/10/11 13:18:09 (fd:3) (pid:150184) PRIV_CONDOR --> PRIV_CONDOR at c:\condor\execute\dir_2156\userdir\src\condor_daemon_core.v6\daemon_core.cpp:3812

08/10/11 13:18:09 (fd:3) (pid:150184) DaemonCore Timeout() Complete, returning 3

08/10/11 13:18:09 (fd:3) (pid:150184) selector 03ADF950 resetting

08/10/11 13:18:09 (fd:3) (pid:150184) selector 03ADF950 adding fd 592 ()

08/10/11 13:18:09 (fd:3) (pid:150184) selector 03ADF950 adding fd 596 ()

08/10/11 13:18:09 (fd:3) (pid:150184) selector 03ADF950 adding fd 584 ()

08/10/11 13:18:09 (fd:3) (pid:150184) PERF: entering select

08/10/11 13:18:09 (fd:3) (pid:150184) Entering thread safe start [select] in selector.cpp:313 150668()

08/10/11 13:18:09 (fd:3) (pid:150184) Leaving thread safe start [select] in selector.cpp:313 150668()

08/10/11 13:18:12 (fd:3) (pid:150184) Entering thread safe stop [select] in selector.cpp:319 150668()

08/10/11 13:18:12 (fd:3) (pid:150184) Leaving thread safe stop [select] in selector.cpp:319 150668()

08/10/11 13:18:12 (fd:3) (pid:150184) PERF: leaving select

08/10/11 13:18:12 (fd:3) (pid:150184) State = TIMED_OUT

08/10/11 13:18:12 (fd:3) (pid:150184) max_fd = 596

08/10/11 13:18:12 (fd:3) (pid:150184) Selection FD's

08/10/11 13:18:12 (fd:3) (pid:150184)            Read {584 592 596 } = 3

08/10/11 13:18:12 (fd:3) (pid:150184)            Write {} = 0

08/10/11 13:18:12 (fd:3) (pid:150184)            Except {} = 0

08/10/11 13:18:12 (fd:3) (pid:150184) Timeout = 3.000000 seconds

08/10/11 13:18:12 (fd:3) (pid:150184) In DaemonCore Timeout()

08/10/11 13:18:13 (fd:3) (pid:150184)
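
To check how long each snapshot takes and how often it fires, the log above can be measured with a small script.  This is only a rough sketch: it assumes the timestamps are month/day/year and that every call is logged with the "Calling Timer handler" / "Return from Timer handler" pair shown in the excerpt.  The function names handler_timings and intervals are made up for this example and are not part of Condor.

```python
import re
from datetime import datetime

# Matches the "Calling"/"Return from" timer lines seen in the excerpt above.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) .*?"
    r"(Calling|Return from) Timer handler (\d+) \(([^)]*)\)"
)

def handler_timings(lines, name="KillFamily::takesnapshot"):
    """Return (start_time, duration_seconds) for each completed call of `name`."""
    timings = []
    start = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group(4) != name:
            continue
        # Assumption: the log uses month/day/two-digit-year timestamps.
        stamp = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
        if m.group(2) == "Calling":
            start = stamp
        elif start is not None:
            timings.append((start, (stamp - start).total_seconds()))
            start = None
    return timings

def intervals(timings):
    """Seconds between the starts of consecutive calls."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(timings, timings[1:])]

# The two timer lines copied from the log excerpt in this message.
sample = [
    "08/10/11 13:18:04 (fd:3) (pid:150184) Calling Timer handler 8 (KillFamily::takesnapshot)",
    "08/10/11 13:18:09 (fd:3) (pid:150184) Return from Timer handler 8 (KillFamily::takesnapshot)",
]

if __name__ == "__main__":
    print([duration for _, duration in handler_timings(sample)])  # prints [5.0]
```

Feeding the whole log file to handler_timings (for example, passing an open file object) and then calling intervals on the result should show whether the snapshot really repeats every 60 seconds, and whether a changed setting has any effect on the daemon being logged.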

 

Thanks,

 

Sam Beckler

Imaging and PC Management

Email: Beckle2@xxxxxxxxxxx

Phone: (864)656-5885

Cell: (864)650-1251