Re: [HTCondor-users] Schedd possibly spinning on a job



Did you try prefixing the condor_rm command with condor_sos?
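
For example, here is a minimal sketch. It assumes the stuck job is the 6092282.0 from your log line and that condor_sos is on your PATH; the JOB_ID and CMD variables and the dry-run fallback are only there for illustration.

```shell
# Job ID taken from the satisfyJobs log line quoted in this thread.
JOB_ID="6092282.0"

# condor_sos wraps another HTCondor command so that the schedd
# services it at a higher priority than ordinary requests.
CMD="condor_sos condor_rm ${JOB_ID}"

if command -v condor_sos >/dev/null 2>&1; then
    # Run the wrapped removal against the local schedd.
    $CMD
else
    # Fallback for machines without HTCondor installed: just show the command.
    echo "condor_sos not found; would run: ${CMD}"
fi
```

The same wrapper should work in front of condor_q if you want to look at the queue first.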

 

D_ALL will definitely make the problem worse, by the way.  It’s insanely chatty.

 

-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Larne Pekowsky via HTCondor-users
Sent: Thursday, May 16, 2019 12:36 PM
To: 'htcondor-users@xxxxxxxxxxx' <htcondor-users@xxxxxxxxxxx>
Cc: Larne Pekowsky <lppekows@xxxxxxx>
Subject: [HTCondor-users] Schedd possibly spinning on a job

 

Hi all,

 

Our schedd has been pegged at 100% cpu for several hours and immediately returns to that state on restart.  At D_FULLDEBUG the log floods with the message

 

   05/16/19 12:50:58 satisfyJobs: finding resources for 6092282.0

 

so it almost looks like the schedd is stuck in a loop on this job.  I’d like to remove it to see if that fixes the problem, but of course with the schedd running at 100% condor_rm can’t get through.  Any suggestions?  Also, is there any way to get more detailed information on what’s happening?  D_ALL didn’t seem to have anything useful.

 

Thanks,

 

- Larne