
[HTCondor-users] Fwd: Jobs in running state but shadow process not started



Hello Experts,

Has anyone seen this issue, or can anyone share some input for troubleshooting?

Thanks & Regards,
Vikrant Aggarwal


---------- Forwarded message ---------
From: Vikrant Aggarwal <ervikrant06@xxxxxxxxx>
Date: Thu, Feb 4, 2021 at 5:37 PM
Subject: Fwd: [HTCondor-users] Jobs in running state but shadow process not started
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>



Adding more information:

My understanding is that a job can't be in the running state unless its shadow has started.

We are judging the number of shadow processes by looking for add_shadow_birthdate in the log.
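A rough sketch of how that count could be scripted, in case it helps anyone reproduce the numbers. It assumes each log line starts with the default "MM/DD/YY HH:MM:SS" timestamp and that a shadow start shows up as a line containing add_shadow_birthdate. The function name and the sample lines are made up for illustration and are not real SchedLog output.

```python
import re
from collections import Counter

# Assumption: log lines begin with a timestamp such as "02/04/21 17:35:01".
TIMESTAMP = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")


def shadow_starts_per_second(lines, marker="add_shadow_birthdate"):
    """Count lines containing `marker`, grouped by their one-second timestamp."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue  # not a shadow-start line
        match = TIMESTAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; only the timestamp prefix and the marker matter here.
sample = [
    "02/04/21 17:35:01 add_shadow_birthdate (illustrative text)",
    "02/04/21 17:35:01 some unrelated message",
    "02/04/21 17:35:01 add_shadow_birthdate (illustrative text)",
    "02/04/21 17:35:02 add_shadow_birthdate (illustrative text)",
]

counts = shadow_starts_per_second(sample)
for second, started in counts.items():
    print(second, started)
```

To run it against the real log, pass an open file object instead of the sample list, e.g. `shadow_starts_per_second(open("/var/log/condor/SchedLog", errors="replace"))`.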



Thanks & Regards,
Vikrant Aggarwal


---------- Forwarded message ---------
From: Vikrant Aggarwal <ervikrant06@xxxxxxxxx>
Date: Thu, Feb 4, 2021 at 5:35 PM
Subject: [HTCondor-users] Jobs in running state but shadow process not started
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>


Hello Experts,

Disclaimer: I haven't seen this issue personally, but I have a user complaining about this behavior.

With a large batch of jobs (e.g. 20k), the user sees the jobs in the RUNNING state, but the condor schedd log (/var/log/condor/SchedLog) doesn't show the shadow processes as started yet. The user is seeing 10-15 shadow processes getting created per second.

We have moved the condor_q log file to /dev/shm, but saw no major gain.

Is anyone aware of this issue?

Thanks & Regards,
Vikrant Aggarwal