
Re: [HTCondor-users] slow creation of condor_shadow processes



Hi Greg,

Thanks for the input.

JOB_START_DELAY = 0 is set on both the bad and the good submit nodes.

JOB_START_COUNT is not defined.

The submit file is very simple.

$ egrep -v "^(#|$)" sleep.sub
executable = /bin/sleep
arguments = "1800"
should_transfer_files = yes
when_to_transfer_output = ON_EXIT_OR_EVICT
log = output_file
batch_name = 'test'
output = test_out_print.txt
error = test_err_print.txt
queue 3000


Thanks & Regards,
Vikrant Aggarwal


On Wed, Feb 24, 2021 at 11:34 PM Greg Thain <gthain@xxxxxxxxxxx> wrote:


Kind of a long shot, but do these schedds have the knobs

JOB_START_DELAY

or

JOB_START_COUNT

set?


Or maybe, do the jobs define in their submit file

next_job_start_delay

?

-greg

On 2/24/21 11:08 AM, ervikrant06@xxxxxxxxx wrote:
Hello Experts,

Any thoughts on the following query?

On Tue, 23 Feb, 2021, 17:19 Vikrant Aggarwal, <ervikrant06@xxxxxxxxx> wrote:
Hello Experts,

On the bad condor submit boxes I am seeing a very slow creation rate for condor_shadow processes, 10-15/s, while on the good condor submit boxes the rate is 100-150/s.

When submitting a large batch of more than 5k jobs, the jobs are in the R state but I see no slot allocated to them. I am using the following command to count the jobs in the R state without any slot allocated to them. As each shadow process gets created, a slot starts showing for the running job. I noticed a delay of approximately 20 minutes between slot allocation to the first and the last job. Slow shadow process creation is definitely causing this issue.

while true ; do condor_q -run -nobatch | grep -v 'slot' | wc -l ; sleep 10 ; done

Command used to calculate number of shadow processes created per sec for a batch:

grep 'Starting add_shadow_birthdate(616' /var/log/condor/SchedLog | awk '{print $2}' | sort | uniq -c | less
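
The per-second counts from the command above can be condensed into one summary line, which makes it easier to compare a good and a bad submit node side by side. This is only a minimal sketch: it assumes the second whitespace-separated field of each SchedLog line is the HH:MM:SS timestamp (as the awk '{print $2}' above implies), and the function name shadow_rate_summary is hypothetical, not an HTCondor tool.

```shell
# Summarize shadow starts per second from SchedLog lines read on stdin.
# Assumption: field 2 of every line is the HH:MM:SS timestamp.
shadow_rate_summary() {
    awk '{ count[$2]++ }
         END {
             n = 0; total = 0; max = 0
             for (t in count) {
                 n++
                 total += count[t]
                 if (count[t] > max) max = count[t]
             }
             # The average covers only seconds with at least one shadow start.
             if (n > 0)
                 printf "seconds=%d total=%d max_per_sec=%d avg_per_sec=%.1f\n", n, total, max, total / n
         }'
}

# Usage, with the same grep pattern as above:
# grep 'Starting add_shadow_birthdate(616' /var/log/condor/SchedLog | shadow_rate_summary
```

Because idle seconds are not counted, the average describes the rate while shadows are actually being spawned; the total and the number of active seconds together show how long the batch took to ramp up.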

Troubleshooting Done:

- Condor configuration is managed through the same configuration management tool.
- The condor version in use is 8.5.8 (dev) on both boxes.
- Tried applying the HTCondor kernel tuning script on the bad submit box, but no luck.

I am also seeing the same issue with the 8.8.5 (stable) condor submit box.

Any input to help pinpoint the issue is highly appreciated.

Thanks & Regards,
Vikrant Aggarwal

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/