Re: [HTCondor-users] Possible delays for starting a shadow?




What version of HTCondor are you running? We made a change in HTCondor version 8.9.11 to quickly close any open file descriptors. Earlier versions of HTCondor would attempt to close all file descriptors and long delays were noted in some cases.
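
To illustrate why the two strategies differ so much in cost, here is a minimal Python sketch. It is not HTCondor's actual code; the function names are hypothetical, and reading `/proc/self/fd` is a Linux-specific assumption. It only counts how many close() calls each approach would issue and does not close anything.

```python
import os


def candidate_fds_brute_force(first_fd, fd_limit):
    """Every descriptor number from first_fd up to the limit, open or not.

    Illustrates the "attempt to close all file descriptors" approach:
    the work grows with the configured limit, not with actual usage.
    """
    return range(first_fd, fd_limit)


def candidate_fds_open_only(first_fd, fd_dir="/proc/self/fd"):
    """Only descriptors that are actually open.

    Assumption: fd_dir lists one entry per open descriptor, as
    /proc/self/fd does on Linux. Non-numeric entries are ignored.
    """
    fds = []
    for name in os.listdir(fd_dir):
        if name.isdigit() and int(name) >= first_fd:
            fds.append(int(name))
    return sorted(fds)
```

With a limit of 102400 (the value quoted further down in this thread), `len(candidate_fds_brute_force(3, 102400))` is 102397 close() attempts per spawned process, whereas `candidate_fds_open_only(3)` typically returns only a handful of entries.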

...Tim

--
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of ervikrant06@xxxxxxxxx <ervikrant06@xxxxxxxxx>
Sent: Thursday, December 30, 2021 5:46 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Possible delays for starting a shadow?
 
Hello, 

If I am understanding your query correctly, it could be related to the following settings (set to a higher value than the default).

MAX_FILE_DESCRIPTORS = 102400
SCHEDD_MAX_FILE_DESCRIPTORS = 102400
SHARED_PORT_MAX_FILE_DESCRIPTORS = 102400


Thanks & Regards,
Vikrant Aggarwal


On Thu, Dec 30, 2021 at 4:35 PM Sever, Krunoslav <krunoslav.sever@xxxxxxx> wrote:
Hi,

at the moment I am investigating a case in which a scheduler delays starting a shadow for unknown reasons.

The job is minimal: no file transfers, it just executes some echo statements. As soon as the job is actually started, it is done immediately.

When submitted to another sched (which should have the same config), there is consistently no delay.
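
One way to check the "should have same config" assumption is to diff configuration dumps from the two hosts. The sketch below assumes each dump is plain text with one `NAME = value` pair per line and `#` comment lines, for example as saved from `condor_config_val -dump` on each host; the function names are hypothetical.

```python
def parse_config_dump(text):
    """Parse 'NAME = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # HTCondor configuration names are case-insensitive, so normalize.
        settings[name.strip().upper()] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {name: (value_in_a, value_in_b)} for names whose values differ.

    A name missing from one side shows up with None on that side.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Calling `diff_settings(parse_config_dump(dump_a), parse_config_dump(dump_b))` on the two saved dumps returns only the settings that differ between the hosts.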

From the logs, which are currently still at the default level, I see:

* job is submitted to and transformed on sched
* negotiator matches the job to a worker a short time later

In the working case the shadow is started on the sched without any delay.

Not so on the sched I am looking at:

* the job is in running state according to condor_q
* the job ad has no mention of the matched worker node (yet)
* on the worker I find nothing about the job id in the logs

I actually have a job id right now where I am waiting for the shadow to start, with the log level increased to D_FULLDEBUG, but there is no output for the job yet. The increase happened after matching, though, so I might have missed the interesting part.

Do you have any ideas what could cause this behavior?

Best
  Kruno

--
------------------------------------------------------------------------
Krunoslav Sever            Deutsches Elektronen-Synchrotron (IT-Systems)
                        Ein Forschungszentrum der Helmholtz-Gemeinschaft
                                                            Notkestr. 85
phone:  +49-40-8998-1648                                   22607 Hamburg
e-mail: krunoslav.sever@xxxxxxx                                  Germany
------------------------------------------------------------------------