Hi HTCondor experts,
In our HTCondor cluster, most jobs are submitted as cron DAGMan jobs. Sometimes hundreds of DAGMan jobs are submitted at the same time. We see that there are always 200 DAGMan jobs running, several hundred DAGMan jobs idle, and about 120 to 230 HTCondor jobs running, even though there are slots and machines available.
In our configuration, all the DAGMan throttling macros (DAGMAN_MAX_JOBS_IDLE, DAGMAN_MAX_JOBS_SUBMITTED, DAGMAN_MAX_PRE_SCRIPTS and DAGMAN_MAX_POST_SCRIPTS) are set to 0. We also do not have STARTD_CRON_MAX_JOB_LOAD, SCHEDD_CRON_MAX_JOB_LOAD or BENCHMARKS_MAX_JOB_LOAD defined.
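For reference, the throttling settings described above would look roughly like this in our configuration. This is only a sketch of the values mentioned; the actual file layout and any surrounding settings are not shown, and my understanding is that 0 means "no limit" for these macros.

```
# DAGMan throttles as described above (0 is assumed to mean "no limit")
DAGMAN_MAX_JOBS_IDLE      = 0
DAGMAN_MAX_JOBS_SUBMITTED = 0
DAGMAN_MAX_PRE_SCRIPTS    = 0
DAGMAN_MAX_POST_SCRIPTS   = 0

# Not defined anywhere in our configuration:
#   STARTD_CRON_MAX_JOB_LOAD
#   SCHEDD_CRON_MAX_JOB_LOAD
#   BENCHMARKS_MAX_JOB_LOAD
```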
My questions are: are there any other configuration settings that limit
the number of DAGMan jobs running, and what could cause only 200
DAGMan jobs to run when there are unclaimed machines?
Thank you in advance,
-- Zhuo Zhang, ASSISTT I.M. Systems Group (IMSG), NOAA/NESDIS/STAR 5825 University Research Court, Suite 1500 (IMSG), Cube 1500-11 College Park, MD 20740 Tel: (240) 582-3585 (x23017)