
Re: [Condor-users] slow scheduling of dagman jobs



Hi David,
Answers (and more questions) are below.

On Wed, Sep 7, 2011 at 3:57 PM, David J. Herzfeld <herzfeldd@xxxxxxxxx> wrote:
Hi Patty:

On Wed, 2011-09-07 at 15:33 -0400, Patty Bragger wrote:
> here are the relevant config settings:
> DAGMAN_MAX_SUBMITS_PER_INTERVAL = 250
> DAGMAN_SUBMIT_DELAY = 0
> DAGMAN_USER_LOG_SCAN_INTERVAL = 5
> (at least I think that's all of them)

> Has anyone else seen performance like this or does anyone know how to
> figure out what is taking it so long to dispatch these nodes to the
> queue?

We are running 7.6.1 on RHEL here, and do not see the issue that you are
describing. Your configuration settings seem fine to me. I assume that
when you say "config settings", you mean that you are using the CONFIG
directive directly in your dag.

The config settings I was referring to are those defined in the config files for the central manager:
$ condor_config_val -dump | grep -i dag
DAGMAN_MAX_SUBMITS_PER_INTERVAL = 250
DAGMAN_SUBMIT_DELAY = 0
DAGMAN_USER_LOG_SCAN_INTERVAL = 5

I have not tried setting these directly in the job submit file.
 

What operating system are you using?

We are also on RHEL, with a mix of v5.4 and v5.5 machines.

Perhaps upgrading to Condor 7.6.1 will resolve this issue, or perhaps Kent is correct that this is just the overhead of submitting jobs.

But if this is just the overhead of submitting individual jobs, why don't I see the same overhead when submitting 100 separate jobs through a non-DAG submit? What is the difference between submitting a DAG file that has 100 entries and submitting a non-DAG file that has 100 job submissions (not through queue 100, but through 100 separate job definitions in the file)? I would expect those to have pretty much the same overhead.
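
To make that comparison concrete, here is a minimal sketch of the non-DAG case: one submit file containing 100 separate job definitions, each with its own Queue statement. The file names (flat.sub, flat.log) and the out/err naming are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Sketch: build ONE submit file holding N separate job definitions
# (N Queue statements, not "Queue N"), for comparison with an N-node DAG.
# flat.sub / flat.log and the output names are hypothetical choices.

make_flat_submit() {
    local n="$1" file="$2"
    {
        # Settings shared by every job definition in the file
        echo 'Executable = /bin/echo'
        echo 'Arguments = "Hello World"'
        echo 'transfer_executable = False'
        echo 'Log = flat.log'
        # One job definition (and one Queue statement) per iteration
        for i in $(seq 1 "$n"); do
            echo "Output = out/flat_${i}.out"
            echo "Error = err/flat_${i}.err"
            echo 'Queue'
        done
    } > "$file"
}

make_flat_submit 100 flat.sub
```

On a submit host, running `mkdir -p out err` and then `time condor_submit flat.sub` gives a rough wall-clock figure to set against the time the DAG takes to get the same number of nodes into the queue.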

Thanks,
Patty
 

Here's an example that I ran on our RHEL system running Condor 7.6.1, which
submits 250 jobs/second (following the initial delay to "ensure ProcessId
uniqueness"):
#!/bin/bash

# Create config file
cat > test.config << EOF
DAGMAN_MAX_SUBMITS_PER_INTERVAL = 250
DAGMAN_SUBMIT_DELAY = 0
DAGMAN_USER_LOG_SCAN_INTERVAL = 5
EOF

# Create submit file
cat > test.sub << EOF
Executable = /bin/echo
Arguments = "Hello World"
transfer_executable = False
Output = out/test_\$(RUN).out
Error = err/test_\$(RUN).err
Log = test.log
Queue
EOF

# Create dag
echo "CONFIG test.config" > test.dag
for i in $(seq 1 250)   # 250 nodes: A1 .. A250
do
       echo "JOB A${i} test.sub" >> test.dag
       echo "VARS A${i} RUN=\"${i}\"" >> test.dag
done

# Make out and err directories
rm -rf out err 2>/dev/null
mkdir out err

# Submit!
condor_submit_dag test.dag
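
To see how quickly the nodes actually reach the queue, one option is to count submit events per second in the job log once the DAG has run. This is only a sketch: it assumes the classic plain-text user log format, in which each submit event starts with a line like `000 (cluster.proc.subproc) MM/DD HH:MM:SS Job submitted from host: ...`. The function name submits_per_second is made up for this example.

```shell
#!/bin/bash
# Sketch: count submit events (event code 000) per second in a Condor user log.
# Assumes the plain-text log format, where fields 3 and 4 of an event header
# line are the date (MM/DD) and time (HH:MM:SS).

submits_per_second() {
    awk '$1 == "000" { count[$3 " " $4]++ }
         END { for (t in count) print t, count[t] }' "$1" | sort
}

# test.log is the Log file named in test.sub above
if [ -f test.log ]; then
    submits_per_second test.log
fi
```

Each output line is a timestamp followed by the number of jobs submitted in that second, so a slow trickle of submissions shows up as long runs of small counts.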

> Thanks,
> Patty
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/

