
Re: [Condor-users] Condor job submission delayed




Ralf Reinhardt wrote:

| What do you mean by very long time?

Around 30 min.


| 300 second delays can occur if the new job started while condor was
| within a 20 seconds frame of the negotiation cycle. you can start the
| job by using condor_reschedule.
| you can reduce the time by lowering the NEGOTIATOR_INTERVAL value, but
| the 20 seconds timeframe is fixed, so for a 60 second interval you have
| a 33% chance that your job must wait up to a minute.
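
The 33% figure in the quoted paragraph is just the ratio of the fixed window to the negotiation interval. A minimal Python sketch of that arithmetic, assuming submit times are spread uniformly over the cycle and using only the 20-second window and the 300/60-second intervals mentioned above (the function name is made up for illustration and is not part of Condor):

```python
def chance_of_missing_cycle(window_s: float, interval_s: float) -> float:
    """Fraction of submit times that land inside the fixed window.

    Assumes submissions are uniformly distributed over one negotiation
    interval; the result is capped at 1.0 when the window is at least
    as long as the interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return min(window_s / interval_s, 1.0)


if __name__ == "__main__":
    # 20 s is the fixed window from the quoted text; 300 s and 60 s are
    # the two interval values it mentions.
    for interval in (300, 60):
        p = chance_of_missing_cycle(20, interval)
        print(f"interval {interval:>3} s: {p:.0%} chance of waiting one more cycle")
```

Under this model the worst case is roughly one extra interval, which is why it cannot account for waits of 10-30 minutes.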

That's clear to me. I did not expect the queued jobs to be executed
within a few seconds, but (recalling my first mail) it was unclear to me
why jobs sometimes (not always) don't get executed for 10-30 minutes
while "condor_status" lists a lot of machines (all of them Windows in my
case) as Unclaimed/Idle during that period.

"condor_q -analyze" just tells me that those machines match but reject
the jobs for unknown reasons (is there any way to find out what those
reasons might be?).

The policy is the UWCS scheme as installed by the Windows installer
(pretty standard, i.e. Available/Unclaimed/Idle after 15 min without
keyboard/mouse input and with low CPU load). On one server (2CPU-WIN2k SP6) I
have configured Condor to use the TESTINGMODE settings, so at least that
machine should accept submissions immediately (but it does not).
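
To make my expectation concrete, here is a rough Python sketch of how I understand the policy described above. It is not Condor configuration syntax; the class, the function, and especially the 0.3 load threshold are placeholders I picked for illustration, and only the 15-minute idle time comes from the setup as described:

```python
from dataclasses import dataclass


@dataclass
class MachineState:
    keyboard_idle_s: float   # seconds since the last keyboard/mouse input
    non_condor_load: float   # CPU load not caused by Condor jobs


def would_accept_job(
    machine: MachineState,
    min_idle_s: float = 15 * 60,   # 15 minutes, as in the policy above
    max_load: float = 0.3,         # placeholder value, an assumption
    testing_mode: bool = False,
) -> bool:
    """Return True if the machine should be willing to start a job."""
    if testing_mode:
        # Expectation for the TESTINGMODE machine: always accept.
        return True
    return (machine.keyboard_idle_s >= min_idle_s
            and machine.non_condor_load <= max_load)


if __name__ == "__main__":
    idle_box = MachineState(keyboard_idle_s=3600, non_condor_load=0.05)
    busy_box = MachineState(keyboard_idle_s=30, non_condor_load=0.90)
    print(would_accept_job(idle_box))
    print(would_accept_job(busy_box))
    print(would_accept_job(busy_box, testing_mode=True))
```

By this reading, both the long-idle desktops and the TESTINGMODE server should take jobs at the next negotiation cycle, which is not what I observe.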

Machines show up in condor_status, the queue can be queried with
condor_q, and the machines execute batch jobs fine. The only glitch is
that I can't submit from Windows machines due to a faulty
"condor_store_cred add" command (see my other mail from 2004-08-31).

Maybe you could point me to the right log file or debug setting to dig
deeper into this issue.


| If you have to wait for 30 minutes, it would point to a more serious
| problem in the negotiation between master and clients.

That seems to be the case, I think. But it is not consistent and I don't
know where to look for hints on what might be wrong.

If you want, I could send submit-files and parts (or all) of the
logfiles in question for further analysis by people who actually
understand Condor. :-)

Thanks.

--
Bye,
Marc Saric

Dr. Marc Saric, Bioinformatik, Proteom Centrum Tübingen,
Auf der Morgenstelle 15, D-72076 Tübingen, Germany,
Tel: +49 (0)7071 29 70557, marc.saric@xxxxxxxxxxxxxxxx
http://www.proteom-centrum-tuebingen.de