I never saw an answer to this question. Was one offered off the list? If so, could you please cross-post it? I too am curious about this delay, as I'm seeing it in my flock of Windows XP machines.
Thanks!
Ian
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Marc Saric
Sent: August 31, 2004 10:26 AM
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] Condor job submission delayed
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi all,
I am experimenting with a small Condor cluster (Condor 6.6.6, mostly on Windows boxes, unfortunately), as you can see from my various beginner's mails popping up in the forum.
I have set up a bunch of Windows-machines (Win2k SP6 and WinXP Pro SP1) and a central Linux-Master-Server.
Submission of jobs works in principle (I tested it with the hello-world examples from http://www.liv.ac.uk/e-science/condor/hello.html),
but sometimes I observe a strange behaviour: certain jobs take a very long time before they are executed.
This happens while most of the machines are not busy and are listed as available (15 min without user activity plus low CPU utilization).
"condor_status" gives something like:
saric@u-191-srv2:~/tmp> condor_status
Name          OpSys   Arch  State     Activity LoadAv Mem  ActvtyTime
u-191-srv2.pr LINUX   INTEL Unclaimed Idle     0.010  1004 0+01:52:13
u-099-cpc-esi WINNT50 INTEL Owner     Idle     0.240  512  0+01:16:34
vm1@u-099-csr WINNT50 INTEL Claimed   Busy     0.000  1024 0+00:10:56
vm2@u-099-csr WINNT50 INTEL Unclaimed Idle     0.000  1024 0+01:43:03
u-099-cbb1    WINNT51 INTEL Unclaimed Idle     0.000  511  0+01:46:27
u-099-cnb2    WINNT51 INTEL Owner     Idle     0.020  511  0+04:31:59
u-099-cpc-sek WINNT51 INTEL Owner     Idle     0.040  512  0+00:10:14
u-099-cpc1    WINNT51 INTEL Owner     Idle     0.000  512  0+00:06:20
u-099-cpc2    WINNT51 INTEL Owner     Idle     0.030  512  0+00:01:20
u-099-cpc3    WINNT51 INTEL Unclaimed Idle     0.000  512  0+00:06:21
u-099-cpc4    WINNT51 INTEL Owner     Idle     -0.010 512  0+04:57:30
u-099-cpc5    WINNT51 INTEL Unclaimed Idle     0.000  512  0+00:31:21
So there are at least 4 unclaimed machines in the pool which should match the job's requirements ((OpSys == "WINNT50") || (OpSys == "WINNT51")).
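To make the count of 4 explicit, here is a minimal Python sketch that applies the job's requirements expression to the rows of the condor_status listing above. It is only an illustration: the function names are hypothetical, the data is transcribed by hand from the listing, and it checks the job's side of the match only. The machines' own requirements and the negotiator's behaviour are not modelled here.

```python
# Rows transcribed from the condor_status listing: (Name, OpSys, State).
MACHINES = [
    ("u-191-srv2.pr", "LINUX", "Unclaimed"),
    ("u-099-cpc-esi", "WINNT50", "Owner"),
    ("vm1@u-099-csr", "WINNT50", "Claimed"),
    ("vm2@u-099-csr", "WINNT50", "Unclaimed"),
    ("u-099-cbb1", "WINNT51", "Unclaimed"),
    ("u-099-cnb2", "WINNT51", "Owner"),
    ("u-099-cpc-sek", "WINNT51", "Owner"),
    ("u-099-cpc1", "WINNT51", "Owner"),
    ("u-099-cpc2", "WINNT51", "Owner"),
    ("u-099-cpc3", "WINNT51", "Unclaimed"),
    ("u-099-cpc4", "WINNT51", "Owner"),
    ("u-099-cpc5", "WINNT51", "Unclaimed"),
]


def job_requirements(opsys):
    """Python stand-in for (OpSys == "WINNT50") || (OpSys == "WINNT51")."""
    return opsys == "WINNT50" or opsys == "WINNT51"


def unclaimed_matches(machines):
    """Names of machines that are Unclaimed and satisfy the job's requirements."""
    return [name for name, opsys, state in machines
            if state == "Unclaimed" and job_requirements(opsys)]


if __name__ == "__main__":
    for name in unclaimed_matches(MACHINES):
        print(name)
```

The Linux master is also Unclaimed but is excluded by the OpSys test, which leaves the four Windows machines.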
Running "condor_q -analyze" takes quite a long time and returns something like:
045.000:  Run analysis summary.  Of 12 machines,
      1 are rejected by your job's requirements
      6 reject your job because of their own requirements
      0 match, but are serving users with a better priority in the pool
      4 match, but reject the job for unknown reasons
      1 match, but will not currently preempt their existing job
      0 are available to run your job
I can't see why the 4 should reject for unknown reasons. Is there anywhere I could look to find out what these unknown reasons are (system log, local Condor log on the machines)?
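For anyone who wants to track these numbers over time, the following is a rough Python sketch that parses a summary of this shape into counts and checks that the buckets add up to the machine total. The sample text is transcribed from the output quoted above, the function name is hypothetical, and the layout is assumed to be one count per line; other Condor versions may print it differently.

```python
import re

# Summary transcribed from the "condor_q -analyze" output quoted above.
SUMMARY = """\
045.000:  Run analysis summary.  Of 12 machines,
      1 are rejected by your job's requirements
      6 reject your job because of their own requirements
      0 match, but are serving users with a better priority in the pool
      4 match, but reject the job for unknown reasons
      1 match, but will not currently preempt their existing job
      0 are available to run your job
"""


def parse_summary(text):
    """Return (total_machines, {reason: count}) for one analysis summary."""
    # Drop blank lines and any leading whitespace or "~" quoting marks.
    lines = [ln.lstrip("~ \t").rstrip() for ln in text.splitlines() if ln.strip()]
    header = re.search(r"Of (\d+) machines", lines[0])
    if header is None:
        raise ValueError("no 'Of N machines' header found")
    counts = {}
    for ln in lines[1:]:
        m = re.match(r"(\d+)\s+(.*)", ln)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return int(header.group(1)), counts


if __name__ == "__main__":
    total, counts = parse_summary(SUMMARY)
    for reason, n in counts.items():
        print(n, reason)
    print("accounted for:", sum(counts.values()), "of", total)
```

In this case the six buckets do account for all 12 machines, so the 4 "unknown reasons" machines are not being counted twice elsewhere.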
Thanks in advance!
- --
Bye,
Marc Saric
Dr. Marc Saric, Bioinformatik, Proteom Centrum Tübingen,
Auf der Morgenstelle 15, D-72076 Tübingen, Germany,
Tel: +49 (0)7071 29 70557, marc.saric@xxxxxxxxxxxxxxxx http://www.proteom-centrum-tuebingen.de
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFBNIqQBLD6PjSWyL4RAlKLAJ4l64RE870+vfqESQJL5Cz5oMSGjQCbBmA6
WLrzxNGTr1sGB3oJv4bDW48=
=nKWt
-----END PGP SIGNATURE-----
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
http://lists.cs.wisc.edu/mailman/listinfo/condor-users