
[condor-users] only first jobs will run





This is a rather odd problem, but we've been fighting it for some time.

When we start Condor on our master, submit a job from that same machine,
and queue it some number of times (let's say 100), that first batch runs
fine. If someone then comes along and submits another job, even an
identical one, it doesn't run; it just sits in the queue forever. The
ScheddLog shows the following:

10/2 16:31:53 Sent ad to central manager for cshields@xxxxxxxxxxxxxxxxxxx
10/2 16:31:53 Called reschedule_negotiator()
10/2 16:31:56 Activity on stashed negotiator socket
10/2 16:31:56 Negotiating for owner: cshields@xxxxxxxxxxxxxxxxxxx
10/2 16:31:56 Checking consistency running and runnable jobs
10/2 16:31:56 Tables are consistent
10/2 16:32:16 Can't receive request from manager

This "Can't receive request from manager" message seems to be our clue, yet
we are still clueless.
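One detail in the log: the error arrives exactly 20 seconds after
"Negotiating for owner" (16:31:56 to 16:32:16), which looks more like
something timing out than an immediate refusal. We haven't confirmed which
Condor setting, if any, that 20 seconds corresponds to. The rough Python
sketch below measures that gap in a ScheddLog excerpt. It is not a Condor
tool; it assumes the "M/D HH:MM:SS message" line layout shown above, and the
function names are made up for illustration.

```python
from datetime import datetime

# Excerpt copied from the ScheddLog above (hostname elided as in the post).
LOG = """\
10/2 16:31:53 Sent ad to central manager for cshields@xxxxxxxxxxxxxxxxxxx
10/2 16:31:53 Called reschedule_negotiator()
10/2 16:31:56 Activity on stashed negotiator socket
10/2 16:31:56 Negotiating for owner: cshields@xxxxxxxxxxxxxxxxxxx
10/2 16:31:56 Checking consistency running and runnable jobs
10/2 16:31:56 Tables are consistent
10/2 16:32:16 Can't receive request from manager
"""


def parse_line(line):
    """Split a log line into (timestamp, message).

    The log omits the year, so an arbitrary placeholder year (2000) is added
    only to make parsing unambiguous; it does not affect time differences.
    """
    date_part, time_part, message = line.split(" ", 2)
    stamp = datetime.strptime(f"2000/{date_part} {time_part}", "%Y/%m/%d %H:%M:%S")
    return stamp, message


def negotiation_gaps(log_text):
    """Seconds from each 'Negotiating for owner' line to the next
    'Can't receive request from manager' line."""
    gaps = []
    start = None
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        stamp, message = parse_line(line)
        if message.startswith("Negotiating for owner"):
            start = stamp
        elif message.startswith("Can't receive request from manager") and start is not None:
            gaps.append((stamp - start).total_seconds())
            start = None
    return gaps


print(negotiation_gaps(LOG))  # [20.0]
```

Running it over a longer ScheddLog would show whether every failed
negotiation cycle stalls for the same 20 seconds.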

We've upgraded a couple of times and are now running the latest Condor
(6.5.5), statically linked, on Red Hat 8.0.

Any and all help is greatly appreciated!

Cheers,

-Corey

-- 
Corey Shields - IU Unix Systems Support Group
http://ussg.iu.edu/~cshields

My PGP/GPG public encryption key is at:
http://www.ussg.iu.edu/~cshields/cshields_pub_key.asc
GPG fingerprint: 78A8 E5EB E455 0A90 F392 59BC A6AF F8A3 A304 1453
