
Re: [Condor-users] HIGHPORT and LOWPORT



Andrey Kaliazin wrote:
To follow up on this issue:

There is something else, which could be limiting the number of running jobs.
I have stumbled on this as well.
For example, I have HIGHPORT = 29000 and LOWPORT = 9600 on the Condor master server (RH9, 6.7.2) and on the submit hosts (Windows XP SP1, SP2, RH9). About 100 workstations are online at the moment (Windows XP SP1, Condor 6.7.2), and about 60 of them are idle and unclaimed. And yet no more than around 20 jobs can run on the whole pool at a time. If I or another user submits, say, 100 jobs, Condor quickly matches the first 20 of them and then says (NegotiatorLog):
...
1/15 18:46:51 Matched 303.19 Kaliazia@xxxxxxxxxxx <134.151.145.3:9666> preempting none <134.151.149.134:9609>
1/15 18:46:51 Successfully matched with cs-357pc04.aston.ac.uk
1/15 18:46:51 Got NO_MORE_JOBS; done negotiating
1/15 18:46:51 ---------- Finished Negotiation Cycle ----------


(even though 80 more jobs are apparently still waiting).
After that, the negotiator only matches new jobs to replace those that have finished successfully.


After some investigation I found that there were several workstations in the pool that the negotiator was persistently trying, and failing, to connect to. As I rebooted those machines one by one, the negotiator would happily go ahead and then stumble on the next one. After I restarted all of those rogue PCs, the negotiator easily managed to match all available resources.


All those PCs are running Condor for Windows, version 6.7.2 or 6.7.3, and have identical config files. My only suspicion about what went wrong with those machines is that their ports were outside the allowed range (9600-9700). Here is an example (from NegotiatorLog):

1/15 19:47:17     Request 00182.00010:
1/15 19:50:26 Can't connect to <134.151.149.224:9595>:0, errno = 110
1/15 19:50:26 Will keep trying for 10 seconds...
1/15 19:50:27 Connect failed for 10 seconds; returning FALSE
1/15 19:50:27 ERROR: SECMAN:2003:TCP connection to <134.151.149.224:9595> failed

Port 9595 is clearly out of range, but I have no idea how or why this happens.
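To locate rogue machines without rebooting them one at a time, one could scan the NegotiatorLog for failed connections and flag those whose target port lies outside the configured range. This is a minimal sketch that assumes the `Can't connect to <ip:port>` line format shown in the excerpt above; the function name `out_of_range_peers` is hypothetical and is not part of Condor.

```python
import re

# Matches the failed-connect lines seen in the NegotiatorLog excerpt above
# (assumed format), capturing the IP address and the port number.
CANT_CONNECT = re.compile(r"Can't connect to <(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>")


def out_of_range_peers(log_text, lowport, highport):
    """Return sorted (ip, port) pairs from failed connects whose port is outside [lowport, highport]."""
    found = set()
    for match in CANT_CONNECT.finditer(log_text):
        ip, port = match.group(1), int(match.group(2))
        if not lowport <= port <= highport:
            found.add((ip, port))
    return sorted(found)


if __name__ == "__main__":
    sample = """\
1/15 19:47:17     Request 00182.00010:
1/15 19:50:26 Can't connect to <134.151.149.224:9595>:0, errno = 110
1/15 19:50:26 Will keep trying for 10 seconds...
"""
    for ip, port in out_of_range_peers(sample, 9600, 9700):
        print(f"{ip}:{port} is outside 9600-9700")
```

Run against the excerpt above, this reports only 134.151.149.224:9595, which would be the first machine to inspect.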

Could Condor developers please shed some light?

I would recommend that you send a RUST (support request) to condor-admin@xxxxxxxxxxxx. Please make sure that you include a log file from a machine that binds sockets to addresses outside the range. You should set D_NETWORK and D_FULLDEBUG to collect enough information.


Is it possible to force Condor daemons to use fixed ports?
And if this situation occurs again, how can I make the negotiator skip the rogue element and try another node?

Thanks,

Andrey Kaliazin, Computer Officer
Computer Science, Aston University, Birmingham, UK



-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Se-Chang Son
Sent: Friday, January 14, 2005 8:21 PM
To: Condor-Users Mail List
Cc: Karen Miller
Subject: Re: [Condor-users] HIGHPORT and LOWPORT


Masao Fujinaga wrote:

Does setting the port range using
HIGHPORT = 9700
LOWPORT = 9600
limit the number of jobs that can be running? With the above limits, I could only get about 40 jobs to run. With the limits removed, I was able to run more (62, the number of machines that I had available).

Yes. Each job requires two addresses, plus a fixed number of addresses per submit machine. Therefore, with a range of about 100 ports you hit the wall at about 40 jobs. Karen Miller is adding a section to the manual that explains how big the address range must be.
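As a rough sizing aid, the rule of thumb in this reply can be written out as arithmetic for a single submit machine. This is a sketch, not Condor's actual port allocation logic: it assumes both LOWPORT and HIGHPORT are usable (an inclusive range), and it takes the fixed per-machine overhead as a parameter because the reply gives no number. The value 20 below is an assumption picked only so the result lines up with the roughly 40 jobs reported; the helper names are hypothetical.

```python
# Rough port-budget arithmetic for ONE submit machine, following the reply:
# each running job needs two ports, plus a fixed number of ports for the
# machine's own daemons. fixed_ports is an assumed value, not a Condor constant.


def ports_in_range(lowport: int, highport: int) -> int:
    """Number of ports in [LOWPORT, HIGHPORT], assuming both ends are usable."""
    return highport - lowport + 1


def max_running_jobs(lowport: int, highport: int, fixed_ports: int) -> int:
    """Upper bound on jobs one submit machine can run within the port range."""
    spare = ports_in_range(lowport, highport) - fixed_ports
    return max(spare, 0) // 2


def required_range_size(jobs: int, fixed_ports: int) -> int:
    """Ports needed so that one submit machine can run `jobs` jobs at once."""
    return 2 * jobs + fixed_ports


if __name__ == "__main__":
    FIXED_PORTS = 20  # assumption: chosen only to match the ~40 jobs observed
    print(max_running_jobs(9600, 9700, FIXED_PORTS))  # 40
    print(required_range_size(62, FIXED_PORTS))       # 144
```

Under these assumptions, running all 62 available machines from one submit host would need a range of at least 144 ports rather than the 101 in 9600-9700.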



Masao

--
Masao Fujinaga | Research Computing Support
fujinaga@xxxxxxxxxxx | Computing and Network Services
Tel.: (780) 492-2117 | University of Alberta
Fax.: (780) 492-1729 | Edmonton, Alberta, CANADA





_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
http://lists.cs.wisc.edu/mailman/listinfo/condor-users
