
[Condor-users] Negotiation Cycle between Linux master and WindowsXP pool



Hello,

In order to learn the Condor configuration, I have set up a mini Condor pool in my office
with an Intel/Linux PC as the central master and a single Intel/Windows XP PC in the pool.
The Linux and Windows PCs have the IPs 125.125.120.72 and 125.125.120.71, respectively.

I have installed Condor on Windows XP, in the recommended "UWCS" configuration.
On Linux, Condor comes from the precompiled rpm package provided by the yum repository.

In the local config file, I have configured the Windows pool PC to always run Condor jobs.
When I submit a job, I expect it to run right away, but it doesn't. See details below.
I'm not sure why the job is not run; in the NegotiatorLog file, 127.0.0.1 appears
as the IP address of the submitter. Is that causing the trouble?
Do I have to add 127.0.0.1 somewhere in the local config files?

I hope someone can point out where I should look to solve this problem!
Thanks!


The local configuration files on the two machines are:
# Linux master:
CONDOR_DEVELOPERS = NONE
COLLECTOR_NAME = Library Pool
COLLECTOR_HOST  = $(FULL_HOSTNAME)
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD
NEGOTIATOR_INTERVAL = 20
TRUST_UID_DOMAIN = TRUE
HOSTALLOW_WRITE    = *
HOSTALLOW_READ = *
LOWPORT = 9600
HIGHPORT = 9700


# Windows pool PC
COLLECTOR_NAME = Library Pool
HOSTALLOW_WRITE = *
HOSTALLOW_READ = *
DAEMON_LIST = MASTER STARTD
HOSTALLOW_ADMINISTRATOR = 125.125.120.72
CONSOLE_DEVICES = mouse, console
LOWPORT = 9600
HIGHPORT = 9700 
WANT_SUSPEND = TRUE
WANT_VACATE = FALSE
START = TRUE
SUSPEND = FALSE
PREEMPT = FALSE



On the master I have 5 condor daemons:
 condor_master
 condor_collector
 condor_negotiator
 condor_schedd
 condor_procd


On the Windows pool PC, there are two condor daemons:
 condor_master.exe
 condor_startd.exe


On the master, condor_status gives:

$ condor_status
Name     OpSys    Arch   State     Activity LoadAv Mem   ActvtyTime
Office   WINNT51  INTEL  Unclaimed Idle     0.020   767  0+00:17:53


I created this submit description file:

#########
Requirements = (Arch == "INTEL") && (OpSys == "WINNT51") && (HasFileTransfer)
Universe = vanilla
Executable = helloworld.exe
output = helloworld.out
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue
#########

$ condor_q
-- Submitter: localhost.localdomain : <127.0.0.1:9623> : localhost.localdomain
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   2.0   greg          2/13 11:11   0+00:00:00 I  0   0.0  helloworld.exe    

1 jobs; 1 idle, 0 running, 0 held

$ condor_q -analyze 2.0  
---
002.000:  Run analysis summary.  Of 4 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      4 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job


In the NegotiatorLog file, every negotiation cycle reports a read error
when negotiating with the schedd at 127.0.0.1:

2/13 14:11:09 ---------- Started Negotiation Cycle ----------
2/13 14:11:09 Phase 1:  Obtaining ads from collector ...
2/13 14:11:09   Getting all public ads ...
2/13 14:11:09   Sorting 10 ads ...
2/13 14:11:09   Getting startd private ads ...
2/13 14:11:09 Got ads: 10 public and 4 private
2/13 14:11:09 Public ads include 1 submitter, 4 startd
2/13 14:11:09 Phase 2:  Performing accounting ...
2/13 14:11:09 Phase 3:  Sorting submitter ads by priority ...
2/13 14:11:09 Phase 4.1:  Negotiating with schedds ...
2/13 14:11:09   Negotiating with lahaye@xxxxxxxxxxxxxxxxxxxxx at <127.0.0.1:9623>
2/13 14:11:09 0 seconds so far
2/13 14:11:09 condor_read(): recv() returned -1, errno = 104, assuming failure reading 5 bytes from unknown source.
2/13 14:11:09 IO: Failed to read packet header
2/13 14:11:09     Failed to get reply from schedd
2/13 14:11:09   Error: Ignoring schedd for this cycle
2/13 14:11:09 ---------- Finished Negotiation Cycle ----------
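
To make the loopback question concrete, here is a minimal Python sketch that pulls the <ip:port> strings out of a line of condor_q or NegotiatorLog output and reports which of them are loopback addresses. It is not part of Condor; the function names and the regular expression are my own assumptions, and it only handles IPv4 addresses in the <ip:port> form shown above.

```python
import ipaddress
import re

# Assumed pattern for Condor-style address strings such as "<127.0.0.1:9623>".
ADDRESS_RE = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>")


def find_addresses(text):
    """Return (ip, port) pairs for every <ip:port> string found in text."""
    return [(ip, int(port)) for ip, port in ADDRESS_RE.findall(text)]


def loopback_addresses(text):
    """Return only the pairs whose IP lies in the loopback range 127.0.0.0/8."""
    result = []
    for ip, port in find_addresses(text):
        try:
            if ipaddress.ip_address(ip).is_loopback:
                result.append((ip, port))
        except ValueError:
            # Skip strings that look like an address but are not valid IPv4.
            continue
    return result


if __name__ == "__main__":
    header = ("-- Submitter: localhost.localdomain : "
              "<127.0.0.1:9623> : localhost.localdomain")
    print(loopback_addresses(header))
```

If the schedd's address shows up in this list, the schedd is advertising an address that points every host back at itself, which would fit the failed read in the log above, although the sketch alone does not prove that this is the cause.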


---
Rob.