[Condor-users] GCB problem
- Date: Thu, 12 Jul 2007 18:49:05 +0200
- From: "José M. Martín" <jmartin@xxxxxxxxxxxxxx>
- Subject: [Condor-users] GCB problem
Hello,
I am trying to set up a pool whose execute nodes are in a private network. I
have a GCB broker with 2 NICs (one on the public network, one on the
private network). The Central Manager also has 2 NICs.
I can submit jobs from a remote pool, but the jobs only run on the Central
Manager (which is also an execute node). Jobs match with the private execute
nodes, but they never start running there.
I think the problem is in the GCB configuration, but I am not sure.
Log from a private execute node:
-------------------------------------------------------------------------------------
7/12 18:42:14 DaemonCore: Command received via UDP from host
<192.168.2.1:9673>
7/12 18:42:14 DaemonCore: received command 440 (MATCH_INFO), calling handler
(command_match_info)
7/12 18:42:14 match_info called
7/12 18:42:14 Received match <150.214.102.58:1390>#1184256975#5#...
7/12 18:42:14 State change: match notification protocol successful
7/12 18:42:14 Changing state: Unclaimed -> Matched
7/12 18:44:14 State change: match timed out
7/12 18:44:14 Changing state: Matched -> Owner
7/12 18:44:14 State change: IS_OWNER is false
7/12 18:44:14 Changing state: Owner -> Unclaimed
----------------------------------------------------------------------------------
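To make it easier to see which addresses appear in a log like the one above, here is a minimal sketch that pulls every <ip:port> string out of some log text and labels it as private or public. The function name classify_addresses and the sample text are only illustrative (they are not part of Condor), and the sketch assumes IPv4 addresses in the <ip:port> form shown in the log.

```python
import ipaddress
import re

# Matches address strings of the form <192.168.2.1:9673> as they appear in the log.
ADDRESS = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>")


def classify_addresses(log_text):
    """Return {(ip, port): 'private' or 'public'} for every <ip:port> in log_text.

    Hypothetical helper for reading logs; it only looks at the text,
    it does not contact any machine.
    """
    result = {}
    for ip, port in ADDRESS.findall(log_text):
        kind = "private" if ipaddress.ip_address(ip).is_private else "public"
        result[(ip, int(port))] = kind
    return result


# Two entries copied from the execute node log above.
sample = """\
7/12 18:42:14 DaemonCore: Command received via UDP from host
<192.168.2.1:9673>
7/12 18:42:14 Received match <150.214.102.58:1390>#1184256975#5#...
"""

for (ip, port), kind in classify_addresses(sample).items():
    print(f"{ip}:{port} is {kind}")
```

On this sample, the command arrives from a private address while the match refers to a public one, which may help when checking which address the execute node is advertised under.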
Log from the submit machine:
----------------------------------------------------------------------------------
7/12 18:41:59 (pid:24667) Activity on stashed negotiator socket
7/12 18:41:59 (pid:24667) Negotiating for owner: jmartin@xxxxxx
7/12 18:41:59 (pid:24667) Checking consistency running and runnable jobs
7/12 18:41:59 (pid:24667) Tables are consistent
7/12 18:41:59 (pid:24667) Out of jobs - 1 jobs matched, 0 jobs idle, flock
level = 1
7/12 18:41:59 (pid:24667) Sent ad to central manager for jmartin@xxxxxx
7/12 18:41:59 (pid:24667) Sent ad to 1 collectors for jmartin@xxxxxx
7/12 18:41:59 (pid:24667) condor_write(): Socket closed when trying to write
506 bytes to <150.214.102.58:1390>, fd is 12
7/12 18:41:59 (pid:24667) Buf::write(): condor_write() failed
7/12 18:41:59 (pid:24667) SECMAN: failed to end classad message
7/12 18:41:59 (pid:24667) ERROR: SECMAN:2007:Failed to end classad message
7/12 18:41:59 (pid:24667) Couldn't send REQUEST_CLAIM to startd at
<150.214.102.58:1390>
7/12 18:41:59 (pid:24667) condor_read(): recv() returned -1, errno = 104,
assuming failure reading 5 bytes from unknown source.
7/12 18:41:59 (pid:24667) Sent RELEASE_CLAIM to startd on
<150.214.102.58:1390>
7/12 18:41:59 (pid:24667) Match record (<150.214.102.58:1390>, 30, 0) deleted
7/12 18:42:07 (pid:24667) Activity on stashed negotiator socket
7/12 18:42:07 (pid:24667) Negotiating for owner: jmartin@xxxxxx
7/12 18:42:07 (pid:24667) Checking consistency running and runnable jobs
7/12 18:42:07 (pid:24667) Tables are consistent
7/12 18:42:07 (pid:24667) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs
rejected
7/12 18:43:38 (pid:24667) DaemonCore: Command received via TCP from host
<150.214.102.146:9643>
7/12 18:43:38 (pid:24667) DaemonCore: received command 443 (VACATE_SERVICE),
calling handler (vacate_service)
7/12 18:43:38 (pid:24667) Got VACATE_SERVICE from <150.214.102.146:9643>
7/12 18:43:38 (pid:24667) Sent RELEASE_CLAIM to startd on
<150.214.102.146:9700>
7/12 18:43:38 (pid:24667) Match record (<150.214.102.146:9700>, 29, 0) deleted
7/12 18:43:38 (pid:24667) Shadow pid 27242 for job 29.0 exited with status 107
--------------------------------------------------------------------------------------------------------------
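Since the submit-machine log above reports that REQUEST_CLAIM could not be sent to <150.214.102.58:1390>, one simple check is whether that address accepts TCP connections at all from the submit machine. The following is only a rough sketch using the Python standard library; can_connect is a made-up helper name, not a Condor tool. A successful connection only shows that the port is reachable. It does not prove that GCB forwards the traffic correctly, because the log shows the socket being closed during the write (errno 104 is "connection reset by peer" on Linux).

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical diagnostic helper: it opens the connection and closes it
    again immediately, without sending any Condor command.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run on the submit machine, can_connect("150.214.102.58", 1390) would test the address from the log; the port is only valid while the startd keeps that address.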
If you need more data, just ask.
Sorry for my awful English.
Help, please!
-----------
José M. Martin