
[Condor-users] GCB problem



Hello,

I am trying to set up a pool whose execute nodes are in a private network. I 
have a GCB Broker which has 2 NICs (one on the public network, one on the 
private network). The Central Manager also has 2 NICs.

I can submit jobs from a remote pool, but the jobs only run on the Central 
Manager (which is also an execute node). Jobs match with the private execute 
nodes, but they can't run.

I think the problem is in the GCB configuration, but I am not sure.


Log from the private execute node
-------------------------------------------------------------------------------------
7/12 18:42:14 DaemonCore: Command received via UDP from host 
<192.168.2.1:9673>
7/12 18:42:14 DaemonCore: received command 440 (MATCH_INFO), calling handler 
(command_match_info)
7/12 18:42:14 match_info called
7/12 18:42:14 Received match <150.214.102.58:1390>#1184256975#5#...
7/12 18:42:14 State change: match notification protocol successful
7/12 18:42:14 Changing state: Unclaimed -> Matched
7/12 18:44:14 State change: match timed out
7/12 18:44:14 Changing state: Matched -> Owner
7/12 18:44:14 State change: IS_OWNER is false
7/12 18:44:14 Changing state: Owner -> Unclaimed
----------------------------------------------------------------------------------
Log from the submit machine
----------------------------------------------------------------------------------
7/12 18:41:59 (pid:24667) Activity on stashed negotiator socket
7/12 18:41:59 (pid:24667) Negotiating for owner: jmartin@xxxxxx
7/12 18:41:59 (pid:24667) Checking consistency running and runnable jobs
7/12 18:41:59 (pid:24667) Tables are consistent
7/12 18:41:59 (pid:24667) Out of jobs - 1 jobs matched, 0 jobs idle, flock 
level = 1
7/12 18:41:59 (pid:24667) Sent ad to central manager for jmartin@xxxxxx
7/12 18:41:59 (pid:24667) Sent ad to 1 collectors for jmartin@xxxxxx
7/12 18:41:59 (pid:24667) condor_write(): Socket closed when trying to write 
506 bytes to <150.214.102.58:1390>, fd is 12
7/12 18:41:59 (pid:24667) Buf::write(): condor_write() failed
7/12 18:41:59 (pid:24667) SECMAN: failed to end classad message
7/12 18:41:59 (pid:24667) ERROR: SECMAN:2007:Failed to end classad message
7/12 18:41:59 (pid:24667) Couldn't send REQUEST_CLAIM to startd at 
<150.214.102.58:1390>
7/12 18:41:59 (pid:24667) condor_read(): recv() returned -1, errno = 104, 
assuming failure reading 5 bytes from unknown source.
7/12 18:41:59 (pid:24667) Sent RELEASE_CLAIM to startd on 
<150.214.102.58:1390>
7/12 18:41:59 (pid:24667) Match record (<150.214.102.58:1390>, 30, 0) deleted
7/12 18:42:07 (pid:24667) Activity on stashed negotiator socket
7/12 18:42:07 (pid:24667) Negotiating for owner: jmartin@xxxxxx
7/12 18:42:07 (pid:24667) Checking consistency running and runnable jobs
7/12 18:42:07 (pid:24667) Tables are consistent
7/12 18:42:07 (pid:24667) Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs 
rejected
7/12 18:43:38 (pid:24667) DaemonCore: Command received via TCP from host 
<150.214.102.146:9643>
7/12 18:43:38 (pid:24667) DaemonCore: received command 443 (VACATE_SERVICE), 
calling handler (vacate_service)
7/12 18:43:38 (pid:24667) Got VACATE_SERVICE from <150.214.102.146:9643>
7/12 18:43:38 (pid:24667) Sent RELEASE_CLAIM to startd on 
<150.214.102.146:9700>
7/12 18:43:38 (pid:24667) Match record (<150.214.102.146:9700>, 29, 0) deleted
7/12 18:43:38 (pid:24667) Shadow pid 27242 for job 29.0 exited with status 107
--------------------------------------------------------------------------------------------------------------
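To make the addresses in the two logs easier to compare, here is a minimal sketch (not part of Condor) that pulls the `<ip:port>` strings out of log text and reports whether each one is a private or a public address. The function name `classify_addresses` and the `sample` text are only illustrative; the sample lines are taken from the logs above with the mail line-wrapping undone. The output only shows which network each address belongs to; it does not by itself prove where the GCB misconfiguration is.

```python
import ipaddress
import re

# Matches addresses written as <ip:port>, e.g. <150.214.102.58:1390>.
SINFUL = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>")

def classify_addresses(log_text):
    """Return {(ip, port): 'private' or 'public'} for each address found."""
    result = {}
    for ip, port in SINFUL.findall(log_text):
        kind = "private" if ipaddress.ip_address(ip).is_private else "public"
        result[(ip, int(port))] = kind
    return result

# Lines copied from the logs above (re-joined where the mail client wrapped them).
sample = """\
7/12 18:42:14 DaemonCore: Command received via UDP from host <192.168.2.1:9673>
7/12 18:42:14 Received match <150.214.102.58:1390>#1184256975#5#...
7/12 18:41:59 (pid:24667) Couldn't send REQUEST_CLAIM to startd at <150.214.102.58:1390>
"""

for (ip, port), kind in classify_addresses(sample).items():
    print(f"{ip}:{port} is {kind}")
```

With these lines, the match notification reaches the execute node from a private address, while the address the submit machine fails to contact for REQUEST_CLAIM is a public one.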





If you need more data, just ask.

Sorry for my awful English.

Help, please!!

-----------
José M. Martin