
[Condor-users] Easy Setup Guide



I inherited a cluster of Condor machines. I don't know anything about Condor, and I have a mess on my hands.

Is there an easy setup guide for a simple, no-nonsense, basics-only cluster?

We have 3 Windows hosts and a Linux host, and I'm having all kinds of issues. Some of them I have solved; others still exist.

1) Can't submit jobs on Linux
rholloway@rebelbase:~$ condor_submit submit
Submitting job(s)
ERROR: Failed to connect to local queue manager
CEDAR:6001:Failed to connect to <127.0.1.1:60211>

2) Jobs get submitted to the cluster and then show up as "held" and never do anything.

3) I get all kinds of errors in the Collector Log on what is supposed to be the master:

03/30 12:55:54 DC_AUTHENTICATE: attempt to open invalid session Dagobah:1228:1333129419:23, failing.
03/30 12:55:54 DC_AUTHENTICATE: attempt to open invalid session Dagobah:1228:1333129427:24, failing.
03/30 12:55:54 DC_AUTHENTICATE: attempt to open invalid session Dagobah:1228:1333129427:24, failing.
03/30 12:55:54 DC_AUTHENTICATE: attempt to open invalid session Dagobah:1228:1333129427:24, failing.
03/30 12:55:54 DC_AUTHENTICATE: attempt to open invalid session Dagobah:1228:1333129427:24, failing.
03/30 12:55:54 DC_AUTHENTICATE: attempt to open invalid session Dagobah:1228:1333129427:24, failing.
03/30 12:55:54 DC_AUTHENTICATE: attempt to open invalid session Dagobah:1228:1333129427:24, failing.
03/30 12:55:54 DC_AUTHENTICATE: attempt to open invalid session Dagobah:1228:1333129427:24, failing.
03/30 12:55:54 DC_AUTHENTICATE: attempt to open invalid session Dagobah:1228:1333129427:24, failing.

03/30 12:55:56 Failed to send DC_INVALIDATE_KEY to daemon at <127.0.1.1:53521>: SECMAN:2003:TCP connection to daemon at <127.0.1.1:53521> failed.
03/30 12:55:56 Failed to send DC_INVALIDATE_KEY to daemon at <127.0.1.1:53521>: SECMAN:2003:TCP connection to daemon at <127.0.1.1:53521> failed.
03/30 12:55:56 Failed to send DC_INVALIDATE_KEY to daemon at <127.0.1.1:53521>: SECMAN:2003:TCP connection to daemon at <127.0.1.1:53521> failed.
03/30 12:55:56 Failed to send DC_INVALIDATE_KEY to daemon at <127.0.1.1:53521>: SECMAN:2003:TCP connection to daemon at <127.0.1.1:53521> failed.
03/30 12:55:56 Failed to send DC_INVALIDATE_KEY to daemon at <127.0.1.1:53521>: SECMAN:2003:TCP connection to daemon at <127.0.1.1:53521> failed.
03/30 12:55:56 Failed to send DC_INVALIDATE_KEY to daemon at <127.0.1.1:53521>: SECMAN:2003:TCP connection to daemon at <127.0.1.1:53521> failed.
03/30 12:55:56 Failed to send DC_INVALIDATE_KEY to daemon at <127.0.1.1:53521>: SECMAN:2003:TCP connection to daemon at <127.0.1.1:53521> failed.


But I have no problems joining other machines to the cluster. 

And if there are any contractors out there who do this for a living, I'll even pay to have someone fix the environment. We just need it to work.

If I shouldn't be running this cluster at all and have no business doing it, I'll accept that as an answer as well.
