
Re: [HTCondor-users] ERROR: Failed to connect to local queue manager



03/17/17 17:14:18 SharedPortServer: server was busy, failed to connect 6367_e09c_3140282 as requested by SCHEDD <10.40.31.17:9618?addrs=10.40.31.17-9618&noUDP&sock=6191_b7bb_3> on <10.40.31.17:6516>: primary (fa9978631a59659f8fbed31539c90cfdba79f8f118596e1b13053010c10e1cec/6367_e09c_3140282): Connection refused (111); alt (/var/lock/condor/daemon_sock/6367_e09c_3140282): Connection refused (111)

How do I figure out what's going on here?

It looks like the schedd is too busy to answer, probably because it is still handling all the submits that just happened. Check the schedd's log (condor_config_val SCHEDD_LOG prints its path) to see what it is doing. If your user calls condor_submit frequently, it may help to combine those submits into one, using the extended queue statement syntax in a single submit file. As a last resort, use DAGMan to throttle submits.
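As an illustration of combining submits, here is a minimal Python sketch that builds one submit description queuing many jobs with a single `queue arguments from ( ... )` statement, so condor_submit (and the schedd) is contacted once instead of once per job. It assumes an HTCondor version that accepts an inline item list in the `queue ... from` form; the function name `build_submit_description` and the example file names are hypothetical, not part of HTCondor.

```python
def build_submit_description(executable, arg_sets, extra=None):
    """Return the text of ONE submit description that queues one job per
    entry in arg_sets (hypothetical helper; assumes the inline
    'queue <var> from ( ... )' syntax is supported)."""
    if not arg_sets:
        raise ValueError("need at least one argument set")
    lines = [f"executable = {executable}"]
    # Any additional submit commands, e.g. log/output, written verbatim.
    for key, value in (extra or {}).items():
        lines.append(f"{key} = {value}")
    # One queue statement; each item line becomes one job's 'arguments'.
    lines.append("queue arguments from (")
    for args in arg_sets:
        # Items are line-oriented, so a newline or ')' would corrupt the list.
        if "\n" in args or ")" in args:
            raise ValueError(f"unsupported characters in item: {args!r}")
        lines.append(f"    {args}")
    lines.append(")")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    desc = build_submit_description(
        "/bin/echo",
        ["hello 1", "hello 2", "hello 3"],
        {"log": "jobs.log", "output": "out.$(ProcId)"},
    )
    print(desc)
```

The resulting text can be written to a file (say, jobs.sub) and passed to a single condor_submit jobs.sub call. Keeping all jobs in one cluster this way is usually much lighter on the schedd than a loop that runs condor_submit per job.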

- ToddM