[Condor-users] connection issues?



Hi,
We have a pool of 12 Windows machines running Condor 6.6.8 with one of them as the central manager. They share the same config file stored on the network.
 
We are trying to use Condor at work, but I keep running into the same problem. When we submit a batch of, say, 30 jobs, many of them fail. The failures tend to occur on remote machines, where, for example, we see the following in the StartLog on the remote host:
 
4/6 13:24:11 vm1: Changing state and activity: Claimed/Busy -> Preempting/Vacating
4/6 13:24:12 Can't connect to <10.10.30.60:1685>:0, errno = 10061
4/6 13:24:12 Will keep trying for 10 seconds...
4/6 13:24:21 Connect failed for 10 seconds; returning FALSE
4/6 13:24:21 ERROR:
SECMAN:2003:TCP connection to <10.10.30.60:1685> failed
 
Meanwhile, the StarterLog has:
4/6 13:23:57 ******************************************************
4/6 13:23:57 ** condor_starter (CONDOR_STARTER) STARTING UP
4/6 13:23:57 ** C:\Condor\bin\condor_starter.exe
4/6 13:23:57 ** $CondorVersion: 6.6.8 Jan 31 2005 $
4/6 13:23:57 ** $CondorPlatform: INTEL-WINNT40 $
4/6 13:23:57 ** PID = 3236
4/6 13:23:57 ******************************************************
4/6 13:23:57 Using config file: //homer/india/condor_config
4/6 13:23:57 Using local config files: C:\Condor/condor_config.local
4/6 13:23:57 DaemonCore: Command Socket at <10.10.30.60:1672>
4/6 13:23:57 Setting resource limits not implemented!
4/6 13:23:58 Starter communicating with condor_shadow <10.10.30.24:4804>
4/6 13:23:58 Submitting machine is "med2.fsca.local"
4/6 13:23:58 DynUser: MultiByteToWideChar() failed error=1113
4/6 13:23:58 ERROR "Unexpected failure in dynuser:update_t
" at line 472 in file ..\src\condor_c++_util\dynuser.C
4/6 13:23:58 ShutdownFast all jobs.
 
What does dynuser do and what does this error mean?
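
A note on the numeric codes in the two excerpts above: they look like standard Windows error codes rather than Condor-specific ones. The sketch below, in Python, pulls such codes out of log text and looks them up in a small hand-written table. The names explain_codes and KNOWN_CODES are hypothetical helpers, not part of Condor, and the table covers only the two codes seen here. It does not explain why dynuser fails, only what the codes conventionally mean on Windows.

```python
import re

# Assumption: a hand-written table limited to the two codes in the excerpts.
# 10061 is a Winsock code; 1113 is a Win32 system error code.
KNOWN_CODES = {
    10061: "WSAECONNREFUSED: the target host/port refused the connection",
    1113: "ERROR_NO_UNICODE_TRANSLATION: a character has no mapping "
          "in the target code page",
}

# Matches "errno = 10061" and "error=1113" as they appear in the logs.
CODE_RE = re.compile(r"\b(?:errno|error)\s*=\s*(\d+)")


def explain_codes(log_text):
    """Return (code, description) pairs for each error code found, in order."""
    results = []
    for match in CODE_RE.finditer(log_text):
        code = int(match.group(1))
        results.append((code, KNOWN_CODES.get(code, "unknown code")))
    return results


sample = (
    "4/6 13:24:12 Can't connect to <10.10.30.60:1685>:0, errno = 10061\n"
    "4/6 13:23:58 DynUser: MultiByteToWideChar() failed error=1113\n"
)

for code, meaning in explain_codes(sample):
    print(code, meaning)
```

Read this way, the StartLog entry says the connection to port 1685 was refused, and the StarterLog entry says a string conversion inside dynuser hit a character it could not translate.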
 
Our jobs are submitted via DAGMan.
 
Many thanks for any responses,

Jonathan