
[Condor-users] Upgrade problem



Hi, I just upgraded a Windows cluster from Condor 6.6.11 to 6.7.19 (to try out the new group capabilities). All the jobs on the cluster are MPI jobs and use a dedicated scheduler. Any idea why jobs will not run now? They constantly cycle from Running to Idle. The shadow log has the following errors, but I'm not sure what they mean:

6/12 17:21:29 Using config file: C:\Condor\condor_config
6/12 17:21:29 Using local config files: C:\Condor/condor_config.local
6/12 17:21:29 DaemonCore: Command Socket at <10.0.0.1:4373>
6/12 17:21:29 Initializing a MPI shadow for job 8666.0
6/12 17:21:29 (8663.0) (4896): condor_read(): recv() returned -1, errno = 10054, assuming failure.
6/12 17:21:29 (8663.0) (4896): IO: Failed to read packet header
6/12 17:21:29 (8663.0) (4896): ERROR "Can no longer talk to condor_starter <10.0.0.10:1040>" at line 93 in file ..\src\condor_shadow.V6.1\NTreceivers.C
6/12 17:21:29 (8662.0) (4532): condor_read(): timeout reading buffer.
6/12 17:21:29 (8662.0) (4532): IO: Failed to read packet header
6/12 17:21:29 (8665.0) (1624): condor_read(): recv() returned -1, errno = 10054, assuming failure.
6/12 17:21:29 (8665.0) (1624): IO: Failed to read packet header
6/12 17:21:29 (8665.0) (1624): ERROR "Can no longer talk to condor_starter <10.0.0.2:1040>" at line 93 in file ..\src\condor_shadow.V6.1\NTreceivers.C
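
For reference, errno 10054 in the lines above is the Winsock code WSAECONNRESET (the remote side reset the connection). To see which jobs and which starter hosts are affected, the log can be grouped by job ID. The following is only a rough sketch that assumes the line layout shown in the excerpt; the function name summarize_shadow_log and the regular expressions are made up for illustration and are not part of Condor.

```python
import re
from collections import defaultdict

# Winsock error codes that may appear in the log:
# 10054 is WSAECONNRESET (connection reset by peer), 10060 is WSAETIMEDOUT.
WINSOCK_ERRORS = {10054: "WSAECONNRESET", 10060: "WSAETIMEDOUT"}

# Assumed layout: "<date> <time> (<job id>) (<pid>): <message>"
LINE_RE = re.compile(r"^(\d+/\d+ \d+:\d+:\d+) \((\d+\.\d+)\) \((\d+)\): (.*)$")
ERRNO_RE = re.compile(r"errno = (\d+)")
STARTER_RE = re.compile(r"condor_starter <([\d.]+):(\d+)>")


def summarize_shadow_log(lines):
    """Group messages by job ID and collect errno names and starter hosts."""
    jobs = defaultdict(
        lambda: {"errnos": set(), "starters": set(), "messages": []}
    )
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # startup lines carry no "(job) (pid)" prefix
        _, job, _, message = match.groups()
        entry = jobs[job]
        entry["messages"].append(message)

        errno_match = ERRNO_RE.search(message)
        if errno_match:
            code = int(errno_match.group(1))
            entry["errnos"].add(WINSOCK_ERRORS.get(code, str(code)))

        starter_match = STARTER_RE.search(message)
        if starter_match:
            entry["starters"].add(starter_match.group(1))
    return dict(jobs)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python summarize.py C:\Condor\log\ShadowLog
    with open(sys.argv[1]) as handle:
        for job, info in sorted(summarize_shadow_log(handle).items()):
            print(job, sorted(info["errnos"]), sorted(info["starters"]))
```

Run against the excerpt, this would show connection resets for jobs 8663.0 and 8665.0 (starters on 10.0.0.10 and 10.0.0.2) and a read timeout with no errno for job 8662.0.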