000 (18789.000.000) 03/14 10:55:35 Job submitted from host: <10.7.7.250:55139>
...
014 (18789.000.000) 03/14 10:58:10 Node 0 executing on host: <10.7.7.20:59381>
...
014 (18789.000.001) 03/14 10:58:28 Node 1 executing on host: <10.7.7.11:59230>
...
[many lines later]
...
007 (18789.000.000) 03/14 11:06:49 Shadow exception!
    Error from starter on slot3@xxxxxxxxxxxxxx: Failed to transfer files
    0 - Run Bytes Sent By Job
    47481929728 - Run Bytes Received By Job

Try setting STARTER_UPLOAD_TIMEOUT = 1200 (or another large value) in the config file, and see if the problem goes away.

-greg
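
For reference, a minimal sketch of that change. It assumes the setting goes into a local config file on the execute machines, where the starter runs, and that the value is in seconds. The file name in the comment is hypothetical and depends on the install.

```
# Hypothetical location: a local config file on each execute node,
# e.g. condor_config.local (the actual path varies by installation).
# Assumed unit: seconds.
STARTER_UPLOAD_TIMEOUT = 1200
```

Running condor_reconfig on those machines afterwards should make the daemons reread the configuration. If the transfer still fails, a larger value may be worth trying.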