
Re: [Condor-users] Shadow exception with LamMpi jobs




000 (18789.000.000) 03/14 10:55:35 Job submitted from host: <10.7.7.250:55139>
...
014 (18789.000.000) 03/14 10:58:10 Node 0 executing on host: <10.7.7.20:59381>
...
014 (18789.000.001) 03/14 10:58:28 Node 1 executing on host: <10.7.7.11:59230>
...
[many lines later]
...
007 (18789.000.000) 03/14 11:06:49 Shadow exception!
        Error from starter on slot3@xxxxxxxxxxxxxx: Failed to transfer files
        0  -  Run Bytes Sent By Job
        47481929728  -  Run Bytes Received By Job


Try setting this in the config file:

STARTER_UPLOAD_TIMEOUT = 1200

or set it to another large value, and see if the problem goes away.
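
As a rough sketch of how one might apply and check this: the starter runs on the execute machines, so the setting presumably belongs in the local config file there. The file location below is deliberately left unspecified since it varies by install, and the sketch assumes the standard condor_reconfig and condor_config_val tools are on the PATH.

    # in the local config file on each execute node
    STARTER_UPLOAD_TIMEOUT = 1200

    # then, on that node, reload the config and check the value
    condor_reconfig
    condor_config_val STARTER_UPLOAD_TIMEOUT

The last command prints the value the configuration now holds, which is a quick way to confirm the edit landed in a file that Condor actually reads.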

-greg