On Monday, April 18, 2011 at 8:35 AM, yap munsoon wrote:
Thanks for your reply Ian.
While the process is hung, I tried to cp the same files from the UNC share using the exact same MSYS cp.exe on a cmd prompt. I am able to cp successfully.
What if you log in as the same user the job is running as? I believe the logged-in instance is swbatch and the jobs are run as swbatch1. Still able to copy it if you log in as swbatch1?
The machine resources look OK too. Attached is a snapshot of the system info.
What about desktop heap though? Is that exhausted? Jeff in Toronto can help you get that number from a machine -- it's generally set pretty low for the background desktop instances that get used by Condor jobs.
Btw, there is no problem on Windows 7 or Windows Server 2008 R2.
Check the heap sizes on these machines against the XP machines. That might show a difference. It could also be that the newer OSes are just better at handling a lot of concurrent, repetitive network I/O like you end up generating with 8 jobs running cp commands in parallel.
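
If it helps with the comparison, here is a rough sketch of one way to pull the desktop heap numbers, assuming Python is available on the machines. The sizes live in the SharedSection setting inside the "Windows" value under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems; the third number is the heap (in KB) for non-interactive desktops. The function names and the sample string below are made up for illustration and are not taken from any machine in this thread.

```python
import re

# Registry location of the value that carries SharedSection (Windows only):
#   HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems, value "Windows"
SUBSYSTEMS_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems"


def parse_shared_section(windows_value):
    """Return the SharedSection sizes (in KB) as a list of ints."""
    match = re.search(r"SharedSection=([\d,]+)", windows_value)
    if match is None:
        raise ValueError("no SharedSection setting found")
    return [int(part) for part in match.group(1).split(",") if part]


def read_windows_value():
    """Read the live value from the registry; only works on Windows."""
    import winreg  # imported here so the parser stays usable on other platforms
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SUBSYSTEMS_KEY) as key:
        value, _type = winreg.QueryValueEx(key, "Windows")
    return value


# Illustrative string only (hypothetical), not read from a real machine.
sample = (r"%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows "
          r"SharedSection=1024,3072,512 Windows=On SubSystemType=Windows")
sizes = parse_shared_section(sample)
print("non-interactive desktop heap (KB):", sizes[2])
```

On a real machine you would pass read_windows_value() to parse_shared_section() instead of the sample string, then compare the third number between the XP boxes and the Windows 7 / 2008 R2 boxes.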