
[Condor-users] Matlab Compiler and Condor: Shadow Exception



Hi there,

 

I'm currently trying to combine MATLAB with Condor, and it's nearly working now. There is only one thing I couldn't solve:

 

If I distribute the compiled MATLAB *.exe file, none of the nodes except the central manager can compute or return their results. The job log file for a remote node contains the following entry (there is nothing in *.err or *.out):

 

001 (019.000.000) 12/19 10:26:25 Job executing on host: <192.168.0.8:1029>
...
007 (019.000.000) 12/19 10:26:25 Shadow exception!
    Can no longer talk to condor_starter on execute machine (192.168.0.8)
    0  -  Run Bytes Sent By Job
    46163  -  Run Bytes Received By Job
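
In case it helps anyone check their own job logs for the same event, here is a minimal Python sketch that picks such entries out of a log. It assumes only the layout visible in the excerpt above (a three-digit event code, the job ID in parentheses, a date and time, then indented detail lines, with "..." between events). The function name find_shadow_exceptions is made up for illustration and is not part of Condor.

```python
import re

# Header line of a job-log event, e.g.
# "007 (019.000.000) 12/19 10:26:25 Shadow exception!"
# The layout is taken from the excerpt in this post; other fields are not assumed.
EVENT_HEADER = re.compile(
    r"^(?P<code>\d{3}) \((?P<job>\d+\.\d+\.\d+)\) "
    r"(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<text>.*)$"
)


def find_shadow_exceptions(log_text):
    """Return one dict per event with code 007 (shadow exception) in log_text."""
    found = []
    current = None  # the 007 event whose detail lines are being collected
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line == "...":
            current = None  # "..." separates events
            continue
        match = EVENT_HEADER.match(line)
        if match:
            current = None
            if match.group("code") == "007":
                current = {
                    "job": match.group("job"),
                    "when": match.group("date") + " " + match.group("time"),
                    "details": [],
                }
                found.append(current)
        elif current is not None:
            # Indented lines after a 007 header belong to that event.
            current["details"].append(line)
    return found


sample = """\
001 (019.000.000) 12/19 10:26:25 Job executing on host: <192.168.0.8:1029>
...
007 (019.000.000) 12/19 10:26:25 Shadow exception!
    Can no longer talk to condor_starter on execute machine (192.168.0.8)
    0  -  Run Bytes Sent By Job
    46163  -  Run Bytes Received By Job
...
"""

for event in find_shadow_exceptions(sample):
    print(event["job"], event["when"], event["details"][0])
```

Run against a real log file, the same function would take the file's contents as a string, e.g. find_shadow_exceptions(open(path).read()), where path is whatever the log entry in the submit file points to.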

 

All other jobs (e.g. helloWorld.exe), i.e. non-MATLAB jobs, run perfectly! The MATLAB *.exe itself also runs without any problem on the remote nodes when started manually, without Condor.

 

Here is my setup and what I did:

 

Central Manager: WinXP (WinNT51), shared MATLAB directory (for the clients)

 

Remote Nodes: Win2000 (WinNT50), network access to the shared MATLAB directory.

 

  1. Compile the files and create input.mat
  2. Submit the files
  3. The central manager computes its jobs without any exceptions
  4. On the remote nodes, the jobs return to the idle state after a few seconds
  5. After that (in the normal case), collect the output.mat files and display them in MATLAB

 

I don't know whether the problem is that the MATLAB-compiled job needs to extract an archive first. This takes a while, and only then does the computation start.

 

I hope that some of you have solved such problems before and can give me a solution ;-)

 

All the Best,

Tobias