
Re: [Condor-users] MATLAB Distributed Computing Server and Condor



On Thu, Sep 08, 2011 at 01:20:52PM -0700, Mark Cafaro wrote:
:There was a brief thread about this in 2008, but times have changed and so has the MATLAB Distributed Computing Server (MDCS).
:
:Has anyone had success using MDCS on top of Condor through the generic scheduler option? We are currently evaluating a trial version of MDCS to see if this possibility exists.

Yes.

:I have noticed a Condor-specific SubmitFcn in the MDCS examples, so it appears some customers have been exploring this option as well.

A few years back MathWorks wrote a SubmitFcn for us, which is probably
the basis for that example.  It does work but is fragile compared to
most Condor things.

If any job fails or is interrupted, MATLAB will just hang forever
waiting for it.

If any of the jobs can't get a DCE license, that job will fail and
MATLAB will hang forever waiting for it.  This means you need to use
resource limits in Condor and configure your submit function to
require that limit.  It also means that if you have systems outside
Condor using the same DCE license pool, it's very likely things will
occasionally break in weird ways.
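
As a rough illustration of the license limit, here is a minimal sketch
in Python that builds the submit-description lines a submit function
might write.  It assumes the limit is implemented with Condor's
concurrency_limits submit command; the limit name MATLAB_DCE, the
executable name and the helper function are made up for the example,
and this is not the SubmitFcn MathWorks wrote.

```python
def submit_lines_with_license_limit(executable, limit_name="MATLAB_DCE"):
    """Build Condor submit-description lines that claim one unit of a
    named concurrency limit.

    Assumption: the pool administrator has set <limit_name>_LIMIT in the
    negotiator configuration (e.g. MATLAB_DCE_LIMIT = 8) to match the
    number of licenses. The limit name here is hypothetical.
    """
    if not limit_name:
        raise ValueError("limit_name must not be empty")
    return [
        "universe = vanilla",
        f"executable = {executable}",
        # Each running job counts against the pool-wide limit, so no more
        # jobs start than the configured number of licenses.
        f"concurrency_limits = {limit_name}",
        "queue",
    ]


if __name__ == "__main__":
    # "run_worker.sh" is a placeholder for whatever script starts the worker.
    print("\n".join(submit_lines_with_license_limit("run_worker.sh")))
```

Note that a limit like this only counts jobs started through Condor,
which is why license use from outside the pool can still break things.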

We also have a situation where, on apparently identical systems within
our pool, some systems deterministically segfault when DCE tries to
start a MATLAB process, though running MATLAB by hand (with the same
command line DCE uses) succeeds.  This turned up about a year ago and
MathWorks assigned an engineer to look into it, but has yet to come to
a determination.  We work around this by having a negative
"requirements" statement in the submit function that excludes, by
name, the nodes that have displayed this bug.
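
The exclusion workaround can be sketched roughly as below, again in
Python.  It assumes the nodes are matched on the Machine attribute of
the machine ClassAd; the host names and the helper function are
invented for the example and are not taken from our actual submit
function.

```python
def exclude_machines_requirement(bad_hosts):
    """Return a ClassAd expression that is true only on machines whose
    Machine attribute is not in bad_hosts.

    Assumption: bad_hosts holds fully qualified host names exactly as
    they appear in each machine's ClassAd.
    """
    # De-duplicate and sort so the generated expression is stable.
    clauses = [f'(Machine != "{host}")' for host in sorted(set(bad_hosts))]
    # With nothing to exclude, fall back to an always-true expression.
    return " && ".join(clauses) if clauses else "True"


if __name__ == "__main__":
    # Hypothetical names of nodes that have shown the segfault.
    bad = ["node07.example.edu", "node12.example.edu"]
    print("requirements = " + exclude_machines_requirement(bad))
```

The printed line would go into the submit description that the submit
function generates, and the list has to be maintained by hand as more
affected nodes turn up.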

So yes, you "can".

-Jon