
Re: [Condor-users] MATLAB Distributed Computing Server and Condor



I am sure most of you already know this, but just in case ...
You do not have to use MDCS to let Matlab programs harness the CPU power of your Condor pool. You can simply run many individual Matlab invocations in parallel by submitting them to the vanilla universe, just like any other job. The only real non-obvious hurdle is how to deal with licenses. See
  https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToRunMatlab
for some specific notes on using Condor and Matlab together. Jon, if you are interested/willing to improve the above HowTo (to include MDCS wisdom perhaps?), that'd be awesome.
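
A minimal sketch of the vanilla-universe approach, written in Python so the submit description can be generated per run. The wrapper script name (run_matlab.sh) and the output file names are assumptions for illustration and do not come from the HowTo; license handling is left out here.

```python
def vanilla_matlab_submit(n_jobs, wrapper="run_matlab.sh"):
    """Build a vanilla-universe submit description for n_jobs independent Matlab runs."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    lines = [
        "universe = vanilla",
        # Hypothetical wrapper that starts Matlab non-interactively and
        # passes its first argument to the Matlab code as a task index.
        f"executable = {wrapper}",
        # $(Process) expands to 0 .. n_jobs-1, one value per queued job.
        "arguments = $(Process)",
        "output = matlab_$(Process).out",
        "error = matlab_$(Process).err",
        "log = matlab.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the result to a file and hand it to condor_submit.
    print(vanilla_matlab_submit(10), end="")
```

Each queued job is an ordinary Condor job, so the usual Condor retry and eviction handling applies to it.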

Thanks
Todd

Jonathan D. Proulx wrote:
On Thu, Sep 08, 2011 at 01:20:52PM -0700, Mark Cafaro wrote:
:There was a brief thread about this in 2008, but times have changed and so has the MATLAB Distributed Computing Server (MDCS).
:
:Has anyone had success using MDCS on top of Condor through the generic scheduler option? We are currently evaluating a trial version of MDCS to see if this possibility exists.

yes.

:I have noticed a Condor-specific SubmitFcn in the MDCS examples, so it appears some customers have been exploring this option as well.

A few years back MathWorks wrote a SubmitFcn for us, which is probably
the basis for that example.  It does work but is fragile compared to
most Condor things.

If any job fails or is interrupted Matlab will just hang forever
waiting for it.

If any of the jobs can't get a DCE license it will fail and Matlab
will hang forever waiting for it.  This means you need to use resource
limits in Condor (concurrency limits) and configure your submit
function to require that limit.  It also means that if you have systems
outside Condor using the same DCE license pool, it's very likely things
will occasionally break in weird ways.
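
As a rough sketch of the two pieces involved, assuming Condor's concurrency limits are the mechanism used: the pool configuration caps the limit at the number of licenses, and the submit description written by the submit function requests one unit of it. The limit name MATLAB_DCE and the count 8 are made up for illustration.

```python
def license_limit_lines(limit_name, n_licenses):
    """Return (pool config line, submit description line) for one concurrency limit."""
    if n_licenses < 1:
        raise ValueError("n_licenses must be at least 1")
    # Goes in the pool configuration: caps how many running jobs
    # may hold this limit at the same time.
    config_line = f"{limit_name}_LIMIT = {n_licenses}"
    # Goes in the submit description the submit function generates:
    # each job then counts against the cap while it runs.
    submit_line = f"concurrency_limits = {limit_name}"
    return config_line, submit_line


if __name__ == "__main__":
    # Illustrative values only; use your real license count.
    for line in license_limit_lines("MATLAB_DCE", 8):
        print(line)
```

The cap only counts jobs that Condor itself started, which is why license users outside the pool can still exhaust the licenses.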

We also have a situation where, on apparently identical systems within
our pool, some systems deterministically segfault when DCE tries to
start a Matlab process, though running Matlab by hand (with the same
command line DCE uses) succeeds.  This turned up about a year ago and
MathWorks assigned an engineer to look into it but has yet to come to
a determination.  We work around this by having a negative "requires"
statement in the submit function that excludes, by name, the nodes that
have displayed this bug.
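
A minimal sketch of how such an exclusion could be built, assuming it ends up as a requirements expression in the generated submit description that compares against the Machine attribute. The host names are placeholders, not our actual nodes.

```python
def exclude_machines_requirement(bad_hosts):
    """Build a requirements line that keeps jobs off the named machines."""
    if not bad_hosts:
        # Nothing to exclude, so emit no requirements line at all.
        return ""
    # One inequality per host; all of them must hold for a machine to match.
    clauses = [f'(Machine != "{host}")' for host in bad_hosts]
    return "requirements = " + " && ".join(clauses)


if __name__ == "__main__":
    # Placeholder host names for illustration.
    print(exclude_machines_requirement(["node07.example.edu", "node12.example.edu"]))
```

The list has to be maintained by hand as new nodes show the problem.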

So yes, you "can".
-Jon