
Re: [Condor-users] anyone with experience with matlab



OK, I've included a sample submit description file.  You'll notice that condor_wait doesn't go in the submit description file.  It's a separate command.  When you submit the job, you'll get something like this.
 
V:\shared\condor\sample_jobs\bludaa931717>condor_submit.exe example.job
Submitting job(s).....
Logging submit event(s).....
5 job(s) submitted to cluster 491.
 
To wait for the job to complete, you need to run the condor_wait command on the log file listed in the submit description file.  In my example the log points to v:\temp\condor\dir.log.  Below is a transcript of what I did.  The only step you actually need is the condor_wait call.
 
V:\temp\condor>pwd
V:/temp/condor
 
V:\temp\condor>dir/w
 Volume in drive V is DATA
 Volume Serial Number is FC55-1736
 
 Directory of V:\temp\condor
 
[.]                [..]               dir.491.0.error    dir.491.0.output
dir.491.1.error    dir.491.1.output   dir.491.2.error    dir.491.2.output
dir.491.3.error    dir.491.3.output   dir.491.4.error    dir.491.4.output
dir.log
              11 File(s)      3,969,255 bytes
               2 Dir(s)  17,617,494,016 bytes free
 
V:\temp\condor>condor_wait.exe dir.log
All jobs done.
 
V:\temp\condor>
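
From MATLAB, the same two steps could be driven with system(). This is only a minimal sketch, assuming condor_submit and condor_wait are on the PATH of the machine running MATLAB. The names submitAndWait and quoteCommand are hypothetical helpers made up for illustration; they are not part of Condor or MATLAB.

```matlab
function status = submitAndWait(submitFile, logFile)
%SUBMITANDWAIT Hypothetical helper: submit a job, then block until it is done.
%   submitFile is the submit description file; logFile must be the same
%   path that appears on its "log = ..." line.

% Step 1: submit. condor_submit returns as soon as the jobs are queued,
% not when they have finished running.
[status, output] = system(quoteCommand('condor_submit', submitFile));
disp(output);
if status ~= 0
    error('example:submitFailed', 'condor_submit failed:\n%s', output);
end

% Step 2: wait. condor_wait watches the user log and returns once the
% jobs recorded in it have completed.
[status, output] = system(quoteCommand('condor_wait', logFile));
disp(output);

function cmd = quoteCommand(tool, filePath)
% Build 'tool "path"' so that a path containing spaces survives the shell.
cmd = [tool, ' "', filePath, '"'];
```

For the transcript above this would be called as submitAndWait('example.job', 'v:\temp\condor\dir.log'). condor_wait also accepts a -wait option with a number of seconds if an upper bound on the waiting time is wanted.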
 
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Richardson, Joshua
Sent: Friday, August 03, 2007 06:46
To: Condor-Users Mail List
Subject: Re: [Condor-users] anyone with experience with matlab

So, just to make sure I understand you correctly: when I set log = log file, instead of just naming the log file, I give the full path to the log file. Then, as the last line (before the queue? or after?), I add condor_wait with the path of the log file?

 

Could you possibly show me an example of this being used in regular Condor format? That would be very helpful. Just a skeleton of what it would look like... Thanks.

 

Josh Richardson

Integrity Applications Incorporated Intern (IAI)

703-378-8672 ext 632

 


From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Jones, Torrin A (US SSA)
Sent: Thursday, August 02, 2007 5:44 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] anyone with experience with matlab

 

Yes, I was looking at the first set of code.  In order to use condor_wait, you first call condor_submit with a submit file.  The submit file should have a line that says log = [somelogfile].  Replace [somelogfile] with the actual path to the log file.  After the submit is done, you call condor_wait with the full path to the log file.  When the job is done, condor_wait will finish.
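
As a rough illustration of the log = line, here is a minimal MATLAB sketch that builds the text of a small submit description. It is not the attached example.job, whose contents are not shown in this thread. The function name exampleSubmitText, the executable argument, and the output/error file names are assumptions chosen for illustration.

```matlab
function text = exampleSubmitText(executable, logFile, numJobs)
%EXAMPLESUBMITTEXT Hypothetical sketch of a minimal submit description.
%   executable and logFile are inserted through %s, so backslashes in
%   Windows paths are copied as-is and do not need to be doubled.
text = sprintf([ ...
    'universe   = vanilla\n', ...
    'executable = %s\n', ...
    'output     = dir.$(Cluster).$(Process).output\n', ...
    'error      = dir.$(Cluster).$(Process).error\n', ...
    'log        = %s\n', ...
    'queue %d\n'], executable, logFile, numJobs);
```

The returned text can be written to a file with fopen, fprintf(fid, '%s', text), and fclose. condor_wait does not appear anywhere in it, neither before nor after the queue line; it is run as its own command on the log path once condor_submit has returned.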

 

As for where this would go in your code, I don't know.  I don't know MATLAB that well.

 

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Richardson, Joshua
Sent: Thursday, August 02, 2007 14:22
To: Condor-Users Mail List
Subject: Re: [Condor-users] anyone with experience with matlab

Are you referring to the first set of code? Would I just enter condor_wait in the submit function? Where exactly and how would that be used?

 

Josh Richardson

Integrity Applications Incorporated Intern (IAI)

703-378-8672 ext 632

 


From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Jones, Torrin A (US SSA)
Sent: Thursday, August 02, 2007 5:16 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] anyone with experience with matlab

 

Sorry, no experience with matlab, but let me ask a question anyway.

 

In your waitForState function is it actually waiting for the job to finish?  Or is it waiting for the 4 condor_submit commands that you run to finish?  I ask because you need to use the condor_wait command to wait for a job to finish and I'm not sure if that's what's being done here.

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Richardson, Joshua
Sent: Thursday, August 02, 2007 13:32
To: Condor-Users Mail List
Subject: [Condor-users] anyone with experience with matlab

I have been trying to set up Condor with MATLAB for quite a while. I have simplified the MATLAB code that I am trying to run and am still having difficulty getting the results. It seems as if Condor is returning results before MATLAB has a chance to finish the tasks. I am using the MATLAB Distributed Computing Engine, but I am asking for help from anyone who is familiar with MATLAB. I am going to paste my code below and was wondering if someone could take a look at it and see whether there are any glaring mistakes that my team and I cannot figure out.

 

First, I have an M-file named trial.m. This is the code that I want to run and get results from.

 

function trial()

jm = findResource('scheduler','configuration', 'generic');
set(jm,'configuration','generic');
job1 = createJob(jm);
createTask(job1, @sum, 1, {[1 1]});
createTask(job1, @sum, 1, {[2 2]});
createTask(job1, @sum, 1, {[3 3]});
createTask(job1, @sum, 1, {[4 4]});
submit(job1);
waitForState(job1, 'finished', 60)
results = getAllOutputArguments(job1)
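
One thing worth checking in trial.m: waitForState with a timeout returns after that many seconds even if the job has not reached the requested state, so getAllOutputArguments may run while tasks are still pending. A minimal sketch of a guard follows. reportJobState is a hypothetical helper, not a toolbox function, and it assumes the job's State property reads 'finished' once all tasks are done.

```matlab
function finished = reportJobState(state, timeoutSeconds)
%REPORTJOBSTATE Hypothetical helper: warn if a job did not finish in time.
%   state is the job's State property read after waitForState returns,
%   for example get(job1, 'State').
finished = strcmp(state, 'finished');
if ~finished
    warning('example:jobNotFinished', ...
            'Job is still in state "%s" after waiting %d seconds.', ...
            state, timeoutSeconds);
end

% Possible use in trial.m (assumption, not tested against a cluster):
%   waitForState(job1, 'finished', 60);
%   if reportJobState(get(job1, 'State'), 60)
%       results = getAllOutputArguments(job1)
%   end
```

If the warning fires, the empty result is more likely a timeout than a failure, and a longer timeout or a call to waitForState without one may be enough.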

 

 

Next is my submit function for MATLAB:

 

function submitfcn(scheduler, job, props, extraCondorSubmitArgs) %#ok Not using job
%SUBMITFCN Submit a Matlab job to a Condor scheduler
%
% See also workerDecodeFunc.
%
% Assign the relevant values to environment variables, starting
% with identifying the decode function to be run by the worker:
decodeFcn = 'workerDecodeFunc';
if nargin < 4
    extraCondorSubmitArgs = '';
end
% Ask the workers to print debug messages by default by setting MDCE_DEBUG to
% true.
jobEnvVars = {'MDCE_DECODE_FUNCTION',decodeFcn, ...
              ...%' MDCE_STORAGE_LOCATION',props.StorageLocation, ...
              ' MDCE_STORAGE_CONSTRUCTOR',props.StorageConstructor, ...
              ' MDCE_JOB_LOCATION',props.JobLocation, ...
              ' MDCE_DEBUG','true'};
taskEnvVars = cell(1, numel(props.TaskLocations));
for i = 1:numel(props.TaskLocations)
    taskEnvVars{i} = {'MDCE_TASK_LOCATION', props.TaskLocations{i}};
end
if isempty(scheduler.ClusterMatlabRoot)
    warning('distcomp:condor:NoClusterMatlabRoot', ...
            ['The scheduler''s ClusterMatlabRoot property is empty.\n', ...
             'Using  matlabroot  instead.']);
    clusterMatlabRoot = matlabroot;
else
    clusterMatlabRoot = scheduler.ClusterMatlabRoot;
end
matlabScript = fullfile(clusterMatlabRoot, 'bin', 'matlab');
% ... Do we need the following ??? ...
if ispc
    matlabScript = [matlabScript, '.bat'];
end
matlabArgs = strrep(scheduler.matlabCommandToRun, 'matlab ', '');

% Determine where to save the standard output, standard error and the
% Condor log.
logFiles = cell(1, props.NumberOfTasks);
outFiles = cell(1, props.NumberOfTasks);
errFiles = cell(1, props.NumberOfTasks);
for i = 1:props.NumberOfTasks
    taskLoc = fullfile(scheduler.DataLocation, props.TaskLocations{i});
    logFiles{i} = [taskLoc, '.log'];
    outFiles{i} = [taskLoc, '.out'];
    errFiles{i} = [taskLoc, '.err'];
end

% Create one condor submit file for all the tasks.
script = createCondorSubmitScript(matlabScript, matlabArgs, ...
                                  jobEnvVars, taskEnvVars, ...
                                  errFiles, outFiles, logFiles);
% Submit a Condor job that executes all the tasks:
%[pathstr, name, ext, versn] = fileparts(script);
%   script2 = name;
    % Execute the submit command on the remote host.
    %copyfile(script, '.')
condorSubmitCommand = ['condor_submit ', script, ' ', extraCondorSubmitArgs];
[s, w] = system(condorSubmitCommand);
% Leave behind the necessary debugging information if the submission failed.
if s ~= 0
    % Report script here: script2 is only defined in the commented-out
    % lines above, so referencing it would raise an undefined-variable error.
    warning('distcomp:condor:SubmitFailed', ...
            ['Call to condor_submit failed with the following message:\n\n', ...
             '    %s\n\n', ...
             'The submit command used was:\n\n    %s\n\n', ...
             'Not deleting the submission file %s.'], ...
             w, condorSubmitCommand, script);
else
    % Display the Condor job number:
    disp(w);
    % Clean up:
    delete(script);
   % delete(script2);
end

 

function filename = createCondorSubmitScript(matlabScript, matlabArgs, jobEnvVars, taskEnvVars, errFiles, outFiles, logFiles)
%Create a Condor submit script that forwards the correct environment variables
%and executes Matlab.

% We assume that the decode function has been put on the path of the MATLAB
% workers, e.g. by putting it into $MATLABROOT/toolbox/local.

% Double all backslashes so fprintf prints out a single backslash.
matlabScript = strrep(matlabScript, '\', '\\');
matlabArgs = strrep(matlabArgs, '\', '\\');
jobEnvVars = strrep(jobEnvVars, '\', '\\');
for i = 1:numel(taskEnvVars)
    taskEnvVars{i} = strrep(taskEnvVars{i}, '\', '\\');
end
outFiles = strrep(outFiles, '\', '\\');
errFiles = strrep(errFiles, '\', '\\');
logFiles = strrep(logFiles, '\', '\\');

condorHeader = [ 'Universe            = vanilla\n', ...
                 'Executable          = condor_exec.bat \n',... %s\n', ...
                 'Transfer_Executable = true\n', ...
                 'Requirements        = (machine == "condor01.integrity-apps.com")\n'
                 ];
taskString   = ['Arguments            = %s\n', ...
                'Environment          = "%s"\n', ...
                ...%'input                = matlab_metadata.mat \n', ...
                'Error                = %s\n', ...
                'Output               = %s\n', ...
                'Log                  = %s\n', ...
                'should_transfer_files = YES \n',...
                'transfer_input_files = matlab_metadata.mat, job1.in.mat, job1.common.mat, job1.state.mat, job1.out.mat \n', ...
                'when_to_transfer_output = ON_EXIT \n',...
                'notify_user          = jrichardson@xxxxxxxxxxxxxxxxxx \n',...
                'Queue\n\n'];
filename = tempname;
fid = fopen(filename, 'wt');
fprintf(fid, condorHeader, matlabScript);

for i = 1:numel(taskEnvVars)
    % Create a cell-array of all the environment variables we want to set
    % for the current task, and transform it into a string for the Condor
    % script.
    envString = createCondorEnvString({jobEnvVars{:}, taskEnvVars{i}{:}});
    % Append a clause to the Condor script to queue the current task.
    fprintf(fid, taskString, matlabArgs, envString, errFiles{i}, outFiles{i}, logFiles{i});
end
fclose(fid);

 

function envString = createCondorEnvString(envVars)
%envStr = createCondorEnvString(envVars)
%  envVars should be a cell array of even length.  The odd entries are
%  the environment variable names, the even entries are their values.

% In Condor, environment variables are specified in UNIX as
%  Environment = var1=val1;var2=val2;...varn=valn
% and on Windows, the separator is '|' instead of ';', i.e. the format is
%  Environment = var1=val1|var2=val2|...varn=valn
% Note: the Windows branch below uses two spaces rather than '|'.  That
% fits the double-quoted Environment value written by taskString, where
% entries are separated by whitespace.

if ispc
    envSep = '  ';
else
    envSep = ';';
end
envString = '';
for i = 1:2:numel(envVars)
    envString = [envString, envVars{i}, '=', envVars{i + 1}, envSep];
end

 

 

 

Next is the executable that is being passed to Condor. There might be a problem here. It calls worker.bat, which starts matlab.bat, which starts MATLAB. My argument is worker.bat.

 

@echo off

IF EXIST "C:\MATLAB_DISTRIBUTED\R2007a\bin\" (SET DRIVELETTER=C)
SET DL=%DRIVELETTER%:\MATLAB_DISTRIBUTED\R2007a\bin\

IF EXIST "G:\MATLAB_DISTRIBUTED\R2007a\bin\" (SET DRIVELETTER=G)
SET DL=%DRIVELETTER%:\MATLAB_DISTRIBUTED\R2007a\bin\

echo %DL%

echo %DRIVELETTER% "is the DRIVELETTER "

SET TEMP=%_CONDOR_SCRATCH_DIR%
SET
%DL%%1

SET

 

 

 

Any suggestions will be greatly appreciated. Currently, I receive a message saying that the job is completed while tasks are still running, and I get a result of an empty 4 x 0 array.

Josh Richardson

Integrity Applications Incorporated Intern (IAI)

703-378-8672 ext 632

 

 

Attachment: example.job
Description: example.job