
[Condor-users] Running "batch" jobs on different platforms



All,
 
OK, I've decided that as usual I am asking the wrong questions.  Let me tell you what I want to do and you guys tell me how to do it...
 
I want to be able to run pre-built "batch" jobs on different machines in a Condor pool.  By "pre-built", I mean a file of commands, residing on a given machine, that performs a specific task on that machine.  If I log onto the machine directly and execute the file (as a shell script on Red Hat, or a batch file on Windows), the commands run and the task completes.
 
In a Windows-only environment we have a set of batch files sitting on each box which we can fire off.  The problem is that we now need to integrate a Red Hat box (and eventually a Sun box) into this environment, so that we issue one command and the Windows boxes run their batch files, the Red Hat box runs its set of commands, and the Sun box runs its set of commands.  It appeared to me that Condor (with the possible future addition of DAGMan) could perform the control function for this process.
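In DAGMan terms, I picture something like the following; the node names and submit-file names here are just placeholders, not files I actually have yet:

```
# Hypothetical DAG file: one node per platform, no dependencies,
# so all three jobs run in parallel and condor_submit_dag waits
# until every node has completed.
JOB  winjob    windows.sub
JOB  linuxjob  linux.sub
JOB  sunjob    solaris.sub
```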
 
I thought I could build, debug, and test each system's independent "batch" procedure, then build a Condor submit file for each procedure, configured so that each submitted job would run only on its appropriate system and execute the local "batch" file, and then wait for all jobs to complete.
 
Is this doable?  The Windows portion works fine: I have a small batch file defined for two Windows boxes in the pool, I built a submit file to direct each system to run one copy, and it all works.  But what about the Unix side?  I thought that if I built a small shell script and submitted it to run on the Red Hat box in the pool, it would work.  But it doesn't; I get the log output you see below.  I'll include all the appropriate files:
 
====================================================
The shell script to be executed on Red Hat:
 
#!/bin/csh
echo "Howdy!"
echo "Here is the output from 'hostname':"
hostname -v -i
 
echo ""
echo "Output from 'ls' command:"
ls -la
 
echo ""
echo "Output from a 'ping' command:"
ping -c 4 stargate.nuview.com
 
echo "That's all folks!"
 
=====================================================
The submit file for Red Hat:
 
universe       = vanilla

requirements   = OpSys == "LINUX"
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
 
executable     = linux.bat
output         = linux.out
error          = linux.err
log            = linux.log
queue
======================================================
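(For comparison, the working Windows submit files follow the same pattern; this is roughly what they look like, with names approximate, and the exact OpSys string depending on the Windows version:)

```
universe       = vanilla

requirements   = OpSys == "WINNT51"
should_transfer_files = YES
when_to_transfer_output = ON_EXIT

executable     = windows.bat
output         = windows.out
error          = windows.err
log            = windows.log
queue
```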
 
The resultant log file:
 
001 (012.000.000) 08/16 10:24:17 Job executing on host: <192.168.1.222:1620>
...
007 (012.000.000) 08/16 10:24:17 Shadow exception!
            Error from starter on Mike_RH.nuview.com: Failed to execute '/opt/condor-6.6.6/home/execute/dir_20253/condor_exec.exe
                   condor_exec.exe': No such file or directory
             0  -  Run Bytes Sent By Job
             246  -  Run Bytes Received By Job
...
 
======================================================
 
So am I just going at this all wrong?  Am I using Condor in a way that it was not intended?  Are there other software solutions I should consider other than Condor?  Any help appreciated!
 
 
--
Mike Frederick
mike@xxxxxxxxxx