
Re: [Condor-users] LAM/MPI and the lamscript



Dear Mark,
Do you have an equivalent script for mp1script (for MPICH1 jobs)?

On 4/30/07, Mark Calleja <M.Calleja@xxxxxxxxxxxxxxx> wrote:
Hi Sara,

I'm not sure if you still need it, but I've put my lamscript at:

http://www.escience.cam.ac.uk/~mcal00/condor/lamscript.asc

It works with /bin/sh or /bin/bash, and I've tested it with LAM v.7.1.3
(you'd want the 7.1.x branch to get the SMP additions), so copy it and
rename it as lamscript. Note that you'll need to set your own values for
LAMDIR and LD_LIBRARY_PATH (the latter for compiler libs). Also look at
the comment half way down the script that begins "Each of my machines
has...". From your email it would seem that you just need to set "cpu=2".
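
A minimal sketch of the kind of edits described above, for orientation only. The paths and host names are placeholders (assumptions, not values from the real lamscript), and the actual script may organise these settings differently. The only LAM-specific fact relied on is that a boot schema line has the form "hostname cpu=N".

```shell
#!/bin/sh
# Sketch only: every path and host name here is a made-up placeholder.

# Assumed LAM install prefix; set this to wherever LAM 7.1.x lives on your machines.
LAMDIR=/usr/local/lam-7.1.3
# Assumed compiler library directory, prepended to any existing value.
LD_LIBRARY_PATH=/opt/compiler/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export LAMDIR LD_LIBRARY_PATH

# Processors per machine (2 for dual-processor nodes).
cpu=2

# Print one LAM boot-schema line ("hostname cpu=N") for each host given.
boot_schema() {
    for host in "$@"; do
        echo "$host cpu=$cpu"
    done
}

# Hypothetical host names, just to show the output format.
boot_schema node1.example.org node2.example.org
```

With cpu=2, each host appears once in the boot schema with two processors, which is what lets LAM place two MPI processes on the same SMP machine.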

Let me know how you get on.

Cheers,
Mark

Sara Campos wrote:
> Hello,
>
> I've posted to the mailing list before but didn't receive any
> answer. My main question was about how to use the lamscript. Every
> time I tried to use it the job stayed idle and condor_q -analyze
> showed "6 match but reject the job for unknown reasons" (I am testing
> with 3 computers, each with 2 processors). The submit script was
> something like this:
>
> Executable      = lamscript
> Universe = parallel
> machine_count = 2
> arguments = test_2.sh
> output = run.out
> error = run.error
> log = run.log
> +WantParallelSchedulingGroups = True
> should_transfer_files = yes
> when_to_transfer_output = on_exit
> transfer_input_files = test_2.sh
> queue
>
> And in the local config files I have something like this:
>
> ParallelSchedulingGroup     = "$(HOSTNAME)"
> DedicatedScheduler          = "DedicatedScheduler@$(FULL_HOSTNAME)"
> Startd_EXPRS                = $(STARTD_EXPRS), DedicatedScheduler, ParallelSchedulingGroup
> RANK                        = Scheduler =?= $(DedicatedScheduler)
>
>
> I was able to run test_2.sh in parallel outside Condor, so the
> executable works and LAM and MPI are also working on the machines.
>
> In the lamscript I changed the LAMDIR and adapted the lamboot
> command. I didn't add the LAMDIR to the .cshrc file as suggested
> in the script because I don't have a .cshrc file (and honestly I didn't
> understand why it was necessary to do that). I don't know if I should
> have changed the script in other places, if I am doing something else
> wrong that has nothing to do with the lamscript, or if the problem is
> related to the .cshrc file.
>
> I hope someone can help me with this ... I didn't find much information
> in the archives.
>
> Thanks in advance
>
> Sara
>
> PS: Below you can see my previous message, which has some questions mostly
> concerned with this problem.
>
> Hello,
>
> We are thinking of using Condor to manage a pool of dedicated
> multiprocessor machines. One of our goals is to be able to run
> parallel jobs using LAM/MPI, with each job running on a single machine
> (using its different processors). We have been doing some tests with
> only a few machines but some questions have come up.
>
> 1. We tried to use the lamscript script provided but it didn't work out
> probably because the user's login shell is bash. Is it necessary to have
> csh as a login shell in order to run the lamscript? If so, how can we
> overcome that since all users in our pool use bash? If I am confused,
> what exactly is meant by this paragraph taken from the manual: "For LAM,
> there is a similar path setting, but it is called LAMDIR in the
> lamscript script. In addition, this path must be part of the path set in
> the user's .cshrc script. As of this writing, the LAM implementation
> does not work if the user's login shell is the Bourne or compatible shell."?
>
> 2. Is it imperative to define a dedicated scheduler in order to run
> parallel jobs or is this only optional? If so what are the advantages?
> What happens, for instance, when the submission script defines a scheduler
> but is submitted from a different machine (one that is not the dedicated
> scheduler)? Finally, how does the central manager order the jobs from
> the different submit machines' queues, and is this related to the
> convenience of defining a dedicated scheduler?
>
> I hope I haven't asked too many boring questions... Thanks in advance.
>
> Sara Campos
>
>
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at either
> https://lists.cs.wisc.edu/archive/condor-users/
> http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
>


--
Dr Mark Calleja
Cambridge eScience Centre, University of Cambridge
Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA
Tel. (+44/0) 1223 765317, Fax  (+44/0) 1223 765900
http://www.escience.cam.ac.uk/~mcal00



--
Best Regards,
S.Mehdi Sheikhalishahi,
Web: http://www.cse.shirazu.ac.ir/~alishahi/
Bye.