
Re: [Condor-users] New to Condor, Need to RUN MPI



It seems this message was blocked because I changed the alias of my
email address. My apologies if I'm wrong and this is the second time the
message is sent.
--------------------------------------------------------------------------

Dear Mr. Khanal,

I've been following your struggles with some interest, as I'm trying to
run MPI jobs on Condor as well, except that I'm using Open MPI.
Fortunately (for me), all the configuration is done by a system
administrator at our institute, so I can't help you with that part. But
reading about your problems, I'm wondering about the following:
It seems that you have a machine pool that is dedicated to running MPI
jobs. If that is really the case, Condor is completely useless for you.
In the parallel universe, _all_ that Condor does is create a machinefile
and then call the mpirun command with that (temporary) machinefile. If
you have a cluster dedicated to MPI computations, you can just as well
run mpirun yourself and the result will be exactly the same (without all
the Condor headaches)!
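
To make that concrete, here is a minimal sketch in Python of the two
steps I mean: write a machinefile, then build the mpirun command line
that uses it. The host names, the program name and both helper functions
are made up for illustration, and I'm assuming an mpirun that accepts
-machinefile (Open MPI treats it as a synonym for -hostfile; other MPI
implementations may want a different option). It is not what Condor
actually runs internally.

```python
import shlex
import tempfile
from pathlib import Path


def write_machinefile(hosts, directory):
    """Write one host name per line and return the path of the file."""
    path = Path(directory) / "machinefile"
    path.write_text("".join(f"{host}\n" for host in hosts))
    return path


def build_mpirun_command(machinefile, nprocs, program, args=()):
    """Return the mpirun argument list; nothing is executed here."""
    return ["mpirun", "-np", str(nprocs),
            "-machinefile", str(machinefile),
            program, *args]


if __name__ == "__main__":
    # Hypothetical node names; replace them with your own compute nodes.
    hosts = ["node01", "node02", "node03", "node04"]
    with tempfile.TemporaryDirectory() as tmp:
        machinefile = write_machinefile(hosts, tmp)
        cmd = build_mpirun_command(machinefile, len(hosts), "./my_mpi_program")
        # Print the command instead of running it, so the sketch works
        # on a machine without MPI installed.
        print(shlex.join(cmd))
        # To actually launch: subprocess.run(cmd, check=True)
```

On a dedicated cluster the machinefile could simply be a fixed file that
you maintain by hand, in which case only the mpirun call remains.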

Greetings, Jakob

Samir Khanal wrote:
> Hi Zach
> I tried and MPI worked too.
> Thank you so much,
> You made my day!
> Samir
> 
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Zachary Miller
> Sent: Tuesday, February 03, 2009 7:03 PM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] New to Condor, Need to RUN MPI
> 
>> I looked up the CollectorLog and found the following entries. Those ips are of the computenodes
>>
>> 2/3 17:28:27 DaemonCore: PERMISSION DENIED to unknown user from host <10.1.255.251:59011> for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD
> 
> this is the key problem.
> 
> you need to tell condor which IP addresses are allowed to do certain
> operations.  look in the condor_config file for the string "HOSTALLOW"
> and set them to something like this:
> 
> HOSTALLOW_READ = 10.1.*
> HOSTALLOW_WRITE = 10.1.*
> HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST)
> 
> this will allow machines in those networks to join your pool, run and/or submit
> jobs depending on which daemons they are running.
> 
> it also makes it so only the central manager can issue administrative commands.
> you may wish to make the central manager a different machine from your main
> submit point if you do not trust users on the submit machine not to say, turn
> off condor.
> 
> i hope that helps!
> 
> 
> cheers,
> -zach
> 
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 
> The archives can be found at: 
> https://lists.cs.wisc.edu/archive/condor-users/