
Re: [HTCondor-users] setting up dedicated pool for parallel universe



On 12/26/18 10:09 AM, Kodanda Ram Mangipudi wrote:
Hi,

I am new to HTCondor setup and administration, though I was a user in the past.
We are trying to set up a pool of 2 machines, each with dual CPUs with 20 cores each. I made a 6-core i5 machine the master, and the other 2 HPC workstations the nodes. All are running Ubuntu 16.04. The condor installation is from the default Ubuntu repositories, installed using apt-get.


First, note that you only need to set up a dedicated scheduler and submit MPI jobs with the parallel universe if you want MPI jobs to run concurrently on more than one machine. If you just want your MPI jobs to run on multiple cores of one machine (which is always the fastest kind of interconnect), you can use the vanilla universe.
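
For the single-machine case, a minimal sketch of a vanilla universe submit file might look like the following. The file name mpi_one_node.sub, the wrapper script run_mpi.sh (which would call mpirun itself) and the core count of 8 are all assumptions for illustration; none of them come from your setup.

```shell
# Write a hypothetical submit description for a single-machine MPI job.
# run_mpi.sh is an assumed wrapper that would invoke mpirun with 8 ranks.
cat > mpi_one_node.sub <<'EOF'
# Vanilla universe: the whole job runs inside one slot on one machine.
universe = vanilla
executable = run_mpi.sh
# Ask for as many cores as the wrapper starts MPI ranks.
request_cpus = 8
output = mpi.out
error = mpi.err
log = mpi.log
queue
EOF

# Submit it from the master once the pool is up:
#   condor_submit mpi_one_node.sub
```

For this to match, the execute node has to offer a slot with at least that many cores.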




node01
File: /etc/condor/condor_config (Package manager's copy; identical to that of master node)
/etc/condor/config.d/00debconf (Edited for configuration)
-------------------------------------Begin file ----------------------------------

# Added: by system admin: For Dedicated scheduler for parallel universe
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxx"


On the nodes, the part after the @ sign must be the condor name of the schedd, which is probably not the IP address. You can find the name of a schedd by running


condor_status -schedd

and the first column will be the condor name of the schedd. Put that name after the @ sign in the config file on each node and restart HTCondor; I think things will work better.
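
As a rough sketch of what the corrected line looks like, assuming the schedd's name turns out to be master.example.com (a made-up name; use whatever condor_status prints for your pool). The helper function below is only for illustration and is not part of HTCondor.

```shell
# Hypothetical helper: print the config line for a given schedd name.
dedicated_scheduler_line() {
    printf 'DedicatedScheduler = "DedicatedScheduler@%s"\n' "$1"
}

# On a machine in the pool, the name alone could be fetched with:
#   condor_status -schedd -autoformat Name
# Here a placeholder name stands in for that output.
dedicated_scheduler_line "master.example.com"
```

The printed line would replace the existing DedicatedScheduler line in /etc/condor/config.d/00debconf on each node.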


-greg