
Re: [HTCondor-users] multicore and multinode run



Thanks for the reply. It now seems clear.

However, I am looking for the correct syntax for specifying each of the following example cases:

 

* One machine allocating 4 cores

* Two machines each allocating 2 cores

* Four machines each allocating 1 core

 

Also, assume that at the moment a user wants to submit a job, the free cores are as follows:

* Machine1 with 1 core

* Machine2 with 2 cores


Then, if the user specifies machine_count=3, is Condor smart enough to allocate 1 core from Machine1 and 1 core from Machine2?

 

Regards,

Mahmood

 

From: Jason Patton
Sent: Friday, January 26, 2018 6:16 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] multicore and multinode run

 

If you want to get output from multiple nodes of a parallel universe

job, you'll need to include the $(Node) macro as part of your

output/error file names. There are some examples in the manual:

http://research.cs.wisc.edu/htcondor/manual/current/2_9Parallel_Applications.html
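
As a minimal sketch of the $(Node) usage described above, a parallel universe submit file might look like the following. The executable name and the output, error, and log file names are placeholders chosen for illustration; they are not taken from the original job.

```
# Sketch only: my_parallel_job.sh and the file names are hypothetical
universe      = parallel
executable    = my_parallel_job.sh
machine_count = 2

# $(Node) expands to the node number (0, 1, ...), so each node
# writes to its own output and error file
output = out.$(Node)
error  = err.$(Node)
log    = job.log

queue
```

With machine_count = 2, this would be expected to produce out.0 and out.1 (and the matching err files), one pair per node.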

 

I really recommend reading that page in the manual thoroughly. It

addresses a few use cases (e.g. making sure the entire job isn't taken

down by Node 0 exiting early, requesting multiple cpu cores) that may

be relevant for future jobs.

 

However, with Open MPI jobs, all non-error/debug output should be

directed to node 0, which is the only node on which mpirun is

executed. The output you sent looks good and matches your submit

file... machine_count = 2 so you only get output from two nodes (with

one cpu core each by default). Those nodes may be two slots on the

same machine, which seems to be what happened in your case (your job

landed on two slots within compute-0-1).
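
The one-core-per-node default mentioned above can be overridden with request_cpus. A minimal sketch, assuming the pool has slots that can provide two cores each (the executable name is again a placeholder):

```
# Sketch only: my_parallel_job.sh is hypothetical
universe      = parallel
executable    = my_parallel_job.sh

# machine_count is the number of nodes;
# request_cpus is the number of cores requested for each node
machine_count = 2
request_cpus  = 2

queue
```

As in the case above, nothing in this sketch forces the two nodes onto different physical machines; they may still be slots on the same machine.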

 

Jason