
Re: [HTCondor-users] multicore and multinode run



I forgot to say that the example "Requesting multiple cores per slot" in the document is good for the first case. But I doubt if it helps with my second case in the previous email.

 

Just want to be sure about that example in the manual.

 

Regards,

Mahmood

 

From: Jason Patton
Sent: Friday, January 26, 2018 6:16 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] multicore and multinode run

 

If you want to get output from multiple nodes of a parallel universe
job, you'll need to include the $(Node) macro as part of your
output/error file names. There are some examples in the manual:

http://research.cs.wisc.edu/htcondor/manual/current/2_9Parallel_Applications.html
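
As a rough sketch of what that looks like in a submit file (the executable and file names below are placeholders, not taken from your job), each node writes to its own output and error file:

```
# Minimal parallel universe sketch; my_parallel_job is a placeholder name.
universe      = parallel
executable    = my_parallel_job
machine_count = 2

# $(Node) expands to the node number (0, 1, ...), so each node
# gets its own files, e.g. out.0 and out.1.
output        = out.$(Node)
error         = err.$(Node)
log           = job.log

queue
```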

 

I really recommend thoroughly reading that page in the manual. It
addresses a few use cases (e.g. making sure the entire job isn't taken
down by Node 0 exiting early, requesting multiple cpu cores) that may
be relevant for future jobs.
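
For reference, those two cases come down to a couple of extra submit file lines. This is only a sketch based on that manual page; the core count is an arbitrary example value, so check the page for the details that apply to your pool:

```
# Ask for several cpu cores on each node instead of the default of one.
# The value 4 is just an example.
request_cpus = 4

# Keep the job running until every node has exited, rather than
# ending the whole job when Node 0 exits.
+ParallelShutdownPolicy = "WAIT_FOR_ALL"
```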

 

However, with Open MPI jobs, all non-error/debug output should be
directed to node 0, which is the only node on which mpirun is
executed. The output you sent looks good and matches your submit
file: machine_count = 2, so you only get output from two nodes (with
one cpu core each by default). Those nodes may be two slots on the
same machine, which seems to be what happened in your case (your job
landed on two slots within compute-0-1).

 

Jason