
Re: [HTCondor-users] how to use rank with memory



There may be other requirements dictating the choice of node07 aside from its memory. Do the larger-memory machines also have at least one GPU card with a CUDACapability of 3.5 or higher? This will show you which machines qualify:

 

condor_status -constraint 'CUDACapability > 3.0'
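
To also see each matching machine's memory alongside its capability, something like this should work (the autoformat columns are my sketch; CUDACapability assumes GPU discovery is publishing that attribute in the machine ads):

condor_status -constraint 'CUDACapability > 3.0' -af Name Memory CUDACapability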

 

Also, if you submit the job on hold, with a "hold = true" statement in the submit file or on the condor_submit command line, you can then use condor_q -better-analyze to see which machines the job is able to run on, or use -reverse -machine node07 (or another node) to see why a given machine is or is not matching the job.
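
For example, assuming the submit description is saved as train.sub and the job lands in cluster 123 (both names are placeholders):

condor_submit -append 'hold = true' train.sub
condor_q -better-analyze 123
condor_q -better-analyze -reverse -machine node07.synapse 123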

 

Michael V Pelletier

Principal Engineer

Raytheon Technologies

Digital Technology

HPC Support Team

 

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Myunggi Yi
Sent: Thursday, June 24, 2021 9:42 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [External] [HTCondor-users] how to use rank with memory

 

Dear users,

 

I have installed HTCondor 9.0.1.

I want my jobs to run on the machines with more memory.

 

I submitted a job with the following submit description file, but the job always goes to node07, which has less memory.

How can I achieve my goal?

 

Thank you for your help.

 

==========================

Executable            = train.sh
Log                   = $(outfile).log
Error                 = $(outfile).err
Output                = $(outfile).out
# NFS
+IwdFlushNFSCache     = False
Should_transfer_files = no
GetEnv                = True
# for ML jobs
Request_GPUs          = 1
Requirements          = CUDACapability > 3.0
Rank                  = Memory
# Prevent re-run
periodic_remove       = JobStatus == 1 && NumJobStarts > 0
# Email
Notification          = Always
Notify_user           = myunggi@xxxxxxxxxxxxx
Queue

=================================

 

 

The following is my condor_status output:

 

Name               OpSys      Arch   State     Activity  LoadAv Mem    ActvtyTime
node01.synapse     LINUX      X86_64 Unclaimed Idle      0.000 80222  0+22:49:4
node02.synapse     LINUX      X86_64 Unclaimed Idle      0.000 64094  0+22:49:4
node03.synapse     LINUX      X86_64 Unclaimed Idle      0.000 64094  0+22:49:4
node04.synapse     LINUX      X86_64 Unclaimed Idle      0.000 64094  0+19:18:0
node05.synapse     LINUX      X86_64 Unclaimed Idle      0.000 64094  0+22:49:4
node07.synapse     LINUX      X86_64 Claimed   Busy      0.280 31892  0+00:00:3
node08.synapse     LINUX      X86_64 Unclaimed Idle      0.000 31892  0+22:49:3
node09.synapse     LINUX      X86_64 Unclaimed Idle      0.000 31892  0+22:49:3
node10.synapse     LINUX      X86_64 Unclaimed Idle      0.000 31892  0+22:49:2
node11.synapse     LINUX      X86_64 Unclaimed Idle      0.000 15779  0+15:34:3
node12.synapse     LINUX      X86_64 Unclaimed Idle      0.000 64042  0+22:49:3
node13.synapse     LINUX      X86_64 Unclaimed Idle      0.000 15819  0+22:44:3

 

Best regards,