
Re: [HTCondor-users] HTCondor - Slurm integration



Hi Steve,

Since this is for a CE, you'll want to use `set_remote_queue` or `eval_set_remote_queue` in your job router configuration. Carl's going to double-check that prefixing `remote_` is applicable to the other attributes in question.

As for the remote CE requirements, HTCondor-CE 4 with HTCondor 8.8 has a simpler format (https://htcondor-ce.readthedocs.io/en/latest/releases/#400), so you could set something like the following in your job route:

    set_Container = "cmssw/cms:rhel7";
    set_default_CERequirements = "Container";

And use the $Container variable in your slurm_local_submit_attributes.sh. It's not substituted at submit time per se but rather at the time that Bosco/BLAHP generates the submit file.
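
As a minimal sketch of the script side, assuming `Container` is available as a shell variable when BLAHP runs slurm_local_submit_attributes.sh (as described above), the script could emit the image directive only when a value is present. The function name `emit_image_directive` is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Sketch of a slurm_local_submit_attributes.sh fragment.
# Assumption: Container is set in the environment when this script runs.

# emit_image_directive is a hypothetical helper name.
emit_image_directive() {
    # Print the directive only if Container is non-empty, so that jobs
    # without the attribute do not get a bare "--image=" line.
    if [ -n "${Container:-}" ]; then
        echo "#SBATCH --image=${Container}"
    fi
}

emit_image_directive
```

Guarding on an empty value is a design choice for the sketch; a site could instead fall back to a default image.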

Reviewing your local submit attributes further, you can simplify some of those lines (this is assuming the need for the "remote_" prefix):

echo "#SBATCH --account=m2612"
--> 'set_remote_BatchProject = "m2612";' in your job route

echo "#SBATCH -N 1"
--> this is hardcoded so you can eliminate this line

echo "#SBATCH -t 48:00:00"
--> 'set_remote_BatchRuntime = 2880;' in your job route
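
The unit of BatchRuntime is worth checking against the slurm_submit.sh installed on your remote host. The 8.8.4 mapping quoted further down this thread emits `#SBATCH -t $((bls_opt_runtime / 60))`, which suggests the attribute is given in seconds and converted to minutes for Slurm. Under that assumption, a 48-hour limit would be 172800, and a value of 2880 would come out as 48 minutes. A small sketch of the arithmetic (the function names are hypothetical):

```shell
#!/bin/sh
# hours_to_seconds and sbatch_minutes are hypothetical helper names.

# Convert a wall-time limit in hours to seconds.
hours_to_seconds() {
    echo $(( $1 * 3600 ))
}

# Mirror the division in the quoted 8.8.4 slurm_submit.sh mapping:
# the value passed to "#SBATCH -t" is the runtime divided by 60.
sbatch_minutes() {
    echo $(( $1 / 60 ))
}

runtime=$(hours_to_seconds 48)
echo "BatchRuntime=${runtime} -> #SBATCH -t $(sbatch_minutes "$runtime")"
```

If your version of the script does not divide by 60, the value is presumably passed through as minutes and 2880 would be correct as written.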

Let us know if you have any additional questions!

- Brian

On 3/2/20 4:19 PM, Carl Edquist wrote:
Hi Steve,

> I am now using htcondor 8.9.5 and the newest bosco/blahp on the remote end (bosco 1.3.0).

Ok, as far as I can tell the only significant addition to slurm_submit.sh between condor 8.8.4 and 8.9.5 was the ability to specify a job cluster, which translates to a line with "#SBATCH -M $cluster_name".  I don't see that any of the parameters have gone away though.


> I tried all 5 of the parameters Carl has got here and none of them made it through into the slurm job that got submitted.

On the condor side, I think you may need to prefix those attribute names with "+remote_", if I understand correctly what I see in the manual here:

https://htcondor.readthedocs.io/en/stable/grid-computing/grid-universe.html#htcondor-c-job-submission


> Brian also pointed out that in 8.9 and the newer versions of htcondor-ce there is a variable substitution feature via the set_default_remotece_requirements.
>
> and then modified my condor submit file to have
>
> set_default_remote_cerequirements = strcat(Container == cmssw/cms:rhel7)

So, a couple details that catch my attention are,

- you mention "set_default_remotece_requirements" -- maybe just a typo in the email; it's "remote_cerequirements" not "remotece_requirements"

- and, from my read of the "Setting batch system directives" section in the manual that you linked, "set_default_remote_cerequirements" goes in the "JOB_ROUTER_ENTRIES" configuration (defined in /etc/condor-ce/config.d/02-ce-*.conf and /etc/condor-ce/config.d/99-local.conf). Note that the attribute itself is called "default_remote_cerequirements" (without the "set_" prefix). So I'm thinking that putting "set_default_remote_cerequirements" in the submit file itself might not do the right thing.

Brian, can you confirm about whether set_default_remote_cerequirements or default_remote_cerequirements can be used in a submit file?

Thanks,
Carl

On Mon, 24 Feb 2020, Steven Timm wrote:

I am just looking at this again now.

"Queue" is a reserved word in the condor submit language so it can't possibly be used to also specify the remote queue, can it? (I got an error when I tried).

I am now using htcondor 8.9.5 and the newest bosco/blahp on the remote end (bosco 1.3.0).

I tried all 5 of the parameters Carl has got here and none of them made it through into the slurm job that got submitted. I am still investigating why that was.

Brian also pointed out that in 8.9 and the newer versions of htcondor-ce there is a variable substitution feature via the set_default_remotece_requirements.

https://htcondor-ce.readthedocs.io/en/latest/batch-system-integration/#setting-batch-system-directives

Below is what our slurm_local_submit_attributes.sh looks like at NERSC right now.  All of those attributes can and do sometimes change.


echo "#SBATCH --account=m2612"
#echo "#SBATCH --reservation=xrootd_debug"
echo "#SBATCH -N 1"
echo "#SBATCH -q regular"
echo "#SBATCH -C knl,cache,quad"
echo "#SBATCH --image=cmssw/cms:rhel7"
echo "#SBATCH -L cscratch1,cvmfs"
echo "#SBATCH --module=cvmfs"
echo "#SBATCH --volume=\"/global/cscratch1/sd/uscms/node_cache:/tmp:perNodeCache=size=680G\""
echo "#SBATCH -t 48:00:00"


So do I understand correctly that if I modified my script to be

echo "#SBATCH --image=$Container"

and then modified my condor submit file to have

set_default_remote_cerequirements = strcat(Container == cmssw/cms:rhel7)

that the Container variable would be substituted in at submit time?

If not, then how does it work?


Steve Timm






On Mon, 30 Sep 2019, Carl Edquist wrote:

Hi Asvija,

Brian asked me to look into this - sorry for the delay getting back to you.

The mappings I find based on the condor 8.8.4 version of slurm_submit.sh are:

        "BatchProject" ->
        #SBATCH -A $bls_opt_project

        "BatchRuntime" ->
        #SBATCH -t $((bls_opt_runtime / 60))

        "RequestMemory" ->
        #SBATCH --mem=${bls_opt_req_mem}

        "Queue" ->
        #SBATCH -p $bls_opt_queue

        "NodeNumber" ->
        #SBATCH -N $bls_opt_mpinodes
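
To illustrate the mappings above, the following sketch imitates how a submit script might turn the `bls_opt_*` variables into directives. It is not the actual slurm_submit.sh: the function name `emit_directives`, the rule of skipping unset variables, and the example values other than the project name are assumptions made for the example.

```shell
#!/bin/sh
# emit_directives is a hypothetical helper, not part of BLAHP.
emit_directives() {
    if [ -n "${bls_opt_project:-}" ]; then
        echo "#SBATCH -A ${bls_opt_project}"
    fi
    if [ -n "${bls_opt_runtime:-}" ]; then
        # Runtime is divided by 60, as in the quoted mapping.
        echo "#SBATCH -t $((bls_opt_runtime / 60))"
    fi
    if [ -n "${bls_opt_req_mem:-}" ]; then
        echo "#SBATCH --mem=${bls_opt_req_mem}"
    fi
    if [ -n "${bls_opt_queue:-}" ]; then
        echo "#SBATCH -p ${bls_opt_queue}"
    fi
    if [ -n "${bls_opt_mpinodes:-}" ]; then
        echo "#SBATCH -N ${bls_opt_mpinodes}"
    fi
}

# Example values; only the project name comes from this thread.
bls_opt_project="m2612"
bls_opt_runtime=172800
bls_opt_queue="batch"
bls_opt_mpinodes=1
emit_directives
```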

Carl

On Thu, 5 Sep 2019, Asvija B wrote:

Hi Brian,

Condor version is 8.8.4


Thanks and regards,

Asvija

On 9/5/2019 2:33 AM, Brian Lin wrote:
Hi Asvija,

Unfortunately, there isn't much in terms of documentation, but I could give you a mapping if you give me the version of HTCondor you're running.

Thanks,
Brian

On 8/19/19 12:12 AM, Asvija B wrote:
Thanks a lot Brian... I am able to see the +remote_NodeNumber getting
translated properly.

Can you also please indicate the corresponding directives for other
SLURM related attributes as well (like --nodes, ntasks etc.)

It would be great if you could point me to some documentation related to
this info.

Additionally, the slurm_submit.sh file in BLAH's GitHub repository ( https://github.com/prelz/BLAH/blob/master/src/scripts/slurm_submit.sh ) has additional capabilities for GPU and MIC support. Is there any documentation that points to the corresponding Condor
directives for these?

Thanks again for the information.

Regards,

Asvija


On 8/16/2019 8:53 PM, Brian Lin wrote:
Hi Asvija,

You'll want to specify '+remote_NodeNumber' in your original grid job submit file. However, you should note that the Slurm directives we set will be changing in future releases of HTCondor 8.9 to the following:

"#SBATCH --nodes=1"
"#SBATCH --ntasks=1"
"#SBATCH --cpus-per-task=$bls_opt_mpinodes"

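For comparison with the older `-N` mapping, here is a sketch of the announced directives for a given value of `bls_opt_mpinodes`. The function name is hypothetical, the default of 1 for an unset variable is an assumption, and the output simply restates the three lines above.

```shell
#!/bin/sh
# emit_new_style_directives is a hypothetical helper name.
emit_new_style_directives() {
    echo "#SBATCH --nodes=1"
    echo "#SBATCH --ntasks=1"
    # Assumption: fall back to 1 CPU when bls_opt_mpinodes is unset.
    echo "#SBATCH --cpus-per-task=${bls_opt_mpinodes:-1}"
}

bls_opt_mpinodes=4
emit_new_style_directives
```

Read this way, the same attribute would request CPUs for a single task on one node rather than a number of nodes, which matters for MPI jobs that expect several nodes.
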
- Brian

On 8/13/19 12:32 AM, Asvija B wrote:
Dear Condor users,

We are planning to use HTCondor for submitting jobs to some of our
SLURM-managed clusters. As I dug into the documentation, I
understood that HTCondor uses the BLAH GAHP to support job submission
to SLURM.

We are interested in submitting MPI jobs to SLURM through HTCondor. In this regard, I am unable to find the configuration parameters in the Condor submit file for specifying MPI-related information
(e.g. the number of nodes).

I have seen the script file
$CONDOR_HOME/libexec/glite/bin/slurm_submit.sh. It does include
statements with $bls_opt_mpinodes, which translate to "#SBATCH -N"
directives. However, I am not clear about the equivalent Condor
directives that will result in the proper SLURM directives. Hence it
would be great if any of the SLURM users could comment on this.


Thanks and regards,

Asvija B




------------------------------------------------------------------------------------------------------------


[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]

This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.


------------------------------------------------------------------------------------------------------------



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/





------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Office: Feynman Computing Center 243
Fermilab Scientific Computing Division,
Scientific Computing Facilities Quadrant.,
Experimental Computing Facilities Dept.,
Grid and Cloud Operations Group