
Re: [HTCondor-users] HTCondor - Slurm integration



I am just looking at this again now.

"Queue" is a reserved word in the condor submit language so it can't possibly be used to also specify the remote queue, can it? (I got an error when I tried).

I am now using htcondor 8.9.5 and the newest bosco/blahp on the remote end (bosco 1.3.0).

I tried all 5 of the parameters Carl lists below, and none of them made it
through into the Slurm job that got submitted. I am still investigating why.

Brian also pointed out that in 8.9 and the newer versions of htcondor-ce there is a variable substitution feature via
set_default_remote_cerequirements.

https://htcondor-ce.readthedocs.io/en/latest/batch-system-integration/#setting-batch-system-directives

Below is what our slurm_local_submit_attributes.sh looks like at NERSC right now. All of those attributes can and do sometimes change.


echo "#SBATCH --account=m2612"
#echo "#SBATCH --reservation=xrootd_debug"
echo "#SBATCH -N 1"
echo "#SBATCH -q regular"
echo "#SBATCH -C knl,cache,quad"
echo "#SBATCH --image=cmssw/cms:rhel7"
echo "#SBATCH -L cscratch1,cvmfs"
echo "#SBATCH --module=cvmfs"
echo "#SBATCH --volume=\"/global/cscratch1/sd/uscms/node_cache:/tmp:perNodeCache=size=680G\""
echo "#SBATCH -t 48:00:00"


So do I understand correctly that if I modified my script to be

echo "#SBATCH --image=$Container"

and then modified my condor submit file to have

 set_default_remote_cerequirements = strcat("Container == \"cmssw/cms:rhel7\"")

that the Container variable would be substituted in at submit time?

If not, then how does it work?
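
For what it is worth, here is a minimal sketch of what the script side of that could look like. It assumes, as I read the HTCondor-CE page linked above, that blahp makes each attribute named in the remote_cerequirements expression available as an environment variable when it runs the local submit attributes script; that has not been verified against bosco 1.3.0 in this thread. The emit_directives function name and the fallback to the current image are illustrative choices, not part of blahp.

```shell
#!/bin/bash
# Sketch of a slurm_local_submit_attributes.sh that takes the container image
# from the environment.
# ASSUMPTION: blahp exports the attribute "Container" (a name chosen for this
# example) into the environment before running this script.

emit_directives() {
    echo "#SBATCH --account=m2612"
    echo "#SBATCH -N 1"
    # Fall back to the current site default when Container is unset or empty,
    # so jobs that do not set it keep today's behaviour.
    echo "#SBATCH --image=${Container:-cmssw/cms:rhel7}"
}

emit_directives
```

Running the script by hand with Container set in the shell is a quick way to check the output before involving condor at all.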


Steve Timm






On Mon, 30 Sep 2019, Carl Edquist wrote:

Hi Asvija,

Brian asked me to look into this - sorry for the delay getting back to you.

The mappings I find based on the condor 8.8.4 version of slurm_submit.sh are:

        "BatchProject" ->
        #SBATCH -A $bls_opt_project

        "BatchRuntime" ->
        #SBATCH -t $((bls_opt_runtime / 60))

        "RequestMemory" ->
        #SBATCH --mem=${bls_opt_req_mem}

        "Queue" ->
        #SBATCH -p $bls_opt_queue

        "NodeNumber" ->
        #SBATCH -N $bls_opt_mpinodes
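
To make the mapping above concrete, here is a rough sketch of how those bls_opt_* variables could turn into a Slurm header. It is not the real slurm_submit.sh: the emit_sbatch_header function and the example values are made up for illustration, and the only parts taken from the script are the five #SBATCH lines listed above.

```shell
#!/bin/bash
# Illustrative only: mimics the five mappings listed above.
# emit_sbatch_header is a hypothetical helper, not a blahp function.

emit_sbatch_header() {
    # Each directive is written only when the matching variable is non-empty.
    [ -n "$bls_opt_project" ]  && echo "#SBATCH -A $bls_opt_project"
    # The division by 60 suggests the runtime arrives in seconds and is
    # handed to Slurm in minutes (an inference from the mapping, not verified).
    [ -n "$bls_opt_runtime" ]  && echo "#SBATCH -t $((bls_opt_runtime / 60))"
    [ -n "$bls_opt_req_mem" ]  && echo "#SBATCH --mem=${bls_opt_req_mem}"
    [ -n "$bls_opt_queue" ]    && echo "#SBATCH -p $bls_opt_queue"
    [ -n "$bls_opt_mpinodes" ] && echo "#SBATCH -N $bls_opt_mpinodes"
    return 0
}

# Example with made-up values, run in a subshell so nothing leaks out.
(
    bls_opt_project=myproject
    bls_opt_runtime=7200
    bls_opt_queue=debug
    bls_opt_mpinodes=2
    emit_sbatch_header
)
```

Attributes that are left unset simply produce no directive, which is one way to see whether a given value made it through to the remote side.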

Carl

On Thu, 5 Sep 2019, Asvija B wrote:

Hi Brian,

Condor version is 8.8.4


Thanks and regards,

Asvija

On 9/5/2019 2:33 AM, Brian Lin wrote:
Hi Asvija,

Unfortunately, there isn't much in terms of documentation, but I could
give you a mapping if you give me the version of HTCondor you're running.

Thanks,
Brian

On 8/19/19 12:12 AM, Asvija B wrote:
Thanks a lot Brian... I am able to see the +remote_NodeNumber getting
translated properly.

Can you also please indicate the corresponding directives for other
SLURM-related attributes as well (like --nodes, --ntasks, etc.)?

It would be great if you could point me to some documentation related to
this info.

Additionally, the slurm_submit.sh file from BLAH's GitHub repository (
https://github.com/prelz/BLAH/blob/master/src/scripts/slurm_submit.sh ) has additional capabilities for GPU support and MIC support. Do we
have any documentation which points to the corresponding Condor
directives for these?

Thanks again for the information.

Regards,

Asvija


On 8/16/2019 8:53 PM, Brian Lin wrote:
Hi Asvija,

You'll want to specify '+remote_NodeNumber' in your original grid job
submit file. However, you should note that the Slurm directives we set
will be changing in future releases of HTCondor 8.9 to the following:

"#SBATCH --nodes=1"
"#SBATCH --ntasks=1"
"#SBATCH --cpus-per-task=$bls_opt_mpinodes"
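
A small sketch of what that change means in practice, assuming the same requested value is passed through in both cases. Both function names are hypothetical and neither is the real slurm_submit.sh; the "-N" form is the one quoted elsewhere in this thread for $bls_opt_mpinodes.

```shell
#!/bin/bash
# Illustrative contrast only; emit_old_layout and emit_new_layout are
# hypothetical helpers, not blahp functions.

# Older behaviour described in this thread: the value becomes a node count.
emit_old_layout() {
    echo "#SBATCH -N $1"
}

# Directives announced for future 8.9 releases: one node, one task,
# and the same value used as CPUs per task.
emit_new_layout() {
    echo "#SBATCH --nodes=1"
    echo "#SBATCH --ntasks=1"
    echo "#SBATCH --cpus-per-task=$1"
}

# A made-up request of 8 via +remote_NodeNumber.
emit_old_layout 8
emit_new_layout 8
```

With the older form a request of 8 asks Slurm for eight nodes; with the newer form it asks for eight CPUs on a single node, which matters for multi-node MPI jobs.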

- Brian

On 8/13/19 12:32 AM, Asvija B wrote:
Dear Condor users,

We are planning to use HTCondor for submitting jobs to some of our
SLURM-managed clusters. As I dug into the documentation, I
understood that HTCondor uses the BLAH GAHP to support job submission
to SLURM.

We are interested in submitting MPI jobs to SLURM through HTCondor.
In this regard, I am unable to find the configuration parameters in
the condor submission script for indicating MPI-related information
(e.g. number of nodes).

I have seen the script file
$CONDOR_HOME/libexec/glite/bin/slurm_submit.sh. It does include
statements with $bls_opt_mpinodes which translate to "SBATCH -N"
directives. However, I am not clear about the equivalent condor
directives that will result in the proper SLURM directives. Hence it
would be great if any of the SLURM users could comment on this.


Thanks and regards,

Asvija B



------------------------------------------------------------------------------------------------------------


[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]

This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.

------------------------------------------------------------------------------------------------------------



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/




------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Office: Feynman Computing Center 243
Fermilab Scientific Computing Division,
Scientific Computing Facilities Quadrant.,
Experimental Computing Facilities Dept.,
Grid and Cloud Operations Group