
Re: [HTCondor-users] HTCondor-users Digest, Vol 96, Issue 2



David,

thank you very much. It works now!

Josef

On Mon, Nov 1, 2021 at 10:15 PM <htcondor-users-request@xxxxxxxxxxx> wrote:
Send HTCondor-users mailing list submissions to
    htcondor-users@xxxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit
    https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
or, via email, send a message with subject or body 'help' to
    htcondor-users-request@xxxxxxxxxxx

You can reach the person managing the list at
    htcondor-users-owner@xxxxxxxxxxx

When replying, please edit your Subject line so it is more specific
than "Re: Contents of HTCondor-users digest..."


Today's Topics:

  1. Re: Docker universe + GPU (CUDA)+pytorch
    (duduhandelman@xxxxxxxxxxx)
  2. Re: could not make temporary directory:
    `/var/lib/condor/execute/dir_ Re: Fw: Xvfb | Docker Universe |
    Can't open display | (but works with docker run) (gthain@xxxxxxxxxxx)


----------------------------------------------------------------------

Message: 1
Date: Mon, 01 Nov 2021 17:57:09 +0000
From: duduhandelman@xxxxxxxxxxx
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Docker universe + GPU (CUDA)+pytorch
Message-ID:
    <VI1PR10MB32478B128515B48DDBFB0C3FA78A9@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>

Content-Type: text/plain; charset="us-ascii"

Hi Josef.
You should add the --gpus flag to DOCKER_EXTRA_ARGUMENTS in the HTCondor configuration on the execute machine.
If I recall correctly, you need to install an NVIDIA extension for Docker (e.g. the NVIDIA Container Toolkit, which the --gpus flag relies on) to have this feature.
One last thing: use an NVIDIA Docker image as the base image. It already sets some of the environment variables you need.
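
A minimal sketch of that configuration, assuming the NVIDIA Container Toolkit is already installed on the execute node and Docker is 19.03 or later (the first release to accept --gpus). Be aware that --gpus all hands every GPU on the host to every docker universe job on that node, regardless of which GPU HTCondor assigned to the slot.

```
# Execute-node HTCondor configuration, e.g. a file in the local
# config.d directory (the exact location depends on the installation).
# Extra options added to the docker command line that HTCondor
# builds for docker universe jobs.
DOCKER_EXTRA_ARGUMENTS = --gpus all
```

Run condor_reconfig on the execute node afterwards (or restart HTCondor there) so the new value is picked up.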

I haven't done this in a long time, but I have it working on my cluster,
so it should be OK.

David



________________________________
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Fulem Josef <fulemj@xxxxxxxxxx>
Sent: Monday, November 1, 2021, 18:37
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Docker universe + GPU (CUDA)+pytorch

Hello,

I'm currently trying to use a Docker container in the HTCondor docker universe to run a PyTorch application that requires a GPU (CUDA).

When I run it in the vanilla universe, it works and CUDA is available.

When I run this command:
condor_status -constraint '!isUndefined(DetectedGPUs)' -compact -af CUDADeviceName DetectedGPUs

then this is the output:
GeForce RTX 2070 SUPER GPU-d4decf4f, GPU-2a518ecd

Also, I have this in my htcondor config:
use feature : GPUs
GPU_DISCOVERY_EXTRA = -extra

So it looks like condor_gpu_discovery works correctly.

When I build my Docker image and run it with --gpus all or --gpus device=0,
CUDA is available and the application running in the container can use it.

But when I run the same Docker image through HTCondor in the docker universe, the GPUs are not accessible, even though a GPU is requested.

It looks like the docker run command is missing the --gpus flag. Is it possible to pass this flag to Docker somehow?
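
For context, a docker universe job that requests a GPU is typically described along these lines. This is a sketch: the image name and the run_job.sh script are hypothetical placeholders, not taken from the setup above. In that setup, requesting the GPU this way was not enough for the container to see it.

```
# Hypothetical submit description; image and script names are placeholders.
universe                = docker
docker_image            = my-pytorch-image:latest
executable              = run_job.sh
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
# Ask HTCondor to match the job to a slot with one GPU.
request_gpus            = 1
output                  = job.out
error                   = job.err
log                     = job.log
queue
```

Submit it with condor_submit followed by the name of the submit file.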

Thank you very much for any suggestion or help.

Best Regards.

Josef





------------------------------

Message: 2
Date: Mon, 01 Nov 2021 16:13:24 -0500
From: gthain@xxxxxxxxxxx
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] could not make temporary directory:
    `/var/lib/condor/execute/dir_ Re: Fw: Xvfb | Docker Universe | Can't
    open display | (but works with docker run)
Message-ID: <5862f3ef-3764-510d-8d3a-9b882c5ea5b3@xxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"; Format="flowed"


Hello:

A couple of things. The Xvfb package is very useful for running programs
that must start a GUI on machines without screens. This package usually
comes with a very helpful starter program, which takes care of starting
the (virtual) graphics server, waiting until it is ready, setting the
appropriate environment variables, running the actual user program, and
shutting everything down at the end. This is named

xvfb-run

and you can start your user program under Xvfb by simply running

xvfb-run my_gui_program some arguments

This may be easier and more reliable than setting DISPLAY manually.

Now, as for the temporary directory problem: there's a pretty glaring
bug in the Xmgrace.pm module when the environment variable TMPDIR is
set, and HTCondor does set it.

Try adding the following to your startup script before perl runs:

unset TMPDIR
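
Both suggestions can be folded into the job's startup script. The sketch below wraps them in a shell function; run_gui_job is a hypothetical name, and -a (choose a free display number automatically) is the option offered by the common Debian-derived xvfb-run, so check the version shipped in your image.

```shell
#!/bin/sh
# Sketch of a job wrapper combining both suggestions above.
# run_gui_job is a hypothetical helper name, not part of HTCondor or Xvfb.
# The body is a subshell "( ... )", so the unset does not leak into
# the calling shell.
run_gui_job() (
    # HTCondor points TMPDIR at the job's scratch directory, which
    # triggers the Xmgrace.pm problem, so drop it before Perl starts.
    unset TMPDIR
    # xvfb-run starts a virtual X server, sets DISPLAY, runs the command
    # and shuts the server down afterwards; -a picks a free display number.
    exec xvfb-run -a "$@"
)

# Usage inside the job's startup script:
#   run_gui_job perl test.pl
```

With this in place there is no need to start Xvfb on a fixed display such as :99 by hand.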

-greg

On 10/29/21 6:02 PM, htcondor-users@xxxxxxxxxxx wrote:
> Hello again,
>
> The error message: *could not make temporary directory*:
> `/var/lib/condor/execute/dir_33779/Graph_Xmgrace_23' etc
>
> only appears on HTCondor when the Perl script contains more than one
> call to xmgrace.
> Adding permission changes before the Perl script is called:
>
> chmod g+w $TEMP
> chmod g+w $TMPDIR
> chmod a+rw ${PWD}
> (assuming that ${PWD} is in fact the temporary
> /var/lib/condor/execute/dir_* directory)
>
> has no effect on the error, even though it does change the permissions:
>
> drwxrw-rw-. 2 390870428 390800513 202 Oct 29 22:48 .
>
> But the error still persists.
>
> It is rather strange... and the error does not occur on the Mac when
> using docker run.
>
> Any idea?
> JYS
> ------------------------------------------------------------------------
> *From:* HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf
> of htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
> *Sent:* Friday, October 29, 2021 3:53 PM
> *To:* htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
> *Cc:* JEAN-YVES SGRO <jsgro@xxxxxxxx>
> *Subject:* Re: [HTCondor-users] Fw: Xvfb | Docker Universe | Can't
> open display | (but works with docker run)
> Hello Dima,
>
> Thank you very much for taking the time to answer so promptly.
> I am just an "end user" trying to make things work (I tried many
> things before replying...).
>
> I found the display problem: the following line is necessary to have
> Xvfb active (on DISPLAY :99), and it can be added to the script or
> sourced from /etc/bashrc:
> *Xvfb :99 -ac &*
>
> However, I encounter another problem reported in the error output:
>
>    *could not make temporary directory*:
>    `/var/lib/condor/execute/dir_33779/Graph_Xmgrace_23' at
>    /usr/local/share/perl5/Chart/Graph/Xmgrace.pm line 136.
>
>
> The directory name varies. This time it ends with 23, but another
> time it ended with 21, so it's not predictable.
>
> Lines 135 and 136 of the Xmgrace.pm file say:
>    135    # create tmpdir
>    136    _make_tmpdir("_Xmgrace_"); # grace files should be saved for user tweaking
> Is there a way to fix this permission issue?
>
> I don't have these issues when running on the Mac. I "pushed" the
> image to Docker Hub, and it is available for testing. I included the
> simple test.pl file in the /home directory of the image, so there is
> no need for -v to test it:
>
> This is what I can do on my Mac from a Terminal:
>
> docker run -it --rm -w /home jysgro/xmgrace-c7
>
> This creates a new container with an interactive terminal, which will
> be removed on exit, and starts directly in the /home directory. There
> is a single file called test.pl, which can then be executed within the
> container with: perl test.pl
> When this is done, the /home directory will contain 3 files:
> test.pl  xmgrace1.agr  xmgrace1.png
> which means that the program worked fine.
>
> If a temporary directory needs to be created, that is not a problem
> there, but it is on HTCondor.
> Is there a permission that must be granted so that temporary
> directories can be created?
>
> Thank you!
>
> Jean-Yves
>
> ------------------------------------------------------------------------
> *From:* HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf
> of dmitri.maziuk@xxxxxxxxx <dmitri.maziuk@xxxxxxxxx>
> *Sent:* Friday, October 29, 2021 12:52 PM
> *To:* htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
> *Subject:* Re: [HTCondor-users] Fw: Xvfb | Docker Universe | Can't
> open display | (but works with docker run)
> On 2021-10-29 12:07 PM, JEAN-YVES SGRO via HTCondor-users wrote:
>
> > Problem 1: Docker Universe does not provide the same environment as
> > docker run within the container.
> > Problem 2: Xvfb works fine within a docker run container but does
> > not work when running as a Docker Universe job.
> > Question: is there a way to make Xvfb work on HTCondor?
>
> How does it "work fine" outside of condor?
>
> Access to the host's Xvfb is via a socket in /tmp/.X11-unix. Are you
> mounting it with `-v` for the non-condor runs? Or are you firing up
> Xvfb inside the container?
>
> For the former, my guess is that you should be able to mount the socket
> as described in
> https://htcondor.readthedocs.io/en/latest/admin-manual/setting-up-vm-docker-universes.html#the-docker-universe
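>
> A minimal sketch of the socket-mount approach, assuming Xvfb runs on
> the execute node itself and using the DOCKER_VOLUMES knobs from the
> HTCondor manual section linked above; X11_UNIX is an arbitrary label
> chosen for this example. The job would still need DISPLAY set to the
> host server's display number.
>
> ```
> # Execute-node HTCondor configuration (sketch).
> # Declare a named volume and the host path it refers to.
> DOCKER_VOLUMES = X11_UNIX
> DOCKER_VOLUME_DIR_X11_UNIX = /tmp/.X11-unix
> # Mount it into every docker universe job on this node.
> DOCKER_MOUNT_VOLUMES = X11_UNIX
> ```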
>
>
>
> Dima
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
> with a subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
>



------------------------------

Subject: Digest Footer

_______________________________________________
HTCondor-users mailing list
HTCondor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

------------------------------

End of HTCondor-users Digest, Vol 96, Issue 2
*********************************************