Re: [HTCondor-users] GPU Management



Imre Szeberenyi,

If you don't mind, could you also send it to me? In my lab, we have 4
Tesla cards, but we do not feel the setup is optimized yet.

Thank you.

On 3/12/13, Imre Szeberenyi <szebi@xxxxxxxxxx> wrote:
> Hi Owen,
>
> Our solution is not as nice as Tim's:
> We have defined as many slots as the number of different use cases we expect.
> We have 12 CPU cores and 2 Tesla cards,
> and we have defined a start condition for each supported use case
> (single CPU, multi CPU, single GPU+GPU, single GPU, etc.).
> The config file is quite complex, but it works. I can send it to you if
> you are interested. (I don't want to pollute the list with it.)
>
> Cheers,
>
> Imre
>
> On 2013.03.11. 15:04, Tim St Clair wrote:
>> Hi Owen -
>>
>> You may want to try our 'Machine Local Limits':
>> https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2905
>>
>> It's only in 7.9.0 and later.
>>
>> Cheers,
>> Tim
>>
>> ----- Original Message -----
>>> From: "Owen Hickey"<ohickey@xxxxxxxxxxxxxxxxxxxx>
>>> To: htcondor-users@xxxxxxxxxxx
>>> Sent: Monday, March 11, 2013 7:16:05 AM
>>> Subject: [HTCondor-users] GPU Management
>>>
>>> Dear Condor users and developers,
>>>
>>> We are a research group for computational physics in Stuttgart,
>>> Germany. We use condor to manage a lot of our computing resources.
>>> Recently we have added GPUs to most of our nodes and would like to
>>> include those as separate resources into Condor. We have tried the
>>> recipe prescribed on the internet, namely putting
>>>
>>>      SLOT1_HAS_GPU = TRUE
>>>      SLOT1_GPU_DEV=0
>>>      STARTD_ATTRS=HAS_GPU,GPU_DEV,GPU
>>>
>>>      RANK = (target.wantGPU =?= true)*10000000
>>>
>>> into the individual hosts' configuration files. This does allow us to
>>> ask for machines having a GPU in the submit script. The problem is
>>> that Condor launches as many jobs as there are CPU slots, thus making
>>> the jobs run extremely slowly. What we would like to do is make it so
>>> that Condor tries to launch two GPU jobs per node. We would also like
>>> to make it so that the user can request that theirs be the only GPU
>>> job on the node.
>>>
>>> Any help would be very much appreciated.
>>>
>>> Owen Hickey
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
>>> with a subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>>
>


-- 
Best regards,


Anton Siswo R.A.
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
Open Sources does not mean it's free like free as free
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*