Re: [Condor-users] cron and specific slots



Sure. I can't easily post direct chunks of code, but here's the gist
(we use Nvidia only, so this will be biased).  I also can't take full
credit: there's a SourceForge site (I forget the URL) that did the
legwork, and I then worked with Cycle Computing to hammer out some details.

I wrote a CUDA program that cycles through the CPU indexes and, for
each one, attempts to open an Nvidia device with the same index number
(starting with zero).

The program outputs these ClassAds:

GPU_DETECTED = TRUE
SLOT1_HAS_GPU = TRUE
SLOT1_GPU_NAME = "QUADRO FX 580"
SLOT1_GPU_CUDACAPABLE = TRUE
SLOT1_GPU_MEM = 536150016
SLOT1_GPU_PROCS = 4
SLOT1_GPU_CORES = 32
SLOT1_GPU_CLOCKRATE = 1.12
SLOT2_HAS_GPU = FALSE
SLOT3_HAS_GPU = FALSE
SLOT4_HAS_GPU = FALSE
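
To illustrate the per-slot naming, here is a minimal Python sketch that
only formats such output. It is not the original detection program
(which is a CUDA program and is not posted in this thread). The function
name gpu_classads and the hard-coded device table are hypothetical; a
real detector would query the devices through the CUDA runtime instead.

```python
def gpu_classads(num_slots, gpus):
    """Format per-slot GPU attributes as 'NAME = value' lines.

    gpus maps a 1-based slot number to a dict with 'name' and 'mem'
    (both hypothetical inputs standing in for real device queries).
    """
    lines = ["GPU_DETECTED = %s" % ("TRUE" if gpus else "FALSE")]
    for slot in range(1, num_slots + 1):
        prefix = "SLOT%d_" % slot
        info = gpus.get(slot)
        if info is None:
            # No device with this index: advertise the slot as GPU-less.
            lines.append(prefix + "HAS_GPU = FALSE")
            continue
        lines.append(prefix + "HAS_GPU = TRUE")
        lines.append('%sGPU_NAME = "%s"' % (prefix, info["name"]))
        lines.append("%sGPU_MEM = %d" % (prefix, info["mem"]))
    return lines


if __name__ == "__main__":
    # Four CPU slots, one GPU, as on the machine described in this post.
    example = {1: {"name": "QUADRO FX 580", "mem": 536150016}}
    print("\n".join(gpu_classads(4, example)))
```

The sketch emits only a subset of the attributes listed above; the
remaining ones would follow the same SLOTn_ prefix pattern.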

The SLOT2 through SLOT4 entries are there because I ran the program on
a machine with four CPU cores but only one GPU.  We also have machines
with Tesla S1070s, where there would be one GPU assigned to each slot.

I then add the ClassAds below to the configuration of the machine with the GPU:

GPU_DETECTED = TRUE

HAS_GPU = GPU_DETECTED && (((SLOT1_HAS_GPU == TRUE) && (SlotID == 1))
|| ... this repeats for each slot
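
Since the expression repeats the same clause once per slot, it can be
generated rather than typed by hand. The following is a small sketch
under that assumption; the helper name has_gpu_expr is hypothetical, and
the clause format is copied from the SLOT1 term shown above.

```python
def has_gpu_expr(num_slots):
    """Build the HAS_GPU config line with one clause per slot."""
    terms = [
        "((SLOT%d_HAS_GPU == TRUE) && (SlotID == %d))" % (n, n)
        for n in range(1, num_slots + 1)
    ]
    # OR the per-slot clauses together and gate them on GPU_DETECTED.
    return "HAS_GPU = GPU_DETECTED && (" + " || ".join(terms) + ")"


if __name__ == "__main__":
    print(has_gpu_expr(4))
```

Each clause is true only in the slot whose SlotID matches the SLOTn_
attribute it tests, so a slot advertises HAS_GPU only for its own device.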

STARTD_ATTRS = $(STARTD_ATTRS), GPU_DETECTED, HAS_GPU

STARTD_CRON_JOBLIST = UPDATEGPUINFO
STARTD_CRON_UPDATEGPUINFO_EXECUTABLE = /path/to/program/gpudetect
STARTD_CRON_UPDATEGPUINFO_PERIOD = 1d
STARTD_CRON_UPDATEGPUINFO_MODE = Periodic
STARTD_CRON_UPDATEGPUINFO_KILL = True

I then use these in my submission script:

+REQUIRES_GPU = True
requirements = HAS_GPU

I can't say whether this is the best way to do all this, but it seems
to work for me so far.  I'm still testing.


On Mon, Nov 1, 2010 at 2:54 PM, Burnett, Ben <ben.burnett@xxxxxxxx> wrote:
> Mind sharing what you came up with?  I'd be interested in seeing the details.
>
> -B
>
> On 2010-11-01, at 10:25 AM, Michael Di Domenico wrote:
>
>> Thanks, I managed (with help) to get the system up to the point where
>> each slot advertises all the same GPU information (derived from a
>> script), but uses a SLOT_ classad and requirements expression to
>> determine whether a job should run or not.
>>
>> On Fri, Oct 29, 2010 at 8:14 PM, Burnett, Ben <ben.burnett@xxxxxxxx> wrote:
>>> If the single slot pattern I mentioned before does not suit your needs, then you could do something like this:
>>>
>>> 1) create one "GPU" slot per GPU device;
>>> 2) continue to populate all the slot ads with the GPU information;
>>> 3) modify your application to take a GPU device number as a parameter, but pass it the slot number;
>>> 4) use the cudaSetDevice() in your application to tell CUDA to only use that GPU.
>>>
>>> Just a thought.
>>>
>>> -B
>>>
>>> On 2010-10-29, at 11:31 AM, Michael Di Domenico wrote:
>>>
>>>> I'm trying to update the classads from cron, but I only want to add
>>>> classads from the cron job to a specific slot.  Is there a mechanism or
>>>> classad notation that I'm missing that would allow me to do this?
>>>>
>>>> Currently when my cron job runs, it outputs the classads, but then
>>>> those classads are sent for all four slots in my server.
>>>>
>>>> I'm trying to register CUDA-capable devices into Condor's startd
>>>> attrs; perhaps there's a better way?
>>>> _______________________________________________
>>>> Condor-users mailing list
>>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>>> subject: Unsubscribe
>>>> You can also unsubscribe by visiting
>>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>>
>>>> The archives can be found at:
>>>> https://lists.cs.wisc.edu/archive/condor-users/