
Re: [Condor-users] cron and specific slots



Nice. Thanks :)

Are your GPU jobs CPU/IO/etc. intensive?  I ask because it's not entirely clear to me whether ours are, so I'm unsure whether tying an existing slot to a GPU wastes that slot (i.e. it could otherwise be doing other work), or whether I should continue with the idea of having GPU slots that are somehow independent of the machine slots. I suppose it depends on the nature of the job, and how much of the work is offloaded to the GPU.

It would be interesting to see CUDA's job-to-GPU-device matching functionality added to the Condor matchmaker (assuming it somehow does a better job of matching than Condor would if GPUs were native resources).

-B

On 2010-11-02, at 11:26 AM, Michael Di Domenico wrote:

> Sure, I can't easily post direct chunks of code, but here's the gist
> (we also use Nvidia only, so this will be biased).  I also can't take
> credit: there's a sourceforge site (forget url) which did the legwork,
> and then I worked with cyclecomputing to hammer out some details.
> 
> I wrote a cuda program that cycles through the CPU index and attempts
> to open an Nvidia device with the same index number (starting with
> zero).
> 
> That program outputs these classads:
> 
> GPU_DETECTED = TRUE
> SLOT1_HAS_GPU = TRUE
> SLOT1_GPU_NAME = "QUADRO FX 580"
> SLOT1_GPU_CUDACAPABLE = TRUE
> SLOT1_GPU_MEM = 536150016
> SLOT1_GPU_PROCS = 4
> SLOT1_GPU_CORES = 32
> SLOT1_GPU_CLOCKRATE = 1.12
> SLOT2_HAS_GPU = FALSE
> SLOT3_HAS_GPU = FALSE
> SLOT4_HAS_GPU = FALSE
> 
> The Slot2..4 entries appear because I ran the program on a four CPU
> core machine that has only one GPU.  We also have machines with Tesla
> S1070's, where there would be one GPU assigned to each slot.
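
A minimal Python sketch of how a cron program could emit per-slot attributes of the kind listed above. It is not the gpudetect program from this thread: the device list is passed in rather than queried through CUDA, the function name format_gpu_ads and the dict keys "name" and "mem" are hypothetical, and only a few of the attributes are reproduced.

```python
def format_gpu_ads(devices, num_slots):
    """Return ClassAd attribute lines, one GPU per slot where available.

    devices   -- list of dicts (hypothetical keys "name" and "mem"),
                 ordered by device index starting at zero
    num_slots -- number of slots on the machine
    """
    lines = ["GPU_DETECTED = %s" % ("TRUE" if devices else "FALSE")]
    for slot in range(1, num_slots + 1):
        prefix = "SLOT%d_" % slot
        index = slot - 1  # assumption: slot N pairs with device N-1
        if index < len(devices):
            dev = devices[index]
            lines.append(prefix + "HAS_GPU = TRUE")
            lines.append('%sGPU_NAME = "%s"' % (prefix, dev["name"]))
            lines.append("%sGPU_MEM = %d" % (prefix, dev["mem"]))
        else:
            # No device left for this slot.
            lines.append(prefix + "HAS_GPU = FALSE")
    return lines


if __name__ == "__main__":
    # One GPU on a four-slot machine, as in the example above.
    one_gpu = [{"name": "QUADRO FX 580", "mem": 536150016}]
    print("\n".join(format_gpu_ads(one_gpu, 4)))
```

Because every slot receives the whole list, the slot number has to be carried in the attribute name, which is what the SLOTn_ prefix does here.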
> 
> I then add the settings below to the configuration of the machine with the GPU:
> 
> GPU_DETECTED = TRUE
> 
> HAS_GPU = GPU_DETECTED && (((SLOT1_HAS_GPU == TRUE) && (SlotID == 1))
> || ... this repeats for each slot
> 
> STARTD_ATTRS = $(STARTD_ATTRS), GPU_DETECTED, HAS_GPU
> 
> STARTD_CRON_JOBLIST = UPDATEGPUINFO
> STARTD_CRON_UPDATEGPUINFO_EXECUTABLE = /path/to/program/gpudetect
> STARTD_CRON_UPDATEGPUINFO_PERIOD = 1d
> STARTD_CRON_UPDATEGPUINFO_MODE = Periodic
> STARTD_CRON_UPDATEGPUINFO_KILL = True
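
The HAS_GPU expression above is elided after the first slot. As a rough sketch, and assuming every repeated clause has the same shape as the SLOT1 clause (the thread does not show the rest), the full expression for a given slot count could be generated like this; build_has_gpu_expr is a hypothetical helper name.

```python
def build_has_gpu_expr(num_slots):
    """Build a HAS_GPU expression with one clause per slot.

    Assumes each clause mirrors the SLOT1 clause shown in the thread:
    the slot's own SLOTn_HAS_GPU flag must be TRUE and SlotID must be n.
    """
    clauses = [
        "((SLOT%d_HAS_GPU == TRUE) && (SlotID == %d))" % (n, n)
        for n in range(1, num_slots + 1)
    ]
    return "GPU_DETECTED && (" + " || ".join(clauses) + ")"


if __name__ == "__main__":
    print("HAS_GPU = " + build_has_gpu_expr(4))
```

Each slot evaluates the same expression, but only the clause whose number equals that slot's SlotID can be true, so HAS_GPU ends up reflecting that slot's own flag.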
> 
> And then I use these in my submission script:
> 
> +REQUIRES_GPU = True
> requirements = HAS_GPU
> 
> I can't say whether this is the best way to do all this, but it
> seems to work for me so far; I'm still testing.
> 
> 
> On Mon, Nov 1, 2010 at 2:54 PM, Burnett, Ben <ben.burnett@xxxxxxxx> wrote:
>> Mind sharing what you came up with?  I'd be interested in seeing the details.
>> 
>> -B
>> 
>> On 2010-11-01, at 10:25 AM, Michael Di Domenico wrote:
>> 
>>> Thanks, I managed (with help) to get the system up to the point where
>>> each slot advertises all the same GPU information (derived from a
>>> script), but uses a SLOT_ classad and requirements expression to
>>> determine whether a job should run or not.
>>> 
>>> On Fri, Oct 29, 2010 at 8:14 PM, Burnett, Ben <ben.burnett@xxxxxxxx> wrote:
>>>> If the single slot pattern I mentioned before does not suit your needs, then you could do something like this:
>>>> 
>>>> 1) create one "GPU" slot per GPU device;
>>>> 2) continue to populate all the slot ads with the GPU information;
>>>> 3) modify your application to take a GPU device number as a parameter, but pass it the slot number;
>>>> 4) use cudaSetDevice() in your application to tell CUDA to only use that GPU.
>>>> 
>>>> Just a thought.
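
A small Python sketch of step 3 in the list above, turning a slot into a device number for the application. Several things here are assumptions rather than anything stated in the thread: slot names of the form "slot1", a zero-based device index equal to the slot number minus one, and a hypothetical --device flag on the application, which would then hand the value to cudaSetDevice().

```python
import re


def slot_number(slot_name):
    """Extract the numeric part of a slot name such as "slot3"."""
    match = re.match(r"slot(\d+)", slot_name)
    if match is None:
        raise ValueError("unrecognised slot name: %r" % slot_name)
    return int(match.group(1))


def device_for_slot(slot_name):
    """Map a 1-based slot number to a 0-based GPU device index (assumed)."""
    return slot_number(slot_name) - 1


def build_command(app, slot_name):
    """Build the argument list for an application taking a device number.

    "--device" is a hypothetical flag; the application itself would pass
    the value to cudaSetDevice().
    """
    return [app, "--device", str(device_for_slot(slot_name))]


if __name__ == "__main__":
    print(build_command("./gpu_app", "slot2"))
```

The off-by-one mapping only holds if there is exactly one GPU slot per device and the slots are numbered in device order.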
>>>> 
>>>> -B
>>>> 
>>>> On 2010-10-29, at 11:31 AM, Michael Di Domenico wrote:
>>>> 
>>>>> I'm trying to update the classads from cron, but I only want to add
>>>>> classads from the cron to a specific slot.  Is there a mechanism or
>>>>> classad notation that I'm missing that would allow me to do this?
>>>>> 
>>>>> Currently when my cron job runs, it outputs the classads, but then
>>>>> those classads are sent for all four slots in my server.
>>>>> 
>>>>> I'm trying to register cuda capable devices into condor's startd
>>>>> attrs; perhaps there's a better way?
>>>>> _______________________________________________
>>>>> Condor-users mailing list
>>>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>>>> subject: Unsubscribe
>>>>> You can also unsubscribe by visiting
>>>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>>> 
>>>>> The archives can be found at:
>>>>> https://lists.cs.wisc.edu/archive/condor-users/