
[Condor-users] condor and cuda vm usage



Is there any way to change how Condor tracks virtual memory for a job?
I ask because we've encountered the following scenario with Condor and GPUs:

We have slots assigned to GPUs (NVIDIA Tesla cards running CUDA).

A job starts up, runs fine, and then is preempted by another user.

When the preemption occurs, Condor updates the job's ImageSize ClassAd
attribute to include the entire unified virtual address (UVA) space that
CUDA maps out.
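
For reference, this is roughly what we see in the job ad after preemption
(the job ID and exact value here are made up, but the order of magnitude is
real; ImageSize is in KB, so this is about 75 GB):

    $ condor_q -l 1234.0 | grep -i '^ImageSize'
    ImageSize = 78643200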

This means that when Condor goes to reschedule the job, it looks for a slot
with (in our case) 75 GB of memory, which we don't have.
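
Nothing in our pool advertises anywhere near that much; for example (Memory
in machine ads is in MB, and the number is again just illustrative):

    $ condor_status -constraint 'Memory >= 76800'
    # prints nothing for us -- no slot has ~75 GB, so the job sits idle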

Is there any way to prevent this?  We don't want to turn the memory/slot
checking off entirely; we just want to keep Condor from counting the CUDA
UVA mapping against the job.
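
One thing we've considered (not sure it's the right knob, which is partly
why I'm asking) is pinning the memory request in the submit file, so that
matching keys off a fixed request_memory instead of the inflated ImageSize.
Roughly:

    # submit-file sketch; the executable name and the 4 GB figure are made up
    universe        = vanilla
    executable      = run_cuda_job.sh
    # request_memory is in MB
    request_memory  = 4096
    queue

But I don't know whether that actually keeps the UVA mapping out of the
picture, or whether our version still folds ImageSize into the default
Requirements.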

I can get the jobs rescheduled by doing a condor_qedit (example below), but
I'd prefer not to have to do that each time this happens (which isn't often,
but often enough to be annoying).
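
For completeness, the manual fix we use today is basically this (the job ID
and the reset value are examples):

    $ condor_qedit 1234.0 ImageSize 2000000
    $ condor_reschedule
    # condor_reschedule just kicks off a negotiation cycle sooner;
    # the job matches again once ImageSize is back to something sane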