
Re: [Condor-users] classad and slots



On 01/05/2011 09:41 PM, Michael Di Domenico wrote:
On Wed, Jan 5, 2011 at 11:39 AM, Matthew Farrellee<matt@xxxxxxxxxx>  wrote:
On 12/30/2010 02:56 PM, Michael Di Domenico wrote:

I'm not sure I understand how the ClassAd mechanism works; hopefully
someone can straighten it out for me.

Yes, that's true, it is simpler, but it doesn't actually fulfill my
entire need.  We have many machines with Nvidia cards, all of
different models.  The point of the cron script is to allow Condor to
have a generic configuration irrespective of which particular card is
in the machine (think GPU_NAME == "Tesla S2050").

From the testing I've done, it appears that if I define the slot-level
ClassAd attributes in the config file and start Condor, the slot-level
value does indeed override the base value.

However, if I inject the slot-level attributes from cron into the
runtime configuration, the slot-level value does not override the
base value.

I'm not sure if this is a bug, by design, or a misconfiguration of
Condor, but it doesn't seem right to me.  If I can inject attributes
into the Condor configuration from cron (which works, because I inject
HASGPU==True/False from Condor cron), why can I not inject slot-level
attributes in the same way?

The cron is not injecting configuration, it is injecting ClassAd attributes into the ads representing the machine.


It is common to have machine-specific configuration. Condor has long allowed a LOCAL_CONFIG_FILE for this purpose. LOCAL_CONFIG_DIR is the direction the MRG package has gone, and I believe VDT's packages as well (though they may want to confirm that). There is also a LOCAL_CONFIG_FILE = ...| syntax that lets you run a program whose stdout becomes configuration.

The hardware in your machines may be heterogeneous, but I'm going to assume it does not change very often. As part of your installation, you could run your script and generate a local configuration file with the proper GPU_NAME etc. Your hardware maintenance workflow will require knowledge of this step, for when GPUs are changed or new nodes are installed.

If that isn't an option, I'd suggest the cmd| syntax, though it will be recomputing a fixed value every time a condor process (daemon, starter, or cli tool on the node) starts.

http://www.cs.wisc.edu/condor/manual/v7.5/3_3Configuration.html#SECTION00431400000000000000
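
As a rough illustration of the two options above, here is a minimal sketch of a script that prints per-slot GPU settings on stdout. The function name gen_gpu_config and the idea of passing card names as arguments are assumptions made for this example (a real script would query the hardware itself); the attribute names GPU_DEVICE and GPU_NAME and the SLOT<N>_ prefix come from this thread. The output could be redirected into a local configuration file at install time, or the script could be named in the LOCAL_CONFIG_FILE = ...| form.

```shell
#!/bin/sh
# Hypothetical helper: print one block of per-slot settings for each
# GPU name given as an argument. Slot numbers start at 1, device
# indices at 0, matching the SLOT1_GPU_DEVICE=0 layout in this thread.
gen_gpu_config() {
    i=0
    for name in "$@"; do
        slot=$((i + 1))
        printf 'SLOT%d_GPU_DEVICE = %d\n' "$slot" "$i"
        # String values need double quotes in the advertised expression.
        printf 'SLOT%d_GPU_NAME = "%s"\n' "$slot" "$name"
        i=$((i + 1))
    done
    # Ask the startd to advertise both attributes in the machine ads.
    # Single quotes keep the shell from expanding $(STARTD_ATTRS).
    printf '%s\n' 'STARTD_ATTRS = $(STARTD_ATTRS) GPU_DEVICE GPU_NAME'
}

# Example run with two cards of the model mentioned earlier.
gen_gpu_config "Tesla S2050" "Tesla S2050"
```

Generating the file once at install time avoids rerunning the detection for every daemon and tool start, at the cost of remembering to regenerate it when a card is swapped.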

Best,


matt

I've defined a classad in the config file

GPU_DEVICE=999

I then have a condor cron script that adds in

SLOT1_GPU_DEVICE=0
SLOT2_GPU_DEVICE=1
SLOT3_GPU_DEVICE=2

What I'd like to do is reference the slot-level config variable from
my submit script

Arguments = $([Target.GPU_DEVICE])

When I submit a test program, I seem to be getting the 999 value
instead of the slot-level value.

Can slot-level attributes be referenced by their non-slot-level name
(i.e. GPU_DEVICE instead of SLOT1_GPU_DEVICE) from a submit file?

How can I check these using condor_status?  I tried the constraint
option, but I'm not sure I have the syntax right.

Simpler than using a cron script, you can just use STARTD_ATTRS. Have a look
in your condor_config file and search for STARTD_EXPRS.

To see the attributes, you can use condor_status -long and grep, or
something with condor_status -format.
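
For the grep route, a small filter along these lines may help. It is only a sketch: the function name gpu_attrs is made up for this example, and it assumes the attributes of interest all start with GPU_, as GPU_DEVICE and GPU_NAME do in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read "Attribute = value" lines on stdin and keep
# only the slot name and any attribute whose name starts with GPU_.
gpu_attrs() {
    grep -E '^(Name|GPU_[A-Za-z0-9_]*) *='
}

# Intended use on a machine with Condor installed:
#   condor_status -long | gpu_attrs
```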

Also, you probably want $$([target.GPU_DEVICE]), with two $s.

Here's a little example...

$ condor_config_val -dump | grep GPU
SLOT1_GPU_DEVICE = 0
SLOT2_GPU_DEVICE = 1
SLOT3_GPU_DEVICE = 2
STARTD_ATTRS = GPU_DEVICE

$ condor_status -format "%s\t" Name -format "%d\n" GPU_DEVICE
slot1@xxxxxxxxxxxx      0

$ echo "cmd=/bin/echo\narguments=\$\$([target.GPU_DEVICE])\n+GPU_DEVICE=999\noutput=echo.out\nqueue" | condor_submit

$ cat echo.out
0

Best,


matt
