
Re: [Condor-users] pseudo-dedicated machine



Steven Timm wrote:
> On Wed, 7 Jun 2006, Michael Thomas wrote:
> 
> 
>>I have a cluster of 50 nodes, 4 vms per node.  On all but one node I
>>have a certain directory mounted via read-only nfs.  On the remaining
>>node the directory is mounted read-write.
>>
>>Every user coming into the system only needs read-only access to the
>>certain directory.  But one special user always needs read-write access.
>>
>>How can I guarantee that this special user always gets sent to the one
>>node that has read-write access to this directory?  Note that I don't
>>mind if other users also get sent to this read-write node.
> 
> 
> First, define the one node to have an extra attribute in its machine
> classad
> 
> [root@fnpcsrv1 root]# grep IO /opt/condor/local/condor_config.local
> MachineClass = "IO"
> Class = "IO"
> START = JobClass =!= UNDEFINED && JobClass == "IO"
> 
> For a non-grid job, the user should just add
> +JobClass = "IO"
> requirements = (MachineClass =!= UNDEFINED && MachineClass == "IO")
> 
> to his condor submit file.
> 
> You can force an inbound grid job for that user to do that
> by hacking condor.pm to add these two extra lines to the
> submit script file it writes.
> 
> Steve

Thanks for the tip, Steve.

It almost works...  I hacked condor.pm to add the +JobClass and
Requirements lines, and the job submit script on the CE shows that they
get added:

...
Executable =
/home/uscms01/.globus/.gass_cache/local/md5/4a/67571a70a8ae3d2291019518204cc1/md5/81/2e7051cca30e7ea792099078f56ae3/data
+JobClass = "IO"
Requirements = OpSys == "LINUX"  && Arch == "INTEL"  && (MachineClass
=!= UNDEFINED && MachineClass == "IO")
X509UserProxy =
/home/uscms01/.globus/job/citgrid3.cacr.caltech.edu/29347.1150304652/x509_up
...
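
To double-check what actually ends up in the queued job's ClassAd (as
opposed to just the submit script), the job ad can also be dumped
directly; 59349 is the cluster id from the condor_q output in [1]:

# pull the relevant attributes out of the queued job's ad
condor_q -long 59349 | grep -iE 'JobClass|Requirements'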

condor_config.local on the compute node also has the MachineClass and
Class configuration:

MachineClass = "IO"
Class = "IO"
START = JobClass =!= UNDEFINED && JobClass == "IO"

But it seems that the job's requirements prevent it from running
anywhere.  When I submit the job and run condor_q -better-analyze [1], it
shows that the MachineClass requirement is what causes the match to fail.

How can I query the remote machine to verify that it's loading the
condor_config.local settings as expected?
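
The closest I've come up with is something along these lines ('<rw-node>'
is just a placeholder for the read-write node's hostname), but I'm not
sure it's the intended way to check:

# is the attribute actually being advertised in that node's machine ads?
condor_status -long <rw-node> | grep -i MachineClass

# is the remote startd's configuration picking up condor_config.local?
condor_config_val -name <rw-node> -startd MachineClass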

--Mike


[1]
59349.000:  Run analysis summary.  Of 8 machines,
      8 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
        No successful match recorded.
        Last failed match: Wed Jun 14 10:17:29 2006
        Reason for last match failure: no match found

WARNING:  Be advised:
   No resources matched request's constraints

The Requirements expression for your job is:

( target.OpSys == "LINUX" && target.Arch == "INTEL" &&
( target.MachineClass isnt undefined && target.MachineClass == "IO" ) ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( target.HasFileTransfer )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( target.MachineClass isnt undefined && target.MachineClass == "IO" )
                                      0                   REMOVE
2   target.OpSys == "LINUX"           8
3   target.Arch == "INTEL"            8
4   ( target.Disk >= 76 )             8
5   ( ( 1024 * target.Memory ) >= 1 ) 8
6   ( target.HasFileTransfer )        8
