Re: [Condor-users] pseudo-dedicated machine

On Wed, 14 Jun 2006, Michael Thomas wrote:

Steven Timm wrote:
On Wed, 7 Jun 2006, Michael Thomas wrote:

I have a cluster of 50 nodes, 4 vms per node.  On all but one node I
have a certain directory mounted via read-only nfs.  On the remaining
node the directory is mounted read-write.

Every user coming into the system only needs read-only access to the
certain directory.  But one special user always needs read-write access.

How can I guarantee that this special user always gets sent to the one
node that has read-write access to this directory?  Note that I don't
mind if other users also get sent to this read-write node.

First, give that one node an extra attribute in its machine configuration:

[root@fnpcsrv1 root]# grep IO /opt/condor/local/condor_config.local
MachineClass = "IO"
Class = "IO"
START = JobClass =!= UNDEFINED && JobClass == "IO"

For a non-grid job, the user should just add
+JobClass = "IO"
requirements = (MachineClass =!= UNDEFINED && MachineClass == "IO")

to his condor submit file.
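
For reference, here is a minimal sketch of what a complete submit file with
those two lines might look like. The executable and the output, error and log
file names are hypothetical placeholders, not something from this thread, and
the vanilla universe is an assumption.

```
# Hypothetical submit description file for the read-write user.
# Executable and file names below are placeholders.
universe     = vanilla
executable   = my_rw_job.sh

# Custom job attribute; the START expression on the IO node tests for it.
+JobClass    = "IO"

# Only match machines that advertise MachineClass == "IO".
requirements = (MachineClass =!= UNDEFINED && MachineClass == "IO")

output       = job.out
error        = job.err
log          = job.log
queue
```

Condor appends its own default clauses (OpSys, Arch, Disk, Memory) to the
requirements expression at submit time, so they do not need to be written out
here.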

You can force an inbound grid job for that user to do the same
by hacking condor.pm to add these two extra lines to the
submit script file it writes.


Thanks for the tip, Steve.

It almost works...  I hacked condor.pm to add the +JobClass and
requirements.  The job submit script on the CE shows that they get added:

Executable =
+JobClass = "IO"
Requirements = OpSys == "LINUX"  && Arch == "INTEL"  && (MachineClass =!= UNDEFINED && MachineClass == "IO")
X509UserProxy =

condor_config.local on the compute node also has the machineclass and
class configuration:

MachineClass = "IO"
Class = "IO"
START = JobClass =!= UNDEFINED && JobClass == "IO"

But it seems that the job's requirements prevent it from running
anywhere.  When I submit the job and run condor_q -better-analyze[1], it
shows that the machineclass requirement is causing it to fail.

How can I query the remote machine to verify that it's loading the
condor_config.local settings as expected?

condor_status -l <hostname>

or condor_config_val -startd MachineClass on the machine in question.
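
One thing worth checking, though nothing in this thread confirms it is the
cause: in Condor releases of this era, a macro set in the config file is
advertised in the machine ClassAd only if its name is also listed in
STARTD_EXPRS (newer releases call this STARTD_ATTRS). If condor_status -l does
not show MachineClass, a sketch of the config under that assumption would be:

```
# condor_config.local on the read-write node.
# Sketch only: assumes MachineClass is not already listed in STARTD_EXPRS.
MachineClass = "IO"

# Append MachineClass to whatever attributes are already being advertised.
STARTD_EXPRS = $(STARTD_EXPRS), MachineClass
```

After running condor_reconfig on that node, condor_status -l <hostname> should
list MachineClass if the attribute is being advertised.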



59349.000:  Run analysis summary.  Of 8 machines,
     8 are rejected by your job's requirements
     0 reject your job because of their own requirements
     0 match but are serving users with a better priority in the pool
     0 match but reject the job for unknown reasons
     0 match but will not currently preempt their existing job
     0 are available to run your job
       No successful match recorded.
       Last failed match: Wed Jun 14 10:17:29 2006
       Reason for last match failure: no match found

WARNING:  Be advised:
  No resources matched request's constraints

The Requirements expression for your job is:

( target.OpSys == "LINUX" && target.Arch == "INTEL" &&
( target.MachineClass isnt undefined && target.MachineClass == "IO" ) ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( target.HasFileTransfer )

   Condition                         Machines Matched    Suggestion
   ---------                         ----------------    ----------
1   ( target.MachineClass isnt undefined && target.MachineClass == "IO" )
                                     0                   REMOVE
2   target.OpSys == "LINUX"           8
3   target.Arch == "INTEL"            8
4   ( target.Disk >= 76 )             8
5   ( ( 1024 * target.Memory ) >= 1 ) 8
6   ( target.HasFileTransfer )        8

Steven C. Timm, Ph.D  (630) 840-8525  timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
Assistant Group Leader, Farms and Clustered Systems Group
Lead of Computing Farms Team