Re: [HTCondor-users] Core grouping



Hi,

Use condor_reconfig [1] if you have changed a configuration file in a way that affects a daemon. It is more in line with what you actually want to do, and it is less disruptive than completely restarting a node.

> N.B: I haven't restarted nor rebuilt the node yet (as I can't do it!!)

If you are authorized to change the configuration files but not to restart or reconfigure the service, you should rethink your administration policies. I'm not sure I understand you correctly, though. If you know that your configuration change is fine with the administrators and the Condor services are set to start automatically at boot, you can try rebooting the machines Condor is running on. This should make Condor load the new configuration files (as the daemons restart as well), even when you cannot manually tell it to do so.
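
As a rough sketch, assuming the Cluster_Group attribute from the quoted message below and a node whose local configuration file is condor_config.local, the change the daemons need to pick up would look like this:

```
# condor_config.local on each node that should belong to the group
# (Cluster_Group is the custom attribute name from the original question)
Cluster_Group = True

# Publish the attribute in the machine ClassAd so that jobs can match on it
STARTD_ATTRS = $(STARTD_ATTRS), Cluster_Group
```

After saving the file, running condor_reconfig on that node should make the startd re-read it. Something like condor_status -constraint "Cluster_Group =?= True" can then confirm that the slots advertise the attribute; if it lists nothing, the job's Requirements cannot match yet.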

Cheers,
Max

[1]
http://research.cs.wisc.edu/htcondor/manual/current/condor_reconfig.html

On 12/21/2012 04:19 PM, Hermann Fuchs wrote:
Hi

I guess you need to restart condor on the nodes, in order for the
changes to take effect.

Best regards,
Hermann
On Fri, 2012-12-21 at 12:52 +0000, Mostafa.B wrote:
Thanks for the quick answer, Hermann,


I entered the following into the condor_config.local file on specific
nodes of the cluster:


Cluster_Group = True

STARTD_ATTRS = $(STARTD_ATTRS), Cluster_Group


Then I set the requirements to the following (basically just adding
Cluster_Group =?= True to what I was requiring before):


Requirements = (Arch == "X86_64" && OpSys == "WINDOWS" && Cluster_Group =?= True) || (Arch == "INTEL" && OpSys == "WINDOWS" && Cluster_Group =?= True)



But the submitted jobs don't run in any of the available slots! Does
anyone have any ideas?


N.B: I haven't restarted nor rebuilt the node yet (as I can't do it!!)
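
Both branches of the Requirements expression above share the OpSys and Cluster_Group tests, so the expression can be factored. A sketch of a submit-file line that should be equivalent, assuming nothing else in the submit file changes:

```
# Same condition as above, with the common terms factored out
Requirements = (Arch == "X86_64" || Arch == "INTEL") && OpSys == "WINDOWS" && Cluster_Group =?= True
```

Because =?= evaluates to False rather than UNDEFINED when the attribute is missing, machines that do not advertise Cluster_Group simply fail the match. Running condor_q -better-analyze on an idle job may help show which clause is rejecting the slots.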


On Thu, Dec 20, 2012 at 12:39 PM, Hermann Fuchs
<hermann.fuchs@xxxxxxxxxxxxxxxx> wrote:
Hi

The easiest way to make sure certain software is installed on a
machine is to use ClassAds.

On each machine where you have installed the required software (for
example gnuplot), you define a ClassAd attribute. In your submit file,
require that this attribute is present.

Example: On the nodes where gnuplot is installed:
    HAS_GNUPLOT = True
    STARTD_EXPRS = $(STARTD_EXPRS), HAS_GNUPLOT
In the submit file:
    Requirements   = Memory >= 512 && HAS_GNUPLOT =?= True

Then the job will only run on machines that have the necessary
software.

Best regards,
Hermann

On Thu, 2012-12-20 at 12:22 +0000, Mostafa.B wrote:
> Hi All,
>
> In our research group, we have a small cluster with 30 cores dedicated
> to it and a small number of temporary cores available in the research
> group network, which are added to the cluster whenever they are idle.
>
> Access to the 30 dedicated cores is easy, and the required third-party
> programs can be installed on their respective PCs without difficulty;
> however, the temporary cores are often very hard to access.
>
> Issues arise when a job is sent to one of the cores that doesn't have
> the necessary software to perform the required task. If this were only
> one job, there would be no major issue. However, when numerous jobs
> are queued, the situation is much more severe.
>
> In general, to run a job in the cluster I define the executable as a
> .bat file, which itself runs a few other programs that are already
> installed on the core's respective PC. So this is exactly where the
> problem occurs.
>
> Has anybody experienced this problem?
>
> Does anyone know how to tackle this problem without having to remove
> the temporary cores? Probably some way of specifying which group of
> cores may be used for running a specific job.
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/

--
-------------
DI Hermann Fuchs
Christian Doppler Laboratory for Medical Radiation Research for Radiation Oncology
Department of Radiation Oncology
Medical University Vienna
Währinger Gürtel 18-20
A-1090 Wien
Tel. + 43 / 1 / 40 400 7271
Mail. hermann.fuchs@xxxxxxxxxxxxxxxx