
Re: [HTCondor-users] Core grouping



Thanks for the quick answer, Hermann.

I entered the following into the condor_config.local file on specific nodes of the cluster:

Cluster_Group = True
STARTD_ATTRS = $(STARTD_ATTRS), Cluster_Group

then set the requirements to the line below (basically just added Cluster_Group =?= True to what I was requiring before):

Requirements = (Arch == "X86_64" && OpSys == "WINDOWS" && Cluster_Group =?= True) || (Arch == "INTEL" && OpSys == "WINDOWS" && Cluster_Group =?= True)

but the submitted jobs don't run in any of the available slots! Does anyone have any ideas?

N.B.: I haven't restarted or rebuilt the nodes yet (as I can't do that right now!).
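One thing worth checking: a startd only starts advertising new attributes after it re-reads its configuration, so if the nodes haven't been reconfigured since the edit, the new Cluster_Group attribute may simply not be in their ClassAds yet. A full restart or rebuild shouldn't be needed; condor_reconfig makes the running daemons re-read their config files. A sketch of what to run (nodename and jobid are placeholders for your own values):

```
# Ask the daemons on the modified node to re-read their configuration
# (no restart or reboot of the machine required):
condor_reconfig -name <nodename>

# Verify that the attribute is actually being advertised by the startd:
condor_status -constraint 'Cluster_Group =?= True'

# If jobs still idle, ask the matchmaker why they fail to match:
condor_q -better-analyze <jobid>
```

If condor_status with that constraint returns no slots, the jobs can never match, which would explain the behaviour described above.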


On Thu, Dec 20, 2012 at 12:39 PM, Hermann Fuchs <hermann.fuchs@xxxxxxxxxxxxxxxx> wrote:
Hi

The easiest way to make sure a particular piece of software is installed on a
machine is to use ClassAds.

On each machine where you have installed the required software (for
example gnuplot), you define a ClassAd attribute. In your submit file,
require that this attribute is present.

Example:

On the nodes where gnuplot is installed:
HAS_GNUPLOT = True
STARTD_EXPRS = $(STARTD_EXPRS), HAS_GNUPLOT

In the submit file:
Requirements   = Memory >= 512 && HAS_GNUPLOT =?= True

Then the job will only run on machines that have the necessary
software.
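A side note on why these examples use =?= (the ClassAd "is identical to" operator) rather than ==: with ==, a machine that never defines the attribute evaluates the comparison to UNDEFINED instead of a boolean, whereas =?= always yields True or False. A toy model of the two operators (an illustration only, not HTCondor's actual evaluator):

```python
# Toy model of the ClassAd '==' vs '=?=' operators, to show why the
# examples above require HAS_GNUPLOT =?= True rather than == True.

UNDEFINED = object()  # stands in for the ClassAd UNDEFINED value


def classad_eq(a, b):
    """ClassAd '==': propagates UNDEFINED if either side is undefined."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def classad_meta_eq(a, b):
    """ClassAd '=?=' (is identical to): always returns a boolean."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b  # True only if both sides are UNDEFINED
    return a == b


# Machine that advertises HAS_GNUPLOT = True: both operators match.
print(classad_eq(True, True))       # True
print(classad_meta_eq(True, True))  # True

# Machine that never defines HAS_GNUPLOT:
print(classad_eq(UNDEFINED, True) is UNDEFINED)  # True: '==' yields UNDEFINED
print(classad_meta_eq(UNDEFINED, True))          # False: '=?=' stays boolean
```

Either way the job will not match a machine lacking the attribute, but =?= keeps the whole Requirements expression well-defined, which is why it is the conventional choice for custom attributes like this.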

Best regards,
Hermann

On Thu, 2012-12-20 at 12:22 +0000, Mostafa.B wrote:
> Hi All,
> In our research group, we have a small cluster of 30 dedicated cores,
> plus a number of temporary cores on the research group network that
> are added to the cluster whenever they are idle.
> Access to the 30 dedicated cores is easy, and any required
> third-party programs can be installed on their respective PCs without
> difficulty; the temporary cores, however, are often very hard to
> access.
> Issues arise when a job is sent to one of the cores that doesn't have
> the necessary software to perform the required task. If this were only
> one job, there would be no major issue. With numerous jobs queued,
> however, the situation is much more severe.
> In general, what I do to run a job on the cluster is define the
> executable as a .bat script which itself runs a few other programs
> that are already installed on the core's respective PC. So this
> clarifies where exactly the problem occurs.
> Has anybody experienced this problem?
> Does anyone know how to tackle this problem without having to remove
> the temporary cores? Perhaps some way to specify which group of
> cores may be used to run a specific job.
>

--
-------------
DI Hermann Fuchs
Christian Doppler Laboratory for Medical Radiation Research for Radiation Oncology
Department of Radiation Oncology
Medical University Vienna
Währinger Gürtel 18-20
A-1090 Wien

Tel.  + 43 / 1 / 40 400 7271
Mail. hermann.fuchs@xxxxxxxxxxxxxxxx

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/