In our research group, we have a small cluster with 30 dedicated cores, plus a few temporary cores on the research group network that are added to the cluster whenever they are idle.
Access to the 30 dedicated cores is easy, and any required third-party programs can be installed on their PCs without difficulty. The temporary cores, however, are often very hard to access.
Issues arise when a job is sent to a core whose PC doesn't have the software needed for the task. If this were only one job, it would not be a major issue. However, when many jobs are queued, the situation is much more severe.
In general, I run a job on the cluster by defining the executable as a .bat file, which in turn runs a few other programs that are already installed on the core's PC. This is exactly where the problem occurs: on a temporary core, those programs may not be installed.
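To make the failure mode concrete, here is a rough sketch of a pre-flight check that the job could run before calling the other programs. It is only an illustration: the tool names `solver.exe` and `postprocess.exe` are hypothetical placeholders, and I am assuming the programs are expected to be found on the machine's PATH. It does not stop the scheduler from sending jobs to a temporary core; it would only make the job fail immediately with a clear message.

```python
import shutil
import sys

# Hypothetical placeholders for the third-party programs the job calls.
REQUIRED_TOOLS = ["solver.exe", "postprocess.exe"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be located on this machine's PATH."""
    return [tool for tool in tools if which(tool) is None]


def preflight_exit_code(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return 0 if every tool is available, 1 otherwise (and report why)."""
    missing = missing_tools(tools, which)
    if missing:
        print("Missing on this machine: " + ", ".join(missing), file=sys.stderr)
        return 1
    return 0
```

The .bat file could call a script that ends with `sys.exit(preflight_exit_code())` and stop if the exit code is non-zero, but that still leaves the question of how to keep such jobs off the temporary cores in the first place.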
Has anybody experienced this problem?
Does anyone know how to tackle this problem without having to remove the temporary cores? Ideally, there would be some way to specify which group of cores may be used for running a specific job.