
[Condor-users] New to Condor - Difficult (I think) problem...



Hi All,

I'm new to Condor and to distributed computing, so the problem I'm trying to solve may be trivial, difficult, or impossible. Briefly, here is what I need to do.

We have a pool of multi-CPU (actually dual-CPU) Windows machines on which we would like to maximize CPU utilization. We have three types of jobs, with the following requirements for each type:

1. Single-CPU (about 80% of jobs). These jobs require only one CPU and thus can run concurrently on the same multi-CPU machine up to the number of CPUs on the machine. This seems easy enough and should work "straight out of the box".

2. Multi-CPU (about 15% of jobs). These jobs require all the CPUs on the machine, with no other job running on that machine. The application will take care of starting its own processes/threads to make full use of all CPUs.

3. Multi-CPU, Multi-Machine (about 5% of jobs). These jobs require multiple multi-CPU machines: one master and one or more "slaves". Each machine will be dedicated to this job (i.e. no other jobs on these machines). The application, running on the "master" machine, will take care of starting its own processes/threads (local and remote) to fully utilize the machines assigned to the job. In addition, the "master" machine needs to get a list of all the "slave" machines. (It may be sufficient to limit this to one slave.)

Once started, each job must complete before another is started. If it helps, we may be able to identify two machines to handle the "Multi-CPU, Multi-Machine" case, as long as they can also run type 1 and 2 jobs when type 3 jobs are not in the queue. Writing scripts around the application to gather information to pass to the application is also a possible solution (we have MKS and perl available on all machines).
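
To make the requirements above concrete, the placement rule can be sketched in Python. This is only a rough model of the constraints, not Condor configuration or anything Condor provides; the names `Machine`, `place`, and `machines_needed` are invented for illustration, and the sketch assumes a running job is never interrupted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    """Hypothetical model of one dual-CPU machine in the pool."""
    name: str
    cpus: int = 2
    busy_cpus: int = 0       # CPUs currently held by type 1 jobs
    dedicated: bool = False  # True while a type 2 or type 3 job owns the machine

    def idle(self) -> bool:
        return self.busy_cpus == 0 and not self.dedicated


def place(job_type: int, pool: List[Machine],
          machines_needed: int = 2) -> Optional[List[str]]:
    """Return the machine names assigned to a job, or None if it must wait."""
    if job_type == 1:
        # Type 1: take one free CPU on any machine that is not dedicated.
        for m in pool:
            if not m.dedicated and m.busy_cpus < m.cpus:
                m.busy_cpus += 1
                return [m.name]
        return None

    # Type 2 needs one fully idle machine; type 3 needs several of them.
    wanted = 1 if job_type == 2 else machines_needed
    free = [m for m in pool if m.idle()][:wanted]
    if len(free) < wanted:
        return None
    for m in free:
        m.dedicated = True
    # For type 3, the first name is the master and the rest are the slaves.
    return [m.name for m in free]
```

With a pool of three machines, two type 1 jobs would share the first machine, a type 2 job would take the second machine whole, and a type 3 job asking for two machines would have to wait until a second machine became completely idle.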

If this is fairly straightforward, please say so, but also point me in the direction of some documentation and, preferably, examples.

Any pointers and/or advice will be greatly appreciated.

Thanks,
Bob Mortensen