
[Condor-users] Condor with VLSI Tools



Hi Group,

My company has reached the point where we need a queue management system such as Condor, flow tracer, or LSF.
I have convinced most of the developers in the VLSI group that Condor is the right application for them.

After implementing Condor in the Research Department cluster, my users were truly satisfied with the results.
However, trying to implement this in the VLSI cluster didn't go so well.

I would like to share some of my problems with you, in case someone has an idea how to achieve what I need:

1. The cluster includes 20 machines with 24 cores each, for a total of 480 cores.
2. Each machine has 24GB of RAM.
3. All machines are connected to a NetApp File Server over NFS.
4. All machines are running RHES 6.0 and belong to the same UID domain.

Now,
My users would like the cluster to manage their jobs as follows:

They would like to have two kinds of jobs:
1. Jobs that run right away when submitted
2. Jobs that run only in certain scenarios (more below)

However, all jobs depend on a FlexLM license (Matlab, Synopsys VCS, etc.).

So say I have 100 Matlab licenses, and I want to share them in a specific way based on the type of job. I would like the following:


1. When a user submits a job (that can be divided into 400 jobs), I would like to limit the number of his jobs that run in parallel (so he will not take all the licenses and will leave some for other users).
Setting concurrency limits is not a good option here, since a concurrency limit is a global definition and not per user. It does limit the total number of parallel jobs (once the limit is reached), but it cannot prevent a single user from taking all the available licenses.
I could divide the concurrency limit into concurrency limits_A, concurrency limits_B, concurrency limits_C, etc. and split the licenses among them, but this would prevent the system from using all the available licenses: concurrency limits_A may have reached its limit while concurrency limits_B is still free.

2. I want some jobs to run only if the FlexLM server has at least 10 free licenses not in use. This will ensure that real-time jobs start as soon as they are submitted, because a free license is always available to them. I don't know how to achieve this.

3. I have tried to use group quotas (see https://www-auth.cs.wisc.edu/lists/condor-users/2011-July/msg00060.shtml, https://lists.cs.wisc.edu/archive/condor-users/2011-November/msg00117.shtml and https://lists.cs.wisc.edu/archive/condor-users/2011-November/msg00184.shtml) without any luck.
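
To make the policy in points 1 and 2 concrete, here is a rough Python sketch of the decision I am after. It is not Condor configuration and not something Condor provides. The function names, the "realtime"/"batch" labels and the per-user cap of 20 are only assumptions for illustration; the reserve of 10 is the number from point 2. The regular expression assumes the usual per-feature summary line printed by lmstat, so it may need adjusting for other output formats.

```python
import re

# Assumed shape of the lmstat per-feature summary line, for example:
# Users of MATLAB:  (Total of 100 licenses issued;  Total of 37 licenses in use)
LMSTAT_LINE = re.compile(
    r"Users of (?P<feature>\S+):\s+\(Total of (?P<issued>\d+) licenses? issued;"
    r"\s+Total of (?P<in_use>\d+) licenses? in use\)"
)


def free_licenses(lmstat_output, feature):
    """Return issued minus in-use for `feature`, or None if it is not listed."""
    for match in LMSTAT_LINE.finditer(lmstat_output):
        if match.group("feature") == feature:
            return int(match.group("issued")) - int(match.group("in_use"))
    return None


def may_start(job_kind, user_running, free, user_cap=20, reserve=10):
    """Decide whether one more job of this user may start.

    job_kind     -- "realtime" or "batch" (hypothetical labels)
    user_running -- jobs of this user that already hold a license
    free         -- free licenses reported by the license server
    user_cap     -- assumed per-user limit on parallel batch jobs
    reserve      -- minimum number of free licenses required for batch jobs
    """
    if free <= 0:
        return False  # nothing to hand out at all
    if job_kind == "realtime":
        return True  # real-time jobs may use the reserved licenses
    # Batch jobs must respect the per-user cap and need at least
    # `reserve` licenses free before they are allowed to start.
    return user_running < user_cap and free >= reserve


sample = (
    "Users of MATLAB:  (Total of 100 licenses issued;  Total of 37 licenses in use)\n"
    "Users of Simulink:  (Total of 1 license issued;  Total of 0 licenses in use)\n"
)
free = free_licenses(sample, "MATLAB")
print(free, may_start("batch", 5, free), may_start("batch", 20, free))
```

The point of the sketch is that the cap is counted per user while the pool of licenses stays shared, which is what separate concurrency limits per group cannot give me.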


Maybe someone can help here...

Thanks
Sassy