
Re: [Condor-users] Condor Simulation Tool

I think the problem with writing a simulator is that you'd end up writing Condor. There's not really a good way to simulate a massively parallel system (you could view each slot as a separate "thread") on a not-massively-parallel architecture like a stand-alone machine. Walking through the simulation and evaluating the state at every clock tick would take a long time, and even then the result wouldn't be accurate, because state changes over network wires can't be 100% synchronized to clock ticks. There's randomness there (router lag, wire propagation, collisions, etc.) that you just can't model without a huge amount of effort.

Add to that the fact that job loads and job characteristics vary widely, so a spec for defining jobs in the simulator would be non-trivial, and what you end up writing is…well…Condor. All over again. :)

To explore changes, one approach you can take is to build a shadow system on several machines, where you overload the slots on the machines, say 5:1 or something like that, and then only run jobs that sleep and don't do any actual processing. This can help you explore configuration and policy changes more quickly than deploying them into a production system.
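
As a rough illustration of the "jobs that only sleep" part, here is a minimal Python sketch that writes a submit description for a batch of sleep jobs with varied durations. The function name, the log file name, and the duration range are made up for the example, and it assumes /bin/sleep exists on the execute machines. Overloading the slots is a separate step in each machine's Condor configuration and isn't covered here.

```python
import random


def sleep_job_submit(num_jobs, min_secs, max_secs, seed=0):
    """Build a submit description for jobs that only sleep.

    Hypothetical helper: each job gets a random duration between
    min_secs and max_secs so the fake workload isn't uniform.
    """
    rng = random.Random(seed)  # seeded so the same workload can be replayed
    lines = [
        "universe = vanilla",
        "executable = /bin/sleep",  # assumed to exist on the execute machines
        "log = sleep_jobs.log",     # illustrative log file name
    ]
    for _ in range(num_jobs):
        secs = rng.randint(min_secs, max_secs)
        # Each "queue" statement submits one job with the current arguments.
        lines.append(f"arguments = {secs}")
        lines.append("queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a small workload; redirect to a file and hand it to condor_submit.
    print(sleep_job_submit(5, 60, 600), end="")
```

Keeping the seed fixed lets you resubmit the same fake workload after each configuration or policy change, so the before and after runs are comparable.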

- Ian

Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools


On Monday, 1 August, 2011 at 5:20 PM, Sassy Natan wrote:

Is there any plan to add a simulation tool for Condor rules? Like who gets what, when, and how much in different scenarios?

So for example, if I have X slots, Y jobs, Z users, Y/Z jobs per user, A ranks, different preemption rules, different groups, etc., what will happen if one machine goes down? How will it impact the cluster performance? Or maybe if I just change a simple rule or parameter, or disable or enable some feature, the system performance will improve so dramatically that I will seriously consider changing the configuration?

Such a tool could tell what the user will get in a specific scenario, so you don't have to test it and analyse the results later.

This system is so flexible (and this is what I like about it), but that makes it hard to know what will happen in each what-if case, so only after a lot of tweaking and reverse engineering do you get the result you wished for.

