An Occam's Razor for System Design
Occam's Razor states: "All other things being equal, the simplest solution is the best." Stated differently, it asks us to find the simplest theory that fits the facts. Of course, the metric for simplicity is unstated, but presumably it is apparent to the perceptive mind.
I have been thinking that a variant of Occam's razor ought to be applied to systems design. Any design tries to meet some goals given some assumptions. Setting aside poor designs, which do not bother to articulate either the goals or the assumptions, lets consider those designs which try to achieve the same goals. My reading of Occam's razor then advocates the design that makes the least assumptions. This is related to the network tenet "Be tolerant in what you receive and conservative in what you send." By being tolerant of inputs, the system is less fragile, and therefore less dependent on assumptions about the input.
When comparing two solutions that both achieve the same goals, and make the same assumptions, we should choose the simpler solution. But what does simplicity mean? Is there a metric of complexity?
Now, in the area of algorithm design, the notion of time (or space) complexity is well understood. We consider the asymptotic limit on the execution time as the size of the input grows. This assumes, of course, that all problems can be made as large as necessary.
This notion of asymptotic complexity is not relevant for many system designs. Real systems are bounded. You do not need to show humans videos faster than 30 frames a second, for instance, and you do not need to build a system that deals with more than, say, 12 billion cell phones. So, in this sense, asymptotic limits are meaningless. But then, what is the replacement for time complexity?
I can offer two alternatives.
The first alternative is the implementation and testing time. Given two systems, you should prefer one that can be built faster. If you have a set of building blocks that permit assembly of complex structures rapidly, so much the better. This is the reason why Unix succeeded. Cheap processes, pipes, and scripting allow rapid construction of complex applications. From that perspective, an application constructed from Unix pipes (what you would call a 'mashup' today), is the 'simplest' even if the underlying primitives themselves are complex. In the same way, given the ease of database integration these days, even the use of a complex database like postgres can be thought to be 'simple'.
The second alternative is the amount of shared state. Shared state necessiates communication for maintaining state consistency. Moreover, discrepancies in shared state are at the root of almost all bugs in distributed systems. Therefore, the complexity of a distributed system, which is to say, nearly all systems today, can be measured in terms of the amount of shared state. More precisely, we can multiply the bytes of shared state with the number of nodes that need this state (or perhaps, given n nodes n^2 or n log(n)) and call this the complexity of the system.
Both measures are approximate, and not necessarily always applicable, but represent at least a first step in measuring the complexity of today's distributed systems. Thus, they would allow us to practically implement Occam's razor.
As another first step, it may be a good idea for system designers to justify their work using Occam's razor. They would need to show that their system is the most parsimonious one, using one the two metrics above, or something equivalent. Hopefully, this will automatically rule out overly baroque designs.
I have been thinking that a variant of Occam's razor ought to be applied to systems design. Any design tries to meet some goals given some assumptions. Setting aside poor designs, which do not bother to articulate either the goals or the assumptions, lets consider those designs which try to achieve the same goals. My reading of Occam's razor then advocates the design that makes the least assumptions. This is related to the network tenet "Be tolerant in what you receive and conservative in what you send." By being tolerant of inputs, the system is less fragile, and therefore less dependent on assumptions about the input.
When comparing two solutions that both achieve the same goals, and make the same assumptions, we should choose the simpler solution. But what does simplicity mean? Is there a metric of complexity?
Now, in the area of algorithm design, the notion of time (or space) complexity is well understood. We consider the asymptotic limit on the execution time as the size of the input grows. This assumes, of course, that all problems can be made as large as necessary.
This notion of asymptotic complexity is not relevant for many system designs. Real systems are bounded. You do not need to show humans videos faster than 30 frames a second, for instance, and you do not need to build a system that deals with more than, say, 12 billion cell phones. So, in this sense, asymptotic limits are meaningless. But then, what is the replacement for time complexity?
I can offer two alternatives.
The first alternative is the implementation and testing time. Given two systems, you should prefer one that can be built faster. If you have a set of building blocks that permit assembly of complex structures rapidly, so much the better. This is the reason why Unix succeeded. Cheap processes, pipes, and scripting allow rapid construction of complex applications. From that perspective, an application constructed from Unix pipes (what you would call a 'mashup' today), is the 'simplest' even if the underlying primitives themselves are complex. In the same way, given the ease of database integration these days, even the use of a complex database like postgres can be thought to be 'simple'.
The second alternative is the amount of shared state. Shared state necessiates communication for maintaining state consistency. Moreover, discrepancies in shared state are at the root of almost all bugs in distributed systems. Therefore, the complexity of a distributed system, which is to say, nearly all systems today, can be measured in terms of the amount of shared state. More precisely, we can multiply the bytes of shared state with the number of nodes that need this state (or perhaps, given n nodes n^2 or n log(n)) and call this the complexity of the system.
Both measures are approximate, and not necessarily always applicable, but represent at least a first step in measuring the complexity of today's distributed systems. Thus, they would allow us to practically implement Occam's razor.
As another first step, it may be a good idea for system designers to justify their work using Occam's razor. They would need to show that their system is the most parsimonious one, using one the two metrics above, or something equivalent. Hopefully, this will automatically rule out overly baroque designs.

0 Comments:
Post a Comment
Links to this post:
Create a Link
<< Home