Dealing with complexity, the finite capacity of human mind, divide and conquer, abstraction.
Dealing with complexity is the essence of software engineering. It is also the most demanding part of it, requiring both discipline and creativity. Why do we need special methodologies to deal with complexity? The answer is in our brains. In our immediate memory we can deal only with a finite and rather small number of objects--whatever type they are, ideas, images, words. The ballpark figure is seven plus/minus two, depending on the complexity of the objects themselves. Apparently in many ancient cultures the number seven was considered synonymous with many. There are many folk stories that start with "Long, long ago behind seven mountains, behind seven forests, behind seven rivers there lived..."
There are essentially two ways in which we human beings can deal with complexity. The divide-and-conquer method, and the abstraction method. The divide-and-conquer methods is based on imposing a tree-like structure on top of a complex problem. The idea is that at every node of the tree we have to deal with only a small number of branches, within the limits of our immediate memory. The traversal of the tree leaf-to-root or root-to-leaf requires only a logarithmic number of steps-- again, presumably within the limits of our immediate memory. For instance, the body of academic knowledge is divided into humanities and sciences (branching factor of 2). Sciences are subdivided into various areas, one of them being Computer Science, and so on.
To understand Kernighan and Ritchie's book on C, the CS student needs only very limited education in humanities. On the other hand, to write a poem one is not required to program in C. The tree-like subdivision of human knowledge not only facilitates in-depth traversal and search, it also enables division of work between various teams. We can think of the whole humanity as one large team taking part in the enormous project of trying to understand the World.
Another very powerful tool developed by all living organisms and perfected by humans is abstraction. The word "abstraction" has the same root as subtraction. Abstracting means subtracting non-essential features. Think of how many features you can safely subtract from the description of your car before it stops being recognizable as a car. Definitely the color of the paint, the license plates, the windshield wipers, the capacity of the trunk, etc. Unconsciously the same process is applied by a bird when it creates its definition of a "predator." Abstraction is not 100% accurate: a crow may get scared by a scarecrow, which somehow falls within its abstract notion of a "predator."
Division and abstraction go hand in hand in, what one can call, divide-and-abstract paradigm. A complex system can be visualized as a very large network of interconnected nodes. We divide this network into a few "objects"--subsets of nodes. A good division has the property that there are as few inter-object connections as possible. To describe the objects resulting from such a division we use abstraction. In particular, we can describe the objects by the way they connect to other objects (the interface). We can simplify their inner structure by subtracting as many inessential features as possible. At every stage of division, it should be possible to understand the whole system in terms of interactions between a few well abstracted objects. If there is no such way, we give up. The real miracle of our World is that large portions of it (maybe even everything) can be approached using this paradigm.
This process is then repeated recursively by dividing objects into sub-objects, and so on. For every object, we first undo the abstraction by adding back all the features we have subtracted, divide it into sub-objects and use new abstractions to define them. An object should become understandable in terms of a few well abstracted sub-objects. In some way this recursive process creates a self-similar, fractal-like structure.
In software engineering we divide a large project into manageable pieces. In order to define, name and describe these pieces we use abstraction. We can talk about symbol tables, parsers, indexes, storage layers, etc. They are all abstractions. And they let us divide a bigger problem into smaller pieces.
Let me illustrate these ideas with the familiar example of the software project that we've been developing in the second part of the book--the calculator. The top level of the project is structured into a set of interrelated objects, Figure.
This system is closed in the sense that one can explain how the program works (what the function main does) using only these objects--their public interfaces and their functionality. It is not necessary to know how these object perform their functions; it is enough to know what they do.
So how does the program work? First, the Calculator is created inside main. The Calculator is Serializable, that means that its state can be saved and restored. Notice that, at this level, we don't need to know anything about the streams--they are black boxes with no visible interface (that's why I didn't include them in this picture).
Once the Calculator is created, we enter the loop in which we get a stream of text from the user and create a Scanner from it. The Scanner can tell us whether the user input is a command or not. If it is a command, we create a CommandParser, otherwise we create a Parser. Either of them requires access to both the Calculator and the Scanner. CommandParser can Execute a command, whereas Parser can Parse the input and Calculate the result. We then display the result and go back to the beginning of the loop. The loop terminates when CommandParser returns status stQuit from the Execute method.
That's it! It could hardly be simpler than that. It's not easy, though, to come up with such a nice set of abstraction on the first try. In fact we didn't! We had to go through a series of rewrites in order to arrive at this simple structure. All the techniques and little rules of thumb described in the second part of the book had this goal in mind.
But let's continue the journey. Let's zoom-in into one of the top level components--the Calculator. Again, it can be described in terms of a set of interrelated objects, Figure.
And again, I could explain the implementation of all Calculator methods using only these objects (and a few from the level above).
Next, I could zoom-in on the Store object and see a very similar picture.
I could go on like this, just like in one of these Mandelbrot set programs, where you can zoom-in on any part of the picture and see something that is different and yet similar. With a mathematical fractal, you can keep zooming-in indefinitely and keep seeing the same infinite level of detail. With a software project, you will eventually get to the level of plain built-in types and commands. (Of course, you may continue zooming-in into assembly language, microcode, gates, transistors, atoms, quarks, superstrings and further, but that's beyond the scope of this book.)
The lifetime of the project, cyclic nature of programming, the phases, open-ended design, the program as a living organism.
Every software project has a beginning. Very few have an end (unless they are cancelled by the management). You should get used to this kind of open-ended development. You will save yourself and your coworkers a lot of grief . Assume from the very beginning that:
Design for version 2, implement for version 1. Some of the functionality expected in v. 2 should be stubbed out in v. 1 using dummy components.
The development of a software project consists of cycles of different magnitude. The longest scale cycle is the major version cycle. Within it we usually have one or more minor version cycles. The creation of a major version goes through the following stages:
Time-wise, these phases are interlaced. Architectural design feeds back into the requirements spec. Some features turn out to be too expensive, the need for others arises during the design.
Implementation feeds back into the design in a major way. Some even suggest that the development should go through the cycles of implementation of throw-away prototypes and phases of re-design. Throwing away a prototype is usually too big a waste of development time. It means that too little time was spent designing and studying the problem, and that the implementation methodology was inadequate.
One is not supposed to use a different methodology when designing and implementing prototypes, scaffolding or stubs--as opposed to designing and implementing the final product. Not following this rule is a sign of hypocrisy. Not only is it demoralizing, but it doesn't save any development time. Quite the opposite! My fellow programmers and I were bitten by bugs or omissions in the scaffolding code so many times, and wasted so much time chasing such bugs, that we have finally learned to write scaffolding the same way we write production code. As a side effect, whenever the scaffolding survives the implementation cycle and gets into the final product (you'd be surprised how often that happens!), it doesn't lead to any major disasters.
Going back to the implementation cycle. Implementing or rewriting any major component has to be preceded by careful and detailed design or re-design. The documentation is usually updated in the process, little essays are added to the architectural spec. In general, the design should be treated as an open-ended process. It is almost always strongly influenced by implementation decisions. This is why it is so important to have the discipline to constantly update the documentation. Documentation that is out of sync with the project is useless (or worse than useless--it creates misinformation).
The implementation proper is also done in little cycles. These are the fundamental edit-compile-run cycles, well known to every programmer. Notice how testing is again interlaced with the development. The run part of the cycle serves as a simple sanity test.
At this level, the work of a programmer resembles that of a physician. The first principle-- never harm the patient--applies very well to programming. It is called "don't break the code." The program should be treated like a living organism. You have to keep it alive at all times. Killing the program and then resuscitating it is not the right approach. So make all changes in little steps that are self-contained and as much testable as possible. Some functionality may be temporarily disabled when doing a big "organ transplant," but in general the program should be functional at all times.
Finally, a word of caution: How not to develop a project (and how it is still done in many places). Don't jump into implementation too quickly. Be patient. Resist the pressure from the managers to have something for a demo as soon as possible. Think before you code. Don't sit in front of the computer with only a vague idea of what you want to do with the hope that you'll figure it out by trial and error. Don't write sloppy code "to be cleaned up later." There is a big difference between stubbing out some functionality and writing sloppy code.
Humility, simplicity, team spirit, dedication.
A programmer is a human being. Failing to recognize it is a source of many misunderstandings. The fact that the programmer interacts a lot with a computer doesn't mean that he or she is any less human. Since it is the computer that is supposed to serve the humans and not the other way around, programming as an activity should be organized around the humans. It sounds like a truism, but you'd be surprised how often this simple rule is violated in real life. Forcing people to program in assembly (or C for that matter) is just one example. Structuring the design around low level data structures like hash tables, linked lists, etc., is another example.
The fact that jobs of programmers haven't been eliminated by computers (quite the opposite!) means that being human has its advantages. The fact that some human jobs have been eliminated by computers means that being computer has its advantages. The fundamental equation of software engineering is thus
Trying to write programs combining human speed and reliability with computer creativity is a big mistake! So let's face it, we humans are slow and unreliable. When a programmer has to wait for the computer to finish compilation, something is wrong. When the programmer is supposed to write error-free code without any help from the compiler, linker or debugger, something is wrong. If the programmer, instead of solving a problem with paper and pencil, tries to find the combination of parameters that doesn't lead to a general protection fault by trial and error, something is badly wrong.
The character traits that make a good programmer are (maybe not so surprisingly) similar to those of a martial art disciple. Humility, patience, simplicity on the one hand; dedication and team spirit on the other hand. And most of all, mistrust towards everybody including oneself.