Less Code Does Not Imply Less Complexity
Steve Yegge has an interesting blog entry entitled "Code's Worst Enemy" that appears to be making the rounds in the blogosphere. The crux of his argument is that the rigor of static typing enforces a strictness that is useful for keeping large code bases manageable, but that as a consequence it leads to the creation of an unnecessarily large code base. He therefore concludes that a dynamic language would require less code and would therefore bring less complexity.
Unfortunately, his reasoning is just too linear: "less code, therefore less complexity". Less code doesn't imply less complexity. The whole idea of refactoring is to continuously reduce entropy. Refactoring can lead to more code; however, the additional structure leads to easier maintenance.
Laws of Software Complexity
- Zeroth Law. Change Equilibrium - Software cannot be forever insulated from change due to the environment. Either the software evolves, or it becomes irrelevant.
- First Law. Complexity will be Conserved - Incremental code changes will not reduce intrinsic complexity. Software complexity cannot be destroyed; it can only be moved from one place to another.
- Second Law. Software Complexity tends to Maximum Entropy - External energy (i.e. aggressive refactoring) is required to slow down the trend.
Maintenance becomes easier because developers have left signposts everywhere in their code, allowing other developers to navigate easily and, as a consequence, quickly understand the lay of the land. Just try walking around in London, where even the natives don't know which way to walk. Statically typed code has intrinsic, built-in documentation that surpasses what can be found in dynamic languages. Environments like Eclipse can do what they do (e.g. find all references, refactor, autocomplete) because this information is explicit in the code. Chinese, for example, is less verbose than most other languages and tends to be more ambiguous and less precise than a more verbose language such as German. Have you ever wondered why text originally written in German and translated into English tends to be more precise and therefore more understandable?
The problem with dynamic languages is that there is too much implicitness. Certainly it would work with a small group of developers working together in a small room. However, try it with several teams of developers, geographically dispersed and working at different times. Key implicit facts need to be documented, and they are best documented in executable code.
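To make "documented in executable code" concrete, here is a minimal sketch in Python, a dynamic language that allows optional annotations. The names Order, ship_implicit and ship_explicit are hypothetical and exist only for illustration. Both functions behave identically; the difference is that the second one writes down facts that a tool or a remote colleague can read without asking anyone.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple, get_type_hints


@dataclass
class Order:
    # Hypothetical record type, used only for this illustration.
    order_id: int
    quantity: int


# Implicit version: the caller has to know by convention what to pass in.
def ship_implicit(order, when):
    return (order.order_id, when.isoformat())


# Explicit version: the same facts are stated where tools can read them.
def ship_explicit(order: Order, when: date) -> Tuple[int, str]:
    return (order.order_id, when.isoformat())


# Tooling can query the explicit version; the implicit one reveals nothing.
explicit_facts = get_type_hints(ship_explicit)
implicit_facts = get_type_hints(ship_implicit)
```

Here explicit_facts maps each parameter name to its declared type, while implicit_facts is an empty dictionary. This is the kind of information an IDE relies on for navigation and refactoring.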
Imagine if Eclipse were to be refactored into a single monolithic code base. I believe it would have less code. Unfortunately, its extensibility would be compromised, and it would therefore not achieve its goal of supporting a community of developers.
The failure of a much-hyped Python project like Chandler may be precisely because its proponents believe in too much magic. A Python-based organization may consist of personnel who expect their language to do more than is possible. After all, Python has never been mainstream, and possibly its ardent supporters have developed unreasonable expectations as to what it is truly capable of.
One should cast a critical eye on what less mainstream languages claim. Just take a look at Erlang. Its proponents have been preaching that it's the best thing since sliced bread. However, when you really take a look at it, you realize how antiquated it truly is (see: "Erlang the Verdict"). Certainly, Functional Programming has its merits; however, do I really have to throw out the baby with the bath water to use it?
This is my exact problem with all the hoopla about the new languages (e.g. Ruby, Groovy, Scala). Certainly, a lot of language constructs make the individual developer more productive. However, can they substitute for the comprehensive component models that were developed to address large scale development (e.g. Eclipse Plugins, OSGi)?
The primary innovation of Java versus C++ was more than a built-in garbage collector (one could use the Boehm collector for C++). Garbage collection had its social merits, ensuring that memory side effects were localized. But it was the dynamic class loading and reflection capabilities that allowed one to build component systems. C++ developers had to build COM and XPCOM to compensate for this deficiency. Aspects and Annotations are two other language constructs that are there to aid in developing large code bases.
The question one should ask before introducing any new construct is not whether it reduces the number of keystrokes. Reducing keystrokes is a design problem for IDEs, not for language designers. When I auto-complete, do I not reduce writing a method name to two keystrokes instead of having to write the whole method name? Who gives a rat's ass if I have to write a semicolon or not? The damn IDE fills it in for me! I certainly am not advocating that developers not learn other languages. Learning different languages should be a prerequisite in any programmer's education. One will always find, in other languages, idioms that are useful in one's own working language.
The right question to ask is whether a new language construct aids in large scale development. Does the expressiveness of a new language outweigh the social benefits of an older language? To expand that further, the right question is whether a new language feature helps build a community. It'll be interesting to see how the recent interest in social networking could lead to introducing new programming language features.
Be More Specific?
Assumptions - error in laws.
Example: Memory management. At the first level, you have implementations of direct data structure manipulation of the free memory pool at every place you need to acquire or release memory. At the second level, you have subroutines "AllocateMem(size)" and "FreeMem(ptr, size)" that you call at each place you want to allocate/free memory. At the third level, you have malloc(size) and free(ptr) that you call when you want to allocate/free memory (notice how bookkeeping complexity went away). At the fourth level, you add reference counting to this, so you no longer have to track the frees beyond knowing when you gain and release a reference. This allows a considerably simpler implementation of many data structures. More complexity went away. At the fifth level, you add a garbage collector (simple mark and sweep will often do). You stop having to deal with the reference counting spread all through your program. Instead, you just have a fairly small piece of the program that scans all your memory and sees which blocks you still have pointers to. Complexity disappears.
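To show how small that fifth-level piece can be, here is a toy mark and sweep pass in Python. It is a sketch over a hypothetical object graph (the Obj class, the heap list and the roots list are all assumptions made for illustration), not a real allocator. Every object starts unmarked, everything reachable from the roots gets marked, and whatever stays unmarked is garbage.

```python
class Obj:
    """Hypothetical heap object: a name plus outgoing references."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(root):
    # Iterative traversal so that deep graphs do not exhaust the call stack.
    stack = [root]
    while stack:
        current = stack.pop()
        if current.marked:
            continue
        current.marked = True
        stack.extend(current.refs)


def collect(heap, roots):
    """Return the objects that survive one mark and sweep pass."""
    for obj in heap:
        obj.marked = False
    for root in roots:
        mark(root)
    # Sweep: keep only what was reached from the roots.
    return [obj for obj in heap if obj.marked]


a, b, c, d, e = (Obj(n) for n in "abcde")
a.refs.append(b)
b.refs.append(a)   # cycle that is reachable from the root
d.refs.append(e)
e.refs.append(d)   # cycle that nothing else points to
live = collect([a, b, c, d, e], roots=[a])
live_names = [obj.name for obj in live]
```

Only a and b survive. The unreachable d/e cycle is reclaimed as well, a case that plain reference counting cannot handle on its own, and none of the code that builds the graph has to do any bookkeeping.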
Example: Web framework parameter validation. At the first level, you have custom validation in every method that accepts web data. At the second level, you add standard validation methods that you use to build that custom validation. At the third level, you wrap most of that in a "validate" call, one that checks against your validator objects tied into the model and does standardized error reporting. At the fourth level, you make those validator objects form themselves from the database definitions, so that only overrides need to be specified.
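The third and fourth levels can be sketched roughly as follows in Python. This is a minimal illustration, not any particular framework's API: SCHEMA is a hypothetical stand-in for the database definitions, and build_validators and validate are made-up names.

```python
# Hypothetical column definitions standing in for the database schema.
SCHEMA = {
    "name": {"type": str, "max_length": 20, "required": True},
    "age": {"type": int, "required": False},
}


def build_validators(schema):
    """Fourth level: derive one validator per column from its definition."""
    validators = {}
    for field, spec in schema.items():
        # Default arguments bind the current field and spec to each closure.
        def check(value, field=field, spec=spec):
            if value is None:
                return [f"{field} is required"] if spec.get("required") else []
            if not isinstance(value, spec["type"]):
                return [f"{field} must be {spec['type'].__name__}"]
            max_len = spec.get("max_length")
            if max_len is not None and len(value) > max_len:
                return [f"{field} is longer than {max_len} characters"]
            return []
        validators[field] = check
    return validators


def validate(params, validators):
    """Third level: one call that runs every validator and collects errors."""
    errors = []
    for field, check in validators.items():
        errors.extend(check(params.get(field)))
    return errors


validators = build_validators(SCHEMA)
ok = validate({"name": "Ada", "age": 36}, validators)
bad = validate({"age": "old"}, validators)
```

With this in place, a handler that accepts web data only calls validate, and adding a column to the schema adds its checks everywhere at once. In the sketch, ok is an empty list and bad reports the missing name and the non-integer age.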
I can repeat examples of this at whatever length is necessary. It is possible to write code filling the same specification in many different ways, with different amounts of complexity. Incremental improvement of abstractions can remove complexity. Yes, often they only move it - but far from always.
Eivind Eklund


