The Keystroke Fetish. A fundamental assumption pervades the software industry today: productivity is directly improved by reducing developer keystrokes. Clearly the paradigm shift from Assembly language to 3GLs in the early ‘60s reduced keystrokes by orders of magnitude and also produced a major improvement in productivity. And the notion that writing less code should make a developer more productive in a given unit of time seems intuitive. However, life is rarely that simple.

Productivity also depends on many other things: languages, tools, subject matter complexity, skill levels, project size, and a host of other factors. The problem is that the software industry, especially IT, has focused on keystrokes when coding at the 3GL level. Consequently, the industry’s solution to enabling higher productivity is to provide tools that reduce keystrokes. Those tools are primarily infrastructures that automate mundane programming tasks. Automation is a very good thing because it generally does, indeed, improve productivity. However, some judgment is required in deciding what and how to automate. My concern is that the industry is pursuing reduced keystrokes at the expense of good judgment.

Two potential problems with automation are size and performance. Automation via infrastructures requires somehow dealing with all the related special cases. That tends to cause infrastructures to bloat with code for situations that are very rarely encountered. One example of this is the modern spreadsheet, which now has so many features that no single user knows how to use most of them and even the developers are often not aware of all of them. The result is huge applications with tens of MLOC that consume GBs of disk space and hundreds of MBs of memory. Many productivity infrastructures do essentially the same thing, but that bloat is largely hidden from the end user and even from the application developer.

The processing in a spreadsheet tends to be very linear. That allows a single, commonly used feature to be optimized without a lot of difficulty. In contrast, the processing for infrastructures is usually not linear because they hide massive functionality from the end user or application developer. The infrastructures also tend to be complex because they routinely do things like concurrent processing. So for infrastructures, size is a problem, but it is often secondary to performance. I will talk about some of the performance issues in later blog posts, but for now let me just say that the performance hit is often huge. IME, most IT applications that make heavy use of “productivity” infrastructures execute at least an order of magnitude slower than they would without the infrastructures.

One can argue that this is an acceptable tradeoff, just as 3GL applications run 30-100% slower than hand-crafted Assembly applications. IOW, the productivity benefits in time to market and reliability far outweigh the performance costs. However, that analogy just underscores a much larger problem: the industry’s myopia about a particular class of tools, namely automation tools that reduce 3GL keystrokes. In fact, there are other ways to boost productivity that will yield vastly greater gains, just as switching languages did in the ‘60s.

One can generally enhance productivity far more through design than through automating specialized 3GL keystrokes. RAD development is a classic example. By raising the level of abstraction of coding from 3GLs to a Table/Form paradigm, RAD tools like Access greatly enhanced productivity. (They have serious limits in problem size and complexity, but in the right niche they are excellent.) That was simply a design insight into the fundamental nature of data processing based on an RDB. One can argue that the RAD tools are automation, but that automation is enabled by design insights into the problem space.

In effect, the RAD tools captured invariants of the RDB problem space and encoded them in tools. One can do exactly the same thing in any large application design. The basic idea is to encode problem space invariants while leaving the details to external configuration data. Doing so at the design level can reduce overall code size by an order of magnitude or more. The problem is that no one is talking about this in the industry. Rather than investing in slow, bloated infrastructures, the industry should be training developers to design properly. (My book has an entire chapter devoted to the use of invariants, with examples of order-of-magnitude reductions in code size just by thinking about the problem rather than leaping to the keyboard.)
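As a minimal sketch of what encoding an invariant might look like (the policy names, thresholds, and rates here are invented purely for illustration, not taken from any real application): rather than hard-coding a separate 3GL branch for every discount policy, one codes the invariant once, namely that a discount is a rate applied whenever an order total crosses a threshold, and the individual policies become external configuration data.

```python
# Invariant: every discount policy is "a rate applied when an order total
# crosses a threshold". The code encodes that invariant exactly once;
# the individual policies are pure data (hypothetical values; in practice
# this table would be read from an external configuration file).

POLICIES = [
    # (policy name, minimum order total, discount rate)
    ("bulk",      1000.0, 0.10),
    ("wholesale", 5000.0, 0.15),
]

def best_discount(total, policies=POLICIES):
    """Return the largest discount rate whose threshold the total meets."""
    applicable = [rate for _, threshold, rate in policies if total >= threshold]
    return max(applicable, default=0.0)

def net_price(total, policies=POLICIES):
    """Apply the best applicable discount to an order total."""
    return total * (1.0 - best_discount(total, policies))
```

The payoff is that adding or changing a policy is a data change, not a code change: the invariant-encoding functions never need to be touched, which is where the order-of-magnitude reductions in code size come from.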

The best example I can think of for how the focus on 3GL keystroke tools is misplaced is translation. Translation technology has been around since the early ‘80s, but it was not until the late ‘90s that optimization techniques matured. Essentially, translation is about programming in a 4GL rather than a 3GL. A popular, general-purpose 4GL is UML combined with an abstract action language (AAL). A translation engine (aka compiler) then does direct, 100% code generation from the 4GL to a 3GL or Assembly program, including full optimization. The 4GL notation is several orders of magnitude more compact than a 3GL program because it is primarily graphical in nature, it only deals with functional requirements (the translation engine deals with nonfunctional requirements), and it is at a higher level of abstraction. That compactness represents a huge advantage in source size compared with 3GL programs. (It also yields reliability improvements by integer factors because the opportunities for the developer to screw up are greatly reduced.) To me, it makes no sense that the industry largely ignores converting to 4GL coding in order to focus on Band-Aid 3GL infrastructures.

A more subtle problem with productivity infrastructures is maintainability. Such infrastructures are typically designed by vendors with particular development practices in mind. Worse, they are often designed for the convenience of the vendor when developing the infrastructure. This forces the application developer using those infrastructures to tailor the application around them. The most obvious (and, sadly, prevalent) example of this is the plethora of “object-oriented” infrastructures. In fact, many of them are not object-oriented at all, and most barely qualify as object-based. Quite often application developers are forced to cut methodological corners just to be able to use them at all. In doing so, they negate the maintainability advantages of OO development. (I’ll have more to say about this in another post.)