The Keystroke Fetish

There is a fundamental assumption that pervades the software industry today: productivity is directly improved by reducing developer keystrokes. Clearly, the paradigm shift from Assembly language to 3GLs in the early ‘60s reduced keystrokes by orders of magnitude and also produced a major improvement in productivity. In addition, the notion that writing less code should make a developer more productive in a given unit of time seems intuitive. However, life is rarely that simple.

Productivity also depends on languages, tools, subject matter complexity, skill levels, project size, and a host of other factors. The problem is that the software industry, especially IT, has focused on the keystrokes needed to code at the 3GL level. Consequently, the industry’s solution for higher productivity is to provide tools that reduce keystrokes, primarily infrastructures that automate mundane programming tasks. Automation is a very good thing because it generally does, indeed, improve productivity. However, some judgment is required in deciding what and how to automate. My concern is that the industry is pursuing reduced keystrokes at the expense of good judgment.

Two potential problems with automation are size and performance. Automation via infrastructures requires somehow dealing with all the related special cases. That tends to cause infrastructures to bloat with code that handles situations that are very rarely encountered. One example is modern spreadsheets, which now have so many features that no single user knows how to use most of them and even the developers are often not aware of all of them. The result is huge applications with tens of MLOC that suck up GB of disk space and hundreds of MB of memory. Many productivity infrastructures do basically the same thing, but that bloat is largely hidden from the end user and even from the application developer.

The processing in a spreadsheet tends to be very linear. That allows a single, commonly used feature to be optimized without a lot of difficulty. In contrast, the processing in infrastructures is usually not linear because they hide massive functionality from the end user or application developer. The infrastructures also tend to be complex because they routinely do things like concurrent processing. So for infrastructures, size is a problem, but it is often secondary to performance. I will talk about some of the performance issues in later blog posts, but for now let me just say that the performance hit is often huge. IME, most IT applications that make heavy use of “productivity” infrastructures execute at least an order of magnitude slower than they would without those infrastructures.

One can argue that this is an acceptable tradeoff, just as it is acceptable that 3GL applications run 30-100% slower than hand-crafted Assembly applications; IOW, the productivity benefits in time to market and reliability far outweigh the performance costs. However, that analogy just underscores a much larger problem: the industry’s myopia about particular classes of tools, specifically automation tools that reduce 3GL keystrokes. In fact, there are other ways to boost productivity that will yield vastly greater gains, just as switching languages did in the ‘60s.

One can often enhance productivity far more through design than through automating specialized 3GL keystrokes. RAD development itself is a classic example. By raising the level of abstraction of coding from 3GLs to a Table/Form paradigm, RAD tools like Access greatly enhanced productivity. (They have serious limits in problem size and complexity, but in the right niche they are excellent.) That was simply a design insight into the fundamental nature of data processing based on an RDB. One can argue that the RAD tools are automation, but that automation is enabled by design insights into the problem space.

In effect, the RAD tools captured invariants of the RDB problem space and encoded them in tools. One can do exactly the same thing in any large application design. The basic idea is to encode problem space invariants while leaving the details to external configuration data. Doing so at the design level can reduce overall code size by an order of magnitude or more. The problem is that no one in the industry is talking about this. Rather than investing in slow, bloated infrastructures, the industry should be training developers to design properly. (My book has an entire chapter devoted to the use of invariants, with examples of order-of-magnitude reductions in code size achieved just by thinking about the problem rather than leaping to the keyboard.)
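
To make that concrete, here is a minimal sketch of what I mean by encoding an invariant and pushing the details into external configuration data. The surcharge rule and all the names in it are invented purely for illustration; the point is that the code captures only the invariant (look up a rate by region and weight band and apply it to a price), while the rates themselves live outside the code and can change without touching it.

    // Minimal sketch: the invariant is "a surcharge is a rate looked up by
    // (region, weight band) and applied to the base price"; the actual rates
    // live in external configuration data, not in the code.
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <utility>

    using RateTable = std::map<std::pair<std::string, std::string>, double>;

    // In a real application this would parse a configuration file; here the
    // "file" is an embedded string so the sketch stands alone.
    RateTable loadRates(std::istream& cfg) {
        RateTable rates;
        std::string region, band;
        double rate;
        while (cfg >> region >> band >> rate)
            rates[{region, band}] = rate;
        return rates;
    }

    // The single piece of code that embodies the invariant.
    double surcharge(const RateTable& rates, const std::string& region,
                     const std::string& band, double basePrice) {
        auto it = rates.find({region, band});
        return (it == rates.end()) ? 0.0 : basePrice * it->second;
    }

    int main() {
        std::istringstream cfg("EU heavy 0.12\nEU light 0.05\nUS heavy 0.08\n");
        RateTable rates = loadRates(cfg);
        std::cout << surcharge(rates, "EU", "heavy", 100.0) << "\n";  // prints 12
    }

Adding a new region or band is a data change, not a code change; the alternative is a cascade of special-case code that grows every time the details change.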

The best example I can think of for how the focus on 3GL keystroke tools is misplaced is translation. Translation technology has been around since the early ‘80s, but it was not until the late ‘90s that the optimization techniques matured. Essentially, translation is about programming in a 4GL rather than a 3GL. A popular, general purpose 4GL is UML combined with an abstract action language (AAL). A translation engine (aka compiler) then does direct, 100% code generation from the 4GL to a 3GL or Assembly program, including full optimization. The 4GL notation is several orders of magnitude more compact than a 3GL program because it is primarily graphical in nature, it deals only with functional requirements (the transformation engine deals with nonfunctional requirements), and it is at a higher level of abstraction. That compactness represents a huge advantage in source size compared with 3GL programs. (It also yields reliability improvements that are integer factors, because the opportunities for the developer to screw up are greatly reduced.) To me, it makes no sense that the industry largely ignores converting to 4GL coding in order to focus on Band-Aid 3GL infrastructures.

A more subtle problem with productivity infrastructures is maintainability. Such infrastructures are typically designed by vendors with particular development practices in mind. Worse, they are often designed for the convenience of the vendor when developing the infrastructure. This forces the application developer using those infrastructures to tailor the application around them. The most obvious — and, sadly, prevalent — example of this is the plethora of “object-oriented” infrastructures. In fact, many of them are not object-oriented at all and most barely qualify as object-based. Quite often application developers are forced to cut methodological corners just to be able to use them properly. In doing so, they negate the maintainability advantages of OO development. (I’ll have more to say about this in another post.)

The Demon ASCII

With the growth of the Internet that started in the mid-‘80s came an attendant growth in networking. That introduced a rapidly increasing need for applications to share data and interact with one another across platforms. The problem was that the hardware vendors could not standardize on very basic things like the way data is represented in digital bits. Thus binary data written on one platform was unreadable on a different platform.

IMO, this is inexcusable. There are only three characteristics of binary data that differentiate data formats: the number of bits in the smallest aggregate of bits (i.e., the byte size in bits); which end of the aggregate holds the least significant bit; and the ordering of aggregates within larger aggregates (i.e., where the least significant byte sits in a word). Back in the ‘60s, computer vendors vied with one another to make the most efficient machines, and various combinations of these three characteristics evolved. However, today those characteristics are no longer relevant to ALU optimization, so there is no reason not to standardize on a single set of characteristic values. (That’s not quite true for representing spoken languages, but memory is so cheap today that always using 16 bits for a character is a minor concern.) The overall cost in performance for all computing due to this failure to standardize is mind-boggling.
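
For the record, here is a minimal sketch of the byte-ordering part of the problem. The values are arbitrary, and a real converter would also have to deal with byte size and word ordering, but it shows how mechanical the conversion is once you know the two formats involved.

    // Minimal sketch of the byte-ordering issue: the same 32-bit value laid out
    // with a different byte order on different platforms, and the swap needed to
    // convert between the two layouts.
    #include <cstdint>
    #include <cstdio>

    uint32_t swapBytes32(uint32_t w) {
        return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
               ((w << 8) & 0x00FF0000u) | (w << 24);
    }

    int main() {
        uint32_t native = 0x01020304u;
        const unsigned char* b = reinterpret_cast<const unsigned char*>(&native);
        // Whatever order your machine uses, these are the bytes a foreign machine
        // with the opposite convention would misinterpret.
        std::printf("in-memory bytes: %02X %02X %02X %02X\n", b[0], b[1], b[2], b[3]);
        std::printf("byte-swapped value: 0x%08X\n", swapBytes32(native));  // 0x04030201
    }
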

There are three approaches to resolving the inconsistencies in hardware data formats. The first: given that the hardware vendors refuse to standardize, they could at least provide firmware instructions that convert a foreign format to their native format on a word-by-word basis. From a performance perspective, this would be far and away the best solution, though it would require a separate instruction set for each of the various format combinations.

The second approach is to use a software proxy that converts data coming from an external source into the format of the receiving platform as it arrives. This was done for the internal LANs at a company where I once worked. It is surprisingly simple to do and carries relatively little overhead compared to the third alternative. We did it because our software would have been infeasible due to poor performance if we hadn’t; given the machines we were working with in the early ‘80s, you could have had children in the time it would have taken to execute using the third alternative.

The third approach is the one used by virtually all infrastructures in IT. The basic idea is to convert all binary numeric data to ASCII and transmit that between platforms because the ASCII format is a standard. One then reconverts it back to binary for computations. This is the worst possible choice for resolving binary data compatibility because the machine instructions to convert back and forth between binary numbers and ASCII numbers are very expensive. (In fact, they are not even individual instructions; on today’s machines, ASCII conversions can only be done with software algorithms that involve executing a large number of instructions, especially for floating point data.)
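
To see why, compare the two code paths for moving a single double between platforms. This is only a sketch, but it captures the essential difference: the binary path is a fixed-size copy (given an agreed byte order), while the ASCII path runs formatting and parsing algorithms over a string of characters.

    // Minimal sketch of the two paths for one double: raw binary (a fixed
    // 8-byte copy, assuming an agreed byte order) versus ASCII (format to text,
    // then parse the text back), the latter being a multi-instruction software
    // algorithm rather than anything the hardware does in one operation.
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    int main() {
        double value = 12345.6789;

        // Binary path: 8 bytes on the wire, one copy out, one copy in.
        unsigned char wire[sizeof(double)];
        std::memcpy(wire, &value, sizeof(double));
        double received;
        std::memcpy(&received, wire, sizeof(double));

        // ASCII path: up to a couple dozen characters on the wire, plus the
        // formatting and parsing work at each end.
        char text[32];
        std::snprintf(text, sizeof(text), "%.17g", value);
        double parsed = std::strtod(text, nullptr);

        std::printf("binary: %.4f  ascii: \"%s\" -> %.4f\n", received, text, parsed);
    }
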

If the use of ASCII were limited to just pure interoperability issues around converting the hardware format from an external platform, the problem would only warrant a small head shake and a tsk-tsk about such foolishness. However, the ASCII approach has become ubiquitous in IT and it is manifested in things like overuse of markup and scripting languages (e.g., HTML, XML, etc.). So the use of ASCII is not limited to network port proxies; it permeates IT applications because it allows a generic parser to be built for any platform, thus saving the developer keystrokes for processing specific binary data structures in memory.

The result is mind-numbing overhead. In 1984 you could run a spreadsheet in less than a minute on a TRS-80 or Apple II that had 64K of memory, a floppy drive, and a clock rate of a couple of MHz. Today’s spreadsheet programs would not even load on such a machine and, if they could, the same spreadsheet would take hours to execute. Some of that (the memory constraints) is due to code and feature bloat, but most of the performance hit is due to ASCII processing in the bowels of the spreadsheet program. Every time I talk to an IT guy about the specific performance of an application where lots of keystrokes were saved using massive “helpful” infrastructures, I am astounded that the examples take minutes to execute when equivalent processing in a cycle-counting R-T/E environment would take a few tens of milliseconds. The frightening thing is that the IT people are so used to such infrastructures that they don’t think there is a problem – they think such abysmal performance is normal!

One reason ASCII is so popular in IT is the use of markup and scripting languages that was triggered in the early ‘90s by the World Wide Web. A few years ago I read a paper in a refereed journal where the author claimed that the first markup language was created in 1986. How soon they forget. (Perhaps more apropos: how soon they repeat the mistakes of the past.) In fact, markup and scripting languages were very popular in the ’50s and ‘60s. But by the mid-‘70s they had pretty much disappeared, and for a very good reason: they were great for formatting reports, but they sucked for actual programming because they were slow and very difficult to maintain at the large application level. Show me a buggy web site and I will show you a JavaScript web site.

OO Development Problems

The OO paradigm held great promise in the ‘80s because, through problem space abstraction, it provided a much more direct link between the customer space and the computing space. And, in fact, it has substantially improved the maintainability of software. I once participated in a large project to evaluate OO techniques, where we collected data on everything in sight. At the end of the day, the initial development took about the same time as procedural development. However, program maintenance took 1/10th the time and reliability increased by a factor of two.

Sadly, I think that today’s OO development can best be described by paraphrasing G. B. Shaw’s observation on Christianity: The only problem with OO development is that it has never been tried. The vast majority of OO applications today are simply C programs with strong typing. In many respects, they represent the worst of both the OO and procedural approaches. They tend to be bloated and slow, which is an intrinsic risk for OO abstraction, as well as being unreliable and difficult to maintain, like procedural applications.

So what went wrong? One problem is that 3GLs are closely tied to the hardware computational models. That makes procedural development much more intuitive in a computing environment. In the OO paradigm, one strives to express the solution in customer terms rather than computer terms. Nonetheless the OO developer needs to be disciplined so that the design can ultimately be implemented on Turing machines. Developing that discipline is not as intuitive as procedural discipline, so it is much easier for the novice OO developer to go wrong. Structured Programming brought order to the chaos of procedural development that existed prior to 1970. However, Structured Programming was quite simplistic compared to the sophistication of OO design methodologies; other than the notion of functional decomposition, Structured Programming was mostly just a suite of hard-won best practices. IOW, design (OOA and OOD) is far more important to OO development than Structured Design was for procedural development.

Therein lies the second problem. Most OO developers today do not follow any particular design methodology. I once encountered a developer at a social gathering. When he found I was an OO developer, the conversation went largely as follows:

He: We are about to rewrite our 18 MLOC C application in C++.

Me: Wonderful. What methodology are you going to use?

He, giving me a look reserved for Village Idiots: Uh, you know…  objects… C++.

Me: Hmmm. How much training have you had?

He: We had a one-week course in C++.

Me: Hmmm. Does anyone in your shop have OO experience?

He: A couple of guys did some C++ in college projects.

Me: Hmmm. Are you going to have a consultant to mentor you?

He: No, there’s no budget for that.

Me: Good luck, Huck.

That project was doomed to crash and burn; the only question was how deep the crater would be. It amazes me that companies will pay more to train everyone on how to use a new copy machine than they will pay to train software developers for a sea change in how they construct their core product line.

The third problem is that in the late ‘80s and early ‘90s, acres of procedural programmers converted to OO development. Instead of learning OOA/D, they jumped right into writing OOPL code. It’s just another language, right? That is the worst way to learn OO development because the OOPLs are still 3GLs and they are still married to the hardware computational models. Those converts were desperate to find something familiar about OO development, so they mapped their procedural design principles onto the OOPL code, mainly because that was easy to do in a 3GL. Thus they created the same C programs they usually wrote, but with strong typing. Sadly, those guys are now writing the OO books.

OO Infrastructures

Today every vendor of an infrastructure brands it as “object-oriented”. In fact, the vast majority of infrastructures are – if one is in a very charitable mood – just object-based. If one is in a less than charitable mood, they are just function libraries where closely related functions have been bundled into “objects”. Worse, many such infrastructures actually violate basic OOA/D principles. Classic examples are Microsoft’s .NET and Win32 infrastructures. (I often wonder if anyone at Microsoft knows anything about OOA/D.) Thus, the applications that use them are forced to design to them, and that trashes the cohesion and encapsulation of those applications. One reason is that such infrastructures are often developed to make things easy for the infrastructure developers, not the infrastructure users.

One example is MVC, the Model-View-Controller. Originally developed as a RAD paradigm for Smalltalk environments, it has since had substantial IT infrastructures built around it for other OOPLs. Like most RAD approaches, MVC works fine when things are very simple, like viewing data in a database in different ways and performing very basic updates on it. However, it breaks down as soon as the application needs to do some serious work and requirements start changing. One reason is that the Model, View, and Controller are completely artificial concepts in the developer’s mind rather than customer space concepts. Worse, they are based on an overall strategy for processing data in the computing space rather than on what the customer actually wants done.

That approach violates OOA/D doctrine in two ways. First, the solution is structured around the computing space problem of managing data rather than the customer’s problem of, say, computing employee benefits. That kind of structural divergence guarantees difficulties during maintenance. OOA/D doctrine says that the solution structure should map as directly as possible to the customers’ space. That’s because customers don’t like change any more than software developers do, so they will implement changes in their space in a way that minimizes disruption of their existing structures. If the software structure faithfully mimics the customers’ structures, then when those changes filter down to the software as new requirements, the customer will already have done most of the work. However, if the software structure is based on an artificial organization like MVC, one is doomed to mismatches that will require unnecessary rework to fit the new requirements into the structure.

The second way MVC violates OOA/D doctrine is in the way applications are partitioned. OO applications are partitioned into subsystems that reflect logically distinct problem space subject matters. Since each subject matter has its own unique functionality, this allows requirements changes to be isolated. The larger and more complex the solution, the more subsystems one will tend to have in the design. In the MVC approach, though, the infrastructure supports exactly three subject matters, regardless of complexity. In a large application, that can lead to subject matter implementations that are very difficult to maintain simply because of their size.
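
As a minimal sketch of what subject-matter partitioning looks like, consider a hypothetical benefits application; the subsystem names below are invented for illustration. Each subsystem owns one logically distinct subject matter behind its own interface, so a requirements change in, say, tax rules stays inside one subsystem. Squeezing all of this into exactly three MVC buckets erases exactly those boundaries.

    // Sketch of subject-matter partitioning (interfaces only, hypothetical names).
    #include <string>

    class BenefitsPolicy {          // the customer-facing business rules
    public:
        virtual ~BenefitsPolicy() = default;
        virtual double monthlyBenefit(int employeeId) = 0;
    };

    class TaxRules {                // a separate subject matter with its own volatility
    public:
        virtual ~TaxRules() = default;
        virtual double withholding(double grossAmount, const std::string& region) = 0;
    };

    class Persistence {             // the computing-space subject matter, isolated
    public:
        virtual ~Persistence() = default;
        virtual double lookupSalary(int employeeId) = 0;
    };

    class Presentation {            // the UI subject matter, equally isolated
    public:
        virtual ~Presentation() = default;
        virtual void showStatement(int employeeId, double amount) = 0;
    };
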

There are a variety of RAD layered models similar to MVC that are commonly implemented, and they all suffer the same problems due to their rigid and artificial structure. Most of the keystroke-saving infrastructures in IT are based, at least to some extent, on those layered models. One of the more common classes of IT infrastructure deals with allowing the application to talk to a relational database. Such infrastructures save a lot of keystrokes for the developer because they capture a lot of the grunt work of DB access. That’s because RDBs and their access are very narrowly defined, so very generic access tools (e.g., SQL) can be used. However, from an OO perspective, that is a major problem because those infrastructures must be accessed directly from the application code that is actually applying business rules and policies to solve a specific customer problem.

The difficulty is that RDBs are an entirely separate problem domain. Worse, they are a computing space domain rather than a customer business domain. OOA/D doctrine says that such unique domains need to be isolated and entirely encapsulated behind an interface as a subsystem. The rest of the application that deals with the customer’s problem should have absolutely no knowledge of the RDB. Put another way, the rest of the application does not care whether the data is stored in an RDB, an OODB, flat files, or on clay tablets. But as soon as the developer uses those handy infrastructures, that isolation is impossible and the RDB semantics bleed into the rest of the application.
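
Here is a minimal sketch of what that isolation looks like. The names are hypothetical and a real subsystem interface would be richer, but the point is that the business logic sees only the interface; an RDB-backed realization (with all its SQL), a flat-file realization, or the in-memory stand-in below can sit behind it interchangeably.

    // Sketch of a persistence subsystem boundary (hypothetical names).
    #include <cstdio>
    #include <optional>
    #include <string>
    #include <unordered_map>

    struct EmployeeRecord {
        int         id;
        std::string name;
        double      salary;
    };

    // The generic interface the rest of the application depends on.
    class PersistenceSubsystem {
    public:
        virtual ~PersistenceSubsystem() = default;
        virtual std::optional<EmployeeRecord> fetchEmployee(int id) = 0;
        virtual void storeEmployee(const EmployeeRecord& rec) = 0;
    };

    // One possible realization; an RDB-backed version would live here too,
    // with every bit of SQL confined to this subsystem.
    class InMemoryStore : public PersistenceSubsystem {
    public:
        std::optional<EmployeeRecord> fetchEmployee(int id) override {
            auto it = table_.find(id);
            if (it == table_.end()) return std::nullopt;
            return it->second;
        }
        void storeEmployee(const EmployeeRecord& rec) override { table_[rec.id] = rec; }
    private:
        std::unordered_map<int, EmployeeRecord> table_;
    };

    // Business logic: applies a (made-up) rule with no knowledge of storage.
    double annualBonus(PersistenceSubsystem& store, int employeeId) {
        auto rec = store.fetchEmployee(employeeId);
        return rec ? rec->salary * 0.05 : 0.0;
    }

    int main() {
        InMemoryStore store;
        store.storeEmployee({42, "Ada", 100000.0});
        std::printf("bonus: %.2f\n", annualBonus(store, 42));
    }
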

One can argue, “So what? In IT everybody uses RDBs, so why worry about it?” Well, in my career I have seen five major DB paradigms (flat files, ISAM flat files, CODASYL, RDB, and OODB). Each one caused great gnashing of teeth when mountains of legacy code needed to be upgraded. Maybe RDBs are the endpoint for IT. But look at the amount of legacy code that will have to be changed and tell me that the virtually costless isolation of the DB access in a subsystem is not methodologically good insurance against a modern day Codd showing up with a better idea.

This can be taken to even greater extremes. A couple of vendors provide infrastructures for doing things like queries and joins in an OOPL. IMO, that is a really, really bad idea from an OOA/D perspective. RDBs exist for a single purpose – to provide persistent data storage in a generic fashion. Access through an RDB is generally much slower than through an equivalent flat file system that is optimized for a particular application (that’s why RDBs are about as common in R-T/E as penguins at the Equator). What RDBs provide is reasonable performance for each of many applications accessing the same data for very different reasons. Constructs like queries and joins are uniquely suited to that sort of generic access. IOW, the RDB approach is an optimal solution for the multiuser, arbitrary data access problem.

However, a particular application is solving exactly one, very particular problem for the customer. In that situation, one wants optimal data structures and access techniques that are hand-crafted to the problem at hand. In that context, generic constructs like queries and joins will always carry a serious performance penalty compared to a custom design. That’s why one wants to isolate the data storage mechanisms in a single subsystem with a generic interface.

But from an aesthetic OO perspective, the problem is much worse than performance. The relationships that underlie queries and joins are table-based rather than row-based (Table:Row::Class:Object). In OO development, relationships are object-based. That means that one thinks about relationship navigation very differently in an OO context than in an RDB context. That difference will be reflected ubiquitously throughout the entire design because of the way messages are addressed and collections are formed. As soon as you introduce a class-based relationship navigation mechanism, you cease doing OOA/D entirely.
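
A minimal sketch of that navigation difference, with hypothetical classes: in the OO view each Employee instance holds a link to its own Department, so navigation is a dereference from one object to its collaborator; in the relational view one asks the whole Employee “table” for rows matching a key, which is the class-level (table-level) way of thinking.

    // Object-based versus table-based relationship navigation (hypothetical classes).
    #include <iostream>
    #include <string>
    #include <vector>

    struct Department { std::string name; };

    struct Employee {
        std::string name;
        Department* dept;   // instance-level relationship, fixed when the object is wired up
    };

    int main() {
        Department eng{"Engineering"};
        Department hr{"HR"};
        std::vector<Employee> employees{{"Ada", &eng}, {"Bob", &hr}, {"Cyd", &eng}};

        // Object-based navigation: follow this instance's own link.
        std::cout << employees[0].name << " works in " << employees[0].dept->name << "\n";

        // Table-based navigation: a query over the whole class extent.
        for (const auto& e : employees)
            if (e.dept == &eng)
                std::cout << e.name << " matched the Engineering 'query'\n";
    }

Once the second style of navigation is how the application addresses its collaborations, the design is organized around class extents and keys rather than object relationships, which is my point above.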

OOP-based Agile Development

First, let me say that the Agile processes institutionalized a number of excellent ideas in software development – pair programming for reviewing the product, incremental development to properly manage and estimate projects, integrating testing with development, forcing the customer to be very specific about requirements, and refactoring 3GL code to make it more maintainable, to name a few. The OOP-based Agile processes like XP, Scrum, and the rest, were the first attempt to focus on a specific implementation of a process for software development. (SQA systems, like the CMM, focused on what a good software process should do, but not how to actually implement such a process.) However, I have three big problems with the OOP-based Agile processes.

The first problem is that they focus too much on testing for reliability. Integrating testing with development was an excellent idea, but it is not the whole solution to the reliability issue. When the OOP-based Agile processes were introduced in the ‘90s, the industry average reliability was a little over 4-Sigma, or about 5K defects per MLOC. That was pretty awful compared to non-software products. The OOP-based Agile processes improved on that significantly because of the emphasis on integrated testing. However, in practice there is a limit on what testing alone can do. (I spent two decades in the electronics test business, so I know a bit about it.) That limit is roughly 5-Sigma, or 232 defects/MLOC. The only way to get better reliability than 5-Sigma in software development is through militant defect prevention, and the OOP-based Agile processes do not do that at all.
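
For anyone who wants to check those figures, here is a quick sketch of the usual conversion, assuming the conventional 1.5-sigma shift used in process-capability arithmetic and treating one line of code as one defect opportunity.

    // Sigma level to defects per million opportunities (i.e., defects/MLOC when
    // one LOC is one opportunity), with the conventional 1.5-sigma shift.
    #include <cmath>
    #include <cstdio>

    double defectsPerMillion(double sigma) {
        // Upper-tail probability of the standard normal: P(Z > x) = 0.5 * erfc(x / sqrt(2)).
        double x = sigma - 1.5;
        return 1e6 * 0.5 * std::erfc(x / std::sqrt(2.0));
    }

    int main() {
        // ~6210; "a little over 4-Sigma" lands near the 5K figure above.
        std::printf("4-sigma: ~%.0f defects/MLOC\n", defectsPerMillion(4.0));
        // ~233, i.e. the ~232 figure above, give or take rounding.
        std::printf("5-sigma: ~%.0f defects/MLOC\n", defectsPerMillion(5.0));
    }
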

The second problem I have with the OOP-based Agile processes is that they remove all accountability from the developer for reliability, schedule, and productivity. Reliability is defined as passing the customers’ acceptance tests. If the software passes those tests, it is, by definition, meeting the customer’s requirements completely. Thus the developer is not responsible for doing anything else to improve the quality of the product, like defect prevention.

The only schedule responsibility the developer has is to resolve small sets of requirements for a particular IID increment. The overall project schedule is the customers’ responsibility via “steering” the development by defining the next small increment’s requirements (which the developers will negotiate down until they are sure they can be completed within one increment).

Accountability for productivity is also removed from the developers’ shoulders because all the metrics employed (e.g., XP’s Velocity) are defined relative to the developer team itself. The metrics measure how well the team performs from one increment to the next. It is typically impossible to tell if one Agile team is more productive than another Agile team, much less more productive than another non-Agile team, unless both groups produce exactly the same product over all the increments. No wonder developers love the OOP-based Agile processes!

The last and biggest problem I have with the OOP-based Agile processes is that their popularity has set back software development by at least a decade. In the late ‘50s the introduction of 3GLs caused a major paradigm shift in software development by increasing productivity by orders of magnitude. In the ‘90s, the industry was poised for a similar paradigm shift from 3GLs to 4GLs. Unfortunately, the OOP-based Agile processes are militantly focused on writing 3GL code. Attempting to use more abstract model representations is dismissed as “BDUF” – Big Design Up Front. IOW, they want no part of 4GLs.

Today there are several commercial transformation engines that will produce an executable directly from a UML OOA model using 100% code generation. Shops that use those engines report order-of-magnitude gains in productivity and integer-factor improvements in reliability. Yet that level of 4GL development is still in the Early Adopter stage because of the mainstream shift toward OOP-based Agile processes. In fact, there is no reason that agile processes cannot be adapted to 4GL development; Bob Martin, a pioneer of XP, has been on record several times saying that a 4GL is essentially just another programming language. Nonetheless the OOP-based Agile developers remain militantly focused on 3GL coding. Worse, most of them dismiss 4GL development as BDUF.

Software Education

One cannot talk about the problems of the software industry without talking about software education. Years ago I was working on a process improvement activity. The goal was to improve our interviewing techniques for June Grads. We did a typical matrix of qualities we wanted versus interview techniques. Part of the exercise was to rank the desired qualities in order of importance. (We then focused on the interview techniques that yielded the best weighted scores.) The surprising thing that came out of that exercise was that things like coding skills, knowledge of data structures, and knowledge of algorithms ranked very low while things like communications skills, teamwork, and problem solving ranked much higher. IOW, we were interested in things that the CS departments were not teaching.

IME, most CS departments focus too much on details and elegance. They teach students how to code, but not how to design. They teach students LISP and Haskell because they are elegant and interesting while not mentioning that they are useless for commercial applications because they are too slow and/or virtually unmaintainable at the large application level. They teach behemoth infrastructures like CORBA when communications can be handled much more efficiently and reliably with simple messages carrying by-value data packets. Worst of all, they teach the mechanics of individual software development activities, but not the overall process of software development. IOW, the CS departments are producing code hackers instead of software engineers.

Another of my pet peeves is textbooks where the author delivers the content in Sermon on the Mount mode. The author provides a vast bundle of techniques, practices, and guidelines as the proper way to do development without explaining why they are good things to do. They don’t talk about how to apply a few fundamental principles; instead they provide a plethora of cookbook tools derived from those principles. That leaves the student up a creek when they encounter a problem where the guidelines don’t quite work. Nor does it help when different guidelines enable plausible alternatives from which the student must choose. (Shameless plug: in my book I focus so much on explaining how one should think about design that I sometimes deliberately lead the reader to a poor design just to prove that the methodology is self-correcting, so that one will realize it is a poor design by thinking about basic principles.)

A corollary problem is that quite often authors don’t talk about things that they should talk about because they don’t want to confuse their readers with “unnecessary complexity”. A classic example of this occurs in most OOA/D books today. You could count on one hand the number of such books that mention Normal Form. In fact, every Class Diagram should be normalized just like a relational database. That’s because both are firmly grounded in the same set theory where Normal Form is an elegant way to ensure consistency and, more importantly, to avoid ambiguity. Similarly, you could count on one hand the number of OO books that discuss using a variation on design-by-contract (DbC) to resolve state machine interactions correctly. It is tedious, so experienced developers only use it when things are really confusing, but it is still an invaluable tool to ensure correctness in complex situations.
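
As a minimal sketch of what normalization means for a Class Diagram (the classes are invented purely for illustration): the first version below stores the department name on every Employee, so that attribute depends on departmentId rather than on the Employee’s own identity, which is the class-model analogue of a table that is not in Third Normal Form. The normalized version moves the attribute to its own class and replaces the duplication with a relationship.

    // Normalization applied to a class model (hypothetical classes).
    #include <string>

    // Not normalized: deptName is a fact about the department, not the employee,
    // so it is duplicated on every employee and invites update anomalies.
    struct EmployeeDenormalized {
        int         employeeId;
        std::string name;
        int         departmentId;
        std::string deptName;
    };

    // Normalized: every attribute depends only on its own class's identity.
    struct Department {
        int         departmentId;
        std::string name;
    };

    struct Employee {
        int               employeeId;
        std::string       name;
        const Department* department;   // the relationship replaces the duplicated attribute
    };
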

For that matter, try finding a methodology book outside R-T/E that describes how to use finite state machines. Why? The IT community thinks using state machines requires too much skill. It is true that getting state machines to work initially is tricky. But once they do work, they tend to be very reliable and robust in the face of volatile requirements. However, the real reason I think that attitude is, at best, condescending is that it is mired in a view of IT from the ‘60s, when financial institutions had acres of entry-level COBOL programmers churning out MIS reports. Today IT is a very different beast. With multitasking, online processing, interoperability, concurrent processing, asynchronous real-time input, ubiquitous networking, and multi-user systems, IT is beginning to look a whole lot like R-T/E. It is time the IT industry moved into the 21st century and accepted that (A) minimal skill sets don’t cut it anymore and (B) you can’t hide complexity behind massive infrastructures without paying the tolls in performance, reliability, and maintainability.
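
And since I just claimed state machines are not black magic, a parting sketch of one for a hypothetical order lifecycle (the states and events are invented for illustration). The point is that the legal transitions are stated in one place, so “impossible” event sequences get rejected instead of silently corrupting state; that is where the reliability and robustness under volatile requirements come from.

    // Minimal explicit finite state machine (hypothetical order lifecycle).
    #include <cstdio>

    enum class State { Open, Submitted, Shipped, Closed };
    enum class Event { Submit, Ship, Deliver, Cancel };

    // Returns the new state, or the current one (with a diagnostic) if the
    // event is not legal in that state.
    State transition(State s, Event e) {
        switch (s) {
            case State::Open:      if (e == Event::Submit)  return State::Submitted;
                                   if (e == Event::Cancel)  return State::Closed;
                                   break;
            case State::Submitted: if (e == Event::Ship)    return State::Shipped;
                                   if (e == Event::Cancel)  return State::Closed;
                                   break;
            case State::Shipped:   if (e == Event::Deliver) return State::Closed;
                                   break;
            case State::Closed:    break;
        }
        std::printf("rejected: event %d is illegal in state %d\n", (int)e, (int)s);
        return s;
    }

    int main() {
        State s = State::Open;
        s = transition(s, Event::Submit);   // Open -> Submitted
        s = transition(s, Event::Deliver);  // rejected, state unchanged
        s = transition(s, Event::Ship);     // Submitted -> Shipped
        s = transition(s, Event::Deliver);  // Shipped -> Closed
        std::printf("final state: %d\n", (int)s);
    }
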