Archive for May 17th, 2016

An interesting thing about having spent over thirty years taking software from one platform to another is that, time and again, I’ve had my understanding of what constitutes correct code challenged. That’s a good thing.

Sadly, many people who ply the trade of software development mistakenly believe that a compiler has the ability to warn you when your code is going to behave in ill-advised ways. Worse yet, they fall into the trap of believing either that their code is correct if it compiles without warnings or that if one compiler accepts their code then any compiler will. Unfortunately, these beliefs are the developer equivalent of a two-year-old’s lack of object permanence. These two tragedies aside, the vast majority of developers are clueless as to how static analysis can and should be used to ensure code quality.

Let’s rewind a bit and work through these.

In the beginning was the language specification. It was a bright, shiny idea given form. Lest you get the idea that these documents, venerated by compiler authors and language wonks alike, are intrinsically sane, please recall that the original Ada spec allowed minus signs in the middle of numeric literals and that 8 and 9 were perfectly acceptable octal digits in C. Now a computer language without a compiler is fairly dull. Enter the compiler authors. These individuals, who number about 1000 in the world and of whom I’ve personally known about a dozen, are highly proficient at taking the language specification and giving it life. The way they do this is far more Pollock than Vermeer. Why? Well, a language is the embodiment of a worldview. Unlike source code control systems, which are only created when someone gets so fed up with the way the one they’re using does one particular thing that they can’t bear living under its yoke any longer, languages come from the world of paradigms. Why? If your pet peeve is small, you’ll probably be able to either work around it or get it added to the language (typically contingent upon who you know). If your peeve isn’t small, any attempt to modify the language you’re using will have the same result as updating the value of Planck’s constant or dropping a storm trooper platoon into the middle of a city on Vulcan.

It can’t possibly be that bad, you say. Actually, it can. And, in fact, it is. A long time ago, Apple was transitioning from its Pascal-based OS to a C-based one. The transition was fraught with byte-prefixed, null-terminated strings. The resultant code was pretty horrific. Just recently, I was working on a feature that required me to move between BSTR, wstring, COM and char strings. This is because the language wasn’t designed with the notion that strings could be more than just US English. Compare this to Swift, which is a pure Unicode language right down to the variable names.
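For anyone who never had the pleasure, here is a minimal sketch of the kind of glue that transition demanded. It assumes the classic Pascal layout (one length byte followed by the characters, no terminator); the function name pascal_to_c and its return convention are my own invention for illustration, not anything from Apple's headers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a Pascal-style string (length byte first,
   then up to 255 characters, no terminator) into a C buffer.
   Returns the number of characters copied, or -1 if dst is too small
   to hold the characters plus the terminating '\0'. */
int pascal_to_c(const unsigned char *pstr, char *dst, size_t dst_size)
{
    size_t len = pstr[0];            /* the length lives in byte 0 */

    if (dst_size < len + 1) {
        return -1;                   /* no room for the terminator */
    }
    memcpy(dst, pstr + 1, len);      /* characters start at byte 1 */
    dst[len] = '\0';                 /* C strings need the terminator */
    return (int)len;
}
```

Multiply that by every API boundary in an operating system and you get a sense of the horror.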

Every compiler writer brings their own unique experiences and skill set to realizing the worldview embodied in the language specification. No two compilers will realize it the same way. Oh, they’ll be close and probably agree 90% of the time. The biggest area of difference will be in what each compiler considers important enough to warn the developer about. At one end of the scale, you could argue that if the language specification allows something then the developer is free to write the code accordingly. At the other, the compiler would report every questionable construct and usage. All compilers that I’m aware of fall somewhere in the middle.

Unfortunately, rather than biting the bullet and forcing developers to recognize their questionable and problematic coding choices, compilers have traditionally given code a pass by not treating warnings as errors and by having the default warning level be something uselessly low. To make matters worse, if you do choose to crank up the warning level to its highest severity, many times the operating system’s headers will fail to compile. I’ve discovered several missing macro symbols that the compiler just defaulted to 0. Not the expected behavior. Microsoft’s compilers have the sense to aggregate warnings into levels of increasing severity. gcc, however, does not. The omnibus -Wall turns out to not actually include every warning. Even -Wextra leaves stuff out. The real fun begins when the code is expected to be compiled on different operating systems. This can be a problem for developers who have only ever worked with one tool chain.
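To make the missing macro business concrete, here is a small sketch. HAVE_FAST_PATH is a made-up feature macro that is deliberately never defined, standing in for a typo or a header that didn't get included. In a #if, the preprocessor quietly treats an undefined identifier as 0, so the code compiles and takes the wrong branch. With gcc, the warning that catches this is -Wundef, and it is one of the warnings that neither -Wall nor -Wextra turns on.

```c
/* HAVE_FAST_PATH is a hypothetical feature macro. It is intentionally
   NOT defined anywhere, simulating a misspelled name or a missing
   header. Inside #if, an undefined identifier evaluates to 0. */
#if HAVE_FAST_PATH
#define FAST_PATH_ENABLED 1
#else
#define FAST_PATH_ENABLED 0
#endif

/* Reports which branch the preprocessor silently picked. */
int fast_path_enabled(void)
{
    return FAST_PATH_ENABLED;   /* 0: the undefined macro defaulted to 0 */
}
```

Compile it with just -Wall -Wextra and you should hear nothing. Add -Wundef and the compiler finally admits what it did.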

So, let’s say that you realize that compilers on different platforms will focus on different issues. You’ll probably start considering trying other compilers on the same platforms. Once all those different compilers, set to their most severe, pass, your code is good, right? Not so fast. Remember, a compiler’s job is to translate the code, not analyze it. But, you do peer reviews. Tell me, how much of the code do you look at? How long do you look at it? Do you track all the variable lifetimes? Locks and unlocks? In all the code paths? Across compilation units?
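Here is the sort of thing I mean, boiled down to a toy. The lock below is a hypothetical stand-in for a real mutex (toy_lock, toy_acquire, toy_release and update_total are all invented for this sketch), which keeps the example self-contained. The function is perfectly legal C and typically compiles without a peep, yet one of its two code paths walks away still holding the lock.

```c
/* Toy stand-in for a real mutex. Every name here is hypothetical;
   held counts outstanding acquisitions so the bug is observable. */
typedef struct {
    int held;
} toy_lock;

static void toy_acquire(toy_lock *l) { l->held++; }
static void toy_release(toy_lock *l) { l->held--; }

/* Adds delta to *total under the lock.
   Returns 0 on success, -1 if delta is negative. */
int update_total(toy_lock *l, int *total, int delta)
{
    toy_acquire(l);

    if (delta < 0) {
        return -1;          /* BUG: early return skips toy_release() */
    }

    *total += delta;
    toy_release(l);
    return 0;
}
```

Two paths, one function, and it's easy to miss in a review. Now picture the acquire and the release living in different compilation units.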

Of course you don’t. That would be impossible. Impossible for a person. That’s why there are static analysis tools. My current favorite is Coverity. And yes, it costs money. If you can sell your software, you can pay for your tools.

But the fun doesn’t end there. Modern compilers can emit additional code to allow for profile-guided optimization (PGO). Simply put, instrument the code, run the code, feed the results back into the compiler. Why? Because hand-tweaking the branch predictors is an exercise in futility. Additionally, you’ll learn where the code is spending its time. And this matters because? It matters because you can waste a lot of time guessing where you need to optimize.
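As a rough sketch of that instrument, run, feed-back loop, here is a function with a data-dependent branch, plus the gcc flags for the three steps in a comment. The file name prog.c and the function count_small are placeholders of mine; other compilers spell the flags differently, so treat this as one example rather than a recipe.

```c
#include <stddef.h>

/*
 * A PGO round trip with gcc (prog.c is a placeholder file name):
 *
 *   1. gcc -O2 -fprofile-generate prog.c -o prog   (instrumented build)
 *   2. ./prog                                      (run a representative workload;
 *                                                   profile data is written on exit)
 *   3. gcc -O2 -fprofile-use prog.c -o prog        (rebuild using the profile)
 */

/* Counts the values below threshold. Whether the branch is usually
   taken depends entirely on the data, which is exactly what a profile
   tells the compiler and a guess does not. */
size_t count_small(const int *values, size_t n, int threshold)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (values[i] < threshold) {
            count++;
        }
    }
    return count;
}
```

The profile is only as good as the workload in step 2, so run something that looks like what your users actually do.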

Finally, there’s dynamic analysis. This is the realm of run-time leak detection. Wading through crash dumps and log files is a terrible way to spend your time.

So, are you developing quality software or just hacking?
