
Posts Tagged ‘static analysis’

An interesting thing about having spent over thirty years taking software from one platform to another is that, time and again, I’ve had my understanding of what constitutes correct code challenged. That’s a good thing.

Sadly, many people who ply the trade of software development mistakenly believe that a compiler has the ability to warn you when your code is going to behave in ill-advised ways. Worse yet, they fall into the trap of believing either that their code is correct because it compiles without warnings, or that if one compiler accepts their code, any compiler will. Unfortunately, these beliefs are the developer equivalent of a two-year-old's lack of object permanence. These two tragedies aside, the vast majority of developers are clueless as to how static analysis can and should be used to ensure code quality.

Let’s rewind a bit and work through these.

In the beginning was the language specification. It was a bright, shiny idea given form. Lest you get the idea that these documents, venerated by compiler authors and language wonks alike, are intrinsically sane, please recall that the original Ada spec allowed minus signs in the middle of numeric literals and that 8 and 9 were perfectly acceptable octal digits in early C. Now, a computer language without a compiler is fairly dull. Enter the compiler authors. These individuals, who number about 1000 in the world and of whom I've personally known about a dozen, are highly proficient at taking the language specification and giving it life. The way they do this is far more Pollock than Vermeer. Why? Well, a language is the embodiment of a worldview. Source code control systems get created when someone is so fed up with one particular thing about the system they're using that they can't bear living under its yoke any longer; languages, by contrast, come from the world of paradigms. If your pet peeve is small, you'll probably be able to either work around it or get it added to the language (typically contingent upon who you know). If your peeve isn't small, any attempt to modify the language you're using will have the same result as updating the value of Planck's constant or dropping a storm trooper platoon into the middle of a city on Vulcan.

It can't possibly be that bad, you say. Actually, it can. And, in fact, it is. A long time ago, Apple was transitioning from its Pascal-based OS to a C-based one. The transition was fraught with byte-prefixed, null-terminated strings. The resultant code was pretty horrific. Just recently, I was working on a feature that required me to move between BSTR, wstring, COM and char strings. This is because the language wasn't designed with the notion that strings could be more than just US English. Compare this to Swift, which is a pure Unicode language right down to the variable names.
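
To give a flavor of the juggling involved, here is a minimal sketch of the kind of glue routine such a feature ends up needing. It assumes the usual Win32/COM string APIs (windows.h, oleauto.h); the function name and the choice of UTF-8 as the target are mine, not taken from the code in question.

    #include <windows.h>
    #include <oleauto.h>
    #include <string>

    // BSTR (length-prefixed, UTF-16) -> std::string (UTF-8)
    std::string bstr_to_utf8(BSTR value)
    {
        if (value == nullptr)
            return std::string();

        // BSTRs carry an explicit length prefix, so embedded nulls survive.
        std::wstring wide(value, SysStringLen(value));
        if (wide.empty())
            return std::string();

        // Ask for the required size, then convert for real.
        int bytes = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(),
                                        static_cast<int>(wide.size()),
                                        nullptr, 0, nullptr, nullptr);
        std::string narrow(static_cast<size_t>(bytes), '\0');
        WideCharToMultiByte(CP_UTF8, 0, wide.c_str(),
                            static_cast<int>(wide.size()),
                            &narrow[0], bytes, nullptr, nullptr);
        return narrow;
    }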

Every compiler writer brings their own unique experiences and skill set to realizing the worldview embodied in the language specification. No two compilers will realize it the same way. Oh, they'll be close and probably agree 90% of the time. The biggest area of difference will be in what each compiler considers important enough to warn the developer about. At one end of the scale, you could argue that if the language specification allows something, the developer is free to write the code accordingly. At the other, the compiler would report every questionable construct and usage. All compilers that I'm aware of fall somewhere in the middle.

Unfortunately, rather than biting the bullet and forcing developers to confront their questionable and problematic coding choices, compilers have traditionally given code a pass by not enforcing warnings as errors and by having the default warning level be something uselessly low. To make matters worse, if you do choose to crank the warning level up to its highest severity, many times the operating system's own headers will fail to compile. I've discovered several missing macro symbols that the compiler silently defaulted to 0. Not the expected behavior. Microsoft's compilers have the sense to aggregate warnings into levels of increasing severity. gcc, however, does not. The omnibus -Wall turns out to not actually include every warning. Even -Wextra leaves stuff out. The real fun begins when the code is expected to compile on different operating systems. This can be a problem for developers who have only ever worked with one tool chain.
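
For the skeptical, here is a small sketch of the sort of thing that sails through the default settings. The flag behavior noted in the comments reflects gcc and clang as I understand them; check your own toolchain's documentation, because this is exactly the kind of detail that shifts between releases.

    #include <cstddef>

    int item_count = 0;

    int sum(const int* values, size_t n, int scale)   // 'scale' is never used:
    {                                                  //   -Wunused-parameter, from -Wextra only
        int item_count = 0;                            // shadows the global:
        for (size_t i = 0; i < n; ++i)                 //   -Wshadow, in neither -Wall nor -Wextra
            item_count += values[i];
        return item_count;
    }

    // g++ -Wall -c sum.cpp                             -> not a peep
    // g++ -Wall -Wextra -Wshadow -Werror -c sum.cpp    -> refuses to build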

So, let's say you realize that compilers on different platforms will focus on different issues. You'll probably start trying other compilers on the same platform. Once all those different compilers, set to their most severe settings, pass your code, it's good, right? Not so fast. Remember, a compiler's job is to translate the code, not analyze it. But, you do peer reviews. Tell me, how much of the code do you look at? How long do you look at it? Do you track all the variable lifetimes? Locks and unlocks? In all the code paths? Across compilation units?

Of course you don’t. That would be impossible. Impossible for a person. That’s why there are static analysis tools. My current favorite is Coverity. And yes, it costs money. If you can sell your software, you can pay for your tools.
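
To make the point concrete, here is a contrived sketch (all names invented) of the sort of path-sensitive defect these tools exist to find. Any reasonable analyzer should complain that one path out of withdraw() leaves the mutex held; a reviewer skimming the diff almost certainly won't.

    #include <mutex>

    static std::mutex g_lock;
    static int        g_balance = 100;

    bool withdraw(int amount)
    {
        g_lock.lock();
        if (amount <= 0 || amount > g_balance)
            return false;              // early return: g_lock is still held on this path
        g_balance -= amount;
        g_lock.unlock();
        return true;
    }

    void refund(int amount)
    {
        g_lock.lock();                 // deadlocks if a failed withdraw() ran first
        g_balance += amount;
        g_lock.unlock();
    }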

But the fun doesn't end there. Modern compilers can emit additional code to allow for profile-guided optimization (PGO). Simply put: instrument the code, run it, and feed the results back into the compiler. Why? Because hand-tweaking branch predictions is an exercise in futility. Additionally, you'll learn where the code is spending its time. And this matters because? It matters because you can waste a lot of time guessing where you need to optimize.
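
For the curious, the workflow is roughly the following. The flags shown are gcc's; clang has analogous options (-fprofile-instr-generate / -fprofile-instr-use), so treat this as a sketch rather than a recipe.

    // 1. Build an instrumented binary.
    //      g++ -O2 -fprofile-generate -o app app.cpp
    // 2. Run it on a representative workload; the profile data lands in *.gcda files.
    //      ./app typical_input.dat
    // 3. Rebuild, letting the compiler consume the profile.
    //      g++ -O2 -fprofile-use -o app app.cpp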

Finally, there's dynamic analysis. This is the realm of run-time leak detection. Wading through crash dumps and log files is a terrible way to spend your time.
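
Here is a leak small enough to fit in a sketch: build it with AddressSanitizer (-fsanitize=address on gcc or clang; leak detection is on by default on Linux) and the tool reports the allocating call stack at exit, which beats spelunking through crash dumps.

    #include <cstring>

    char* copy_name(const char* name)
    {
        char* buffer = new char[std::strlen(name) + 1];
        std::strcpy(buffer, name);
        return buffer;                 // nobody below ever delete[]s this
    }

    int main()
    {
        copy_name("leaked");           // return value dropped on the floor
        return 0;
    }

    // g++ -g -fsanitize=address leak.cpp && ./a.out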

So, are you developing quality software or just hacking?


I’m a big fan of notebooks. Whoever has the unenviable task of sorting through my stuff after I’ve made that final blog entry will be bored to tears with the piles of notebooks they’ll have to go through, assuming that they don’t simply rent a dumpster and chuck the whole pile. Not my problem though.

To be fair, the consistent use of notebooks to record a project's designs, progress and failures does not make you a da Vinci. The world is replete with pen-wielding lunatics. That being said, I believe that using notebooks (actual paper ones) helps give shape to nascent ideas. The simple act of putting pen (or super high-tech stylus) to paper (be it college-rule loose leaf or archival quality) forces you to think a bit more about what you're doing. Perhaps this is why I've never been much of a fan of those who believe they can create a complex piece of software without some form of design. I'm not talking space shuttle here. A simple data flow diagram, a threat model and a use case statement are all I ask. These are the bare minimum.

Whiteboards are all well and fine, but unless you've got a dedicated scribe hanging about, what happens on the whiteboard tends to stay on the whiteboard. Besides, as a clever person once said, "I couldn't reduce it to the freshman level. That means we don't really understand it."

At work, my notebooks are a mix of the mundane and the technical. I track meetings, conversations, designs and random thoughts. I find this helpful when trying to make sense of all the balls in the air and the occasional ones that end up on the ground. At the end of each year, I participate in the great year-end summary exercise. For this, my notebooks are invaluable. This year was no exception. What was exceptional were the numbers.

  • 193 code reviews (other people’s)
  • 81 user stories / bugs (evaluation, implementation, review)
  • 24 technical documents (papers, presentations and the like)

Once I’d finished doing my sums, I went back to my 2014 summary. Not even close. I’m not really sure why.

What did I do in 2015?

I spent a lot of time in some very specific areas:

  • writing security code
  • developing threat models
  • teaching Visual Studio to Linux developers
  • improving code quality through use of static analysis and compilers
  • coordinating open source and third-party library use

What did I learn in 2015?

  • developers expect a language to work based on the platform they learned on and not the spec (which no one ever reads)
  • people really, really don’t understand the difference between authentication and authorization
  • developers don’t see security as their problem
  • developers don’t trust static analysis (The code has never failed, so why is that change important?)
  • talking about cyclomatic complexity metrics is like trying to explain Klingon opera to a Star Wars fan

What’s up for 2016?

Who can say? Security doesn't look like a bad guess. And it really doesn't matter how fast machines get or how much storage they have; programs will grow to consume all available resources.

The only certainty is that I’ll be filling another set of notebooks.


Lately, we seem to be a bit over-exuberant in our desire to point fingers while running about in circles yelling “Look! Look! [insert major firm’s name here] screwed the pooch big time!”

While I believe it important to identify and correct security issues in as expedient a fashion as possible, the endless echo chamber adds nothing of value.

I’ve read the code at the center of Apple’s recent SSL security issue. Yes, it’s bad. What it isn’t is unexpected.

I'm sure the old "goto's are the source of all kinds of badness, from security holes to acne" crowd probably have their pitchforks and torches at the ready. I, however, will not be joining them for this outing.

Hopefully, the thoughtful reader has already:

  • reviewed the code in question
  • learned the difference between -Wall and -Weverything
  • read Lyon's commentary
  • taken to using static analysis tools as an adjunct to [not a replacement for] code reviews
  • come to realize why it is so important to fail first
  • accepted that this type of thing isn't going to change any time soon

You may be asking yourself, “What is this failing first of which I speak?”

Simply put, every function/method/routine should fail first and fail fast. Far too many people are out there writing "happy path" code. This is to say, their code properly handles only the situations where everything goes to plan. Hence, the happy bit.

This kind of code isn't really interesting code, in my opinion. Ever since my classmates in college started showing me their programs, I've taken great pleasure in poking holes in them. My master's work in computer viruses was an exercise in creating a design intended to fend off attacks. Working at GE Space and later on Visual SourceSafe gave me an appreciation of systems that did not forgive failure. As a result, when I look at a problem, I think first about what will go wrong. Not what could, but what will. As was once said, "constants aren't and variables won't." Or as Fox Mulder was wont to say, "trust no one." Parameters will be invalid, globals will change while you're trying to use them, and other routines you call won't do what you need them to do. This includes system routines. In the time since I began using *nix systems around 1980, I've seen attempts to set the time crash the system, attempts to set print output size return snarky commentary, and countless weird responses to out-of-memory conditions. To make matters worse, many people are still fixated on making their development environment behave as though we still lived in a world where the VT100 was the newest tool in the box. We don't want to refactor the code because the re-validation effort would be outrageous. We write our code to the least common denominator.

So, what’s a developer to do? Here is my hit list. It’s not exhaustive. It won’t end world hunger. It is entirely my opinion.

Validate All Parameters

Not some, not just pointers, not just the easy ones, but all of them.
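
A minimal sketch, with made-up names and types, of what that looks like in practice: the pointer, the length, the enumeration and even the floating-point value all get vetted before anything else happens.

    #include <cstddef>

    enum class Channel { Red, Green, Blue, Count };

    bool set_gain(float* table, size_t table_len, Channel ch, float gain)
    {
        bool ok = false;

        if (table == nullptr)                                        { /* bad pointer */ }
        else if (table_len < static_cast<size_t>(Channel::Count))    { /* table too small */ }
        else if (ch < Channel::Red || ch >= Channel::Count)          { /* enum out of range */ }
        else if (!(gain >= 0.0f && gain <= 1.0f))                    { /* also rejects NaN */ }
        else
        {
            table[static_cast<size_t>(ch)] = gain;
            ok = true;
        }

        return ok;
    }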

Fail as Soon as You Fail

Don't create a status variable and keep retesting it as you work your way down the routine, only to get to the bottom, unwind and return.

Fail First

Since things can go wrong, test for them first. Failure code tends to be small; success code, large. When you put the failure case after the success case, you separate it from the if statement it relates to. When you put it up front, you can immediately see what's supposed to happen when things go south. Additionally, since the failure handling comes first, it can be written before the "operational" code goes in. This lets you put together a framework for the routine faster.
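
As a sketch (the file-reading chore and its names are mine, not from any code discussed above): the small failure branches come first, right next to the conditions they handle, and the large success block brings up the rear.

    #include <cstdio>
    #include <string>

    bool load_config(const char* path, std::string& out)
    {
        bool ok = false;
        std::FILE* file = nullptr;

        if (path == nullptr || *path == '\0')
        {
            std::fprintf(stderr, "load_config: empty path\n");
        }
        else if ((file = std::fopen(path, "rb")) == nullptr)
        {
            std::fprintf(stderr, "load_config: cannot open '%s'\n", path);
        }
        else
        {
            // The success path, the longest stretch of code, comes last.
            char buffer[4096];
            size_t n = 0;
            while ((n = std::fread(buffer, 1, sizeof buffer, file)) > 0)
                out.append(buffer, n);
            std::fclose(file);
            ok = true;
        }

        return ok;   // single exit point
    }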

Return Something Meaningful

There are very few instances where you should create a routine that doesn't return either a boolean or an enumeration of state. Use an error context when necessary. Don't use a global. Ever. Ever ever. errno has got to be one of the worst ideas anyone ever had, although errno shifted up 8 bits is right up there.
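
A small sketch of the alternative, with illustrative names: the state comes back as an explicit enumeration, and any detail travels in a caller-owned context instead of a global that the next call will clobber.

    #include <cstring>

    enum class LookupStatus { Found, NotFound, BadKey };

    struct LookupContext
    {
        const char* detail = "";                // human-readable hint for the log
    };

    LookupStatus find_user(const char* name, int& user_id, LookupContext& ctx)
    {
        LookupStatus status = LookupStatus::Found;

        if (name == nullptr || *name == '\0')
        {
            status = LookupStatus::BadKey;
            ctx.detail = "empty user name";
        }
        else if (std::strcmp(name, "admin") != 0)   // stand-in for a real lookup
        {
            status = LookupStatus::NotFound;
            ctx.detail = "no such user";
        }
        else
        {
            user_id = 0;                            // the one record this sketch "stores"
        }

        return status;
    }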

Initialize All Local Variables

The compiler may indeed initialize them for you, but don't count on it. If you've got a smart compiler (they really are good these days), explicit initialization won't cost you anything; the optimizer will drop stores it can prove are redundant. Static analyzers will tell you to do this anyway.

Have a Single Exit Point

Please don't whine about how it makes your code look. If you don't, you're probably going to have other issues: things like memory and lock management in languages like C and C++. The future generations who are trying to debug your routine will thank you for not making them dig through chicken entrails to figure out how control got out of it.

Scope Your Variables

Having all your locals at the top of the routine because they're easier to find says something, and it's not a good something. Did I mention that compilers are really good? Keeping variables in the smallest possible scope helps them. Create a scope if you need to; there is no penalty from the code police for using braces.

Scope Your Clauses

This includes if‘s, case‘s, for‘s and anything else that can have more than one statement. This would have made the Apple bug much more obvious.
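
To see why, here is a simplified illustration in the spirit of the Apple bug; it is not the verbatim SSL source, just enough to show how an unbraced if hides a duplicated statement. The second goto is not governed by the if above it, so final_check() never runs and the routine reports success anyway. (Newer compilers can flag this with -Wmisleading-indentation, which is yet another warning you have to know about.)

    #include <cstdio>

    static int step_one()    { return 0; }    // stand-ins for the real checks
    static int step_two()    { return 0; }
    static int final_check() { return -1; }   // the check that should fail

    static int verify()
    {
        int err = 0;

        if ((err = step_one()) != 0)
            goto fail;
            goto fail;                 // the indentation lies: this always executes

        if ((err = step_two()) != 0)   // everything from here down is unreachable
            goto fail;

        err = final_check();

    fail:
        return err;
    }

    int main()
    {
        std::printf("verify() = %d\n", verify());   // prints 0: "verified"
        return 0;
    }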

Use the Appropriate Flow Control Structures

We can see that, had Apple used else-if's instead of a series of separate if's, the code would have shown the dependent sequencing and made the goto's unnecessary. There is a time and place for goto's, but when they are used to the extent seen in the Apple case, it indicates structural issues. There are multiple control mechanisms for a reason. Using elaborate variable manipulation when you could use a break boggles the mind.
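
Reusing the illustrative helpers from the sketch above, the same dependent sequencing can be written so that each step runs only if everything before it succeeded, there is a single exit, and there is nothing for a stray duplicated line to silently bypass:

    static int verify_structured()
    {
        int err = 0;

        if ((err = step_one()) != 0)        { /* fall through to the single return */ }
        else if ((err = step_two()) != 0)   { }
        else                                { err = final_check(); }

        return err;
    }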

Stop Pretending You’re Smarter than the Compiler

If you're not keeping up with what the gcc, clang and Intel folks are doing with their compilers, or with what profile-guided feedback can do, then don't even try to pretend that you can provide more help to the compiler than it can figure out on its own.

Use a Static Analyzer

Static analyzers have way more patience for tracing code lifetimes than a person ever will. They don't have attitude and will readily call your baby ugly. We need that. We have a whole boatload of ugly babies.

Make Code Reviews a Required Part of the Development Process

This may be one of the hardest things to do. It’s that ugly baby thing. It’s “the code” and not “our code.” I have yet to meet a developer who would disagree that they desire to produce the best product they can. Code reviews should be about objective measures and not bike sheds.

Test it Until it Falls Over

Good developers and good testers are like orchids. Unless you treat them well, they aren't around very long. A good tester is a good developer's best friend. A bad developer will drive off a good tester. A bad tester can kill a project. A good tester will understand the product and its use. They also are really sadistic people who take great pleasure in tormenting the product. What they do is find the sharp bits that are protruding and point them out. The thing about developing software is that after a while you stop thinking about what it might do and only think about what it does do. I can't count how many times I've heard a developer ask, "Why would you do that?"

Be Realistic

If you don't allow for the review feedback loop, you will suffer. It takes time to make corrections. Done is only done when you've exercised your error paths. If you don't do failure testing, you will suffer. If you believe that once the happy path passes you are ready to release, you will suffer. Notice the pattern?

