Avoid Concurrency Defects

Accesses to variables shared among multiple threads of execution must be protected via disabling interrupts, using a mutex, or some other rigorously applied concurrency management approach.

Consequences:

Incorrect or lax use of concurrency techniques can be expected to lead to concurrency bugs. Such bugs are usually difficult to detect via testing and difficult to reproduce once detected. A fraction of any such bugs can be reasonably expected to make it past even extensive testing into production fleets.

Accepted Practices:
  • Aggressively minimize the use of globally shared variables. Every variable shared between tasks is a chance for a concurrency defect.
  • For every access to a shared variable, treat the entire time that a copy of the variable is “live” in the computation as a critical section, protecting access via masking interrupts or some other well defined technique.
  • Avoid concurrency management techniques that are “home brewed” or otherwise not part of well proven practice. Similarly, do not modify well known techniques to optimize efficiency or obtain other perceived benefits. Such techniques are extremely difficult to get right, and altering techniques or using non-standard techniques can be reasonably expected to introduce defects.
  • Declare every shared variable “volatile” to ensure that reads and writes do not result in stale data being used due to compiler optimization attempts to improve computation speed.
  • Keep critical sections as short as possible to minimize negative effects on scheduling. (The longest critical section sets a lower bound on worst-case blocking time for scheduling purposes.)
Discussion:

One aspect of a modern real time embedded system is that it must appear to do many things at once. For example, an engine controller must look at many inputs, set throttle angle and perform diagnostic checks all at the same time. Typically many of these tasks are written as relatively independent pieces of software that must work together, and they must all appear to run at the same time.

In reality there is only one CPU, so tasks must take turns using that CPU, with that turn-taking supervised by the RTOS. Tasks also often must share things such as memory locations and input/output devices. The turn-taking means that, unless software designers are extremely careful, on occasion shared resources can be left in undefined or incorrect states, resulting in concurrency bugs.

(Figure: example concurrency defect. Source: http://blogs.windriver.com/engblom/2010/06/true-concurrency-is-truly-different-again.html, which includes a nice discussion of how situation-dependent such defects are.)

Avoiding and fixing concurrency bugs is a major source of design and testing effort on most embedded systems. In part this is because concurrency bugs can be quite subtle, and in part because they can be very difficult to activate during testing as well as difficult to isolate even when one is observed (if they are observed at all). The difficulty of detecting and fixing concurrency defects, as well as the reasonable probability that they won’t be seen at all during testing, makes disciplined use of good practices essential.

A variety of techniques are available to avoid concurrency problems. A preferred approach is avoiding situations in which concurrency bugs are possible. For example, avoiding the use of shared global variables avoids associated concurrency defects (because the shared variables simply aren’t there to begin with). But, when sharing can’t be avoided, there are well defined basic techniques that work. Typically such techniques work by “locking” some resource so that other tasks cannot use it. As an analogy, consider a changing room at a clothing store. If you want to make sure that nobody else tries to use the room you are using, you need to “lock” the room when you enter (with an actual door lock, or maybe just by closing the door or curtain all the way), and then “unlock” the room when you leave. If you never lock the door, nothing bad may happen the first dozens or hundreds of times you try on clothes. But eventually someone will wander into the room by mistake while you are there.

Disabling interrupts is a concurrency design approach that can be thought of as a program “locking” the CPU so that no other task can use it when a shared variable is being accessed. The way this works is that any task wanting to, say, increment a variable first disables interrupts, then increments the variable, then re-enables interrupts. The period of time between the first read of a variable and when the variable is done being updated is known as a “critical section,” and is the time during which no other task can be permitted to access the variable. Disabling interrupts turns off the hardware’s ability to switch tasks or perform anything but the desired computation during the critical section. This ensures that no other task in the system can read or write the shared variable, because disabling interrupts prevents any other task from running. It is essential that every single access to a shared variable disable interrupts for the entire use of the variable for this to be guaranteed to work. If a local copy of the variable is kept and used outside the time during which interrupts are disabled, there are no guarantees as to how the system will behave when that local copy is subsequently used to update the variable. Other techniques are available to manage concurrency beyond disabling interrupts. But, this is a common technique in embedded systems.
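
As a minimal sketch of this pattern in C (the DISABLE_INTERRUPTS() and ENABLE_INTERRUPTS() macros are hypothetical stand-ins for whatever compiler intrinsic or RTOS call a particular platform provides), every read and write of the shared variable happens entirely inside the critical section:

    #include <stdint.h>

    /* Shared between the main task and an interrupt service routine,
       so it is declared volatile and only touched inside critical
       sections.                                                        */
    static volatile uint16_t g_event_count = 0;

    void IncrementEventCount(void)
    {
        DISABLE_INTERRUPTS();     /* begin critical section              */
        g_event_count++;          /* read-modify-write cannot be torn by
                                     a task switch or interrupt          */
        ENABLE_INTERRUPTS();      /* end critical section                */
    }

    uint16_t GetEventCount(void)
    {
        uint16_t snapshot;
        DISABLE_INTERRUPTS();     /* even a plain read of a 16-bit value
                                     may not be atomic on an 8-bit CPU   */
        snapshot = g_event_count;
        ENABLE_INTERRUPTS();
        return snapshot;
    }

Keeping each critical section this short also supports the accepted practice above of minimizing blocking time.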

Even using these techniques, special care must be used in accessing any shared resource. For example, the keyword volatile must be used for every shared resource to ensure that the most up to date copy is always accessed. (But, even this won’t help if that copy is updated at an unexpected time.)
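
For contrast, below is a sketch of the kind of defect these rules are meant to prevent (the names are illustrative only). The increment compiles to a separate read, modify, and write; if an interrupt fires in between and the ISR also updates the counter, the ISR’s update is lost when the stale copy is written back, and volatile alone does not prevent that:

    #include <stdint.h>

    volatile uint16_t g_event_count;   /* shared with an ISR */

    void BadIncrement(void)
    {
        /* DEFECT: no critical section. The compiler generates roughly:
             load  g_event_count into a register   (read)
             add 1 to the register                 (modify)
             store the register to g_event_count   (write)
           An ISR that changes g_event_count between the load and the
           store has its update silently overwritten.                   */
        g_event_count = g_event_count + 1;
    }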

Selected Sources:

Ball describes concurrency defects in terms of being race conditions and prescribes disabling interrupts to solve the problem. (Ball 2002, p. 162).

Douglass gives a pattern for a critical section in section 7.2, and in section 7.2.6 says that the most common way to prevent context switching during a critical section is to disable interrupts. In section 7.2.5, Douglass says: “The designers and programmers must show good discipline in ensuring that every resource access locks the resource before performing any manipulation of the source.” (Douglass 2002)

MISRA recommends that developers “Use Test-and-Set instructions or a signaling mechanism, such as Dekker/Dijkstra/Lamport Semaphores, to protect and mark as ‘in-use’ any common resources.” (MISRA Report 3 p. 26) In more modern terminology, this is a recommendation to use a mutex or related semaphore-based “lock” on data. MISRA also cautions that interrupt enable and disable instructions must be used with care. (id.)

Sullivan presents results of a study of defect types, concluding that 11% of high impact memory corruption errors are due to concurrency defects. (Sullivan 1991, p. 6). This means that while most defects are easier to track down, a few race conditions and other concurrency defects can be expected to happen.

Concurrency defects are so difficult to find that specific testing and analysis tools have been developed to find them, prompting the creation of a benchmark suite to evaluate such tools (Jalbert 2011). A significant challenge to creating such a benchmark is the difficulty in reproducing such bugs even when the exact bug is completely understood. (id., first page)

Park et al. provide a summary of work on finding and fixing concurrency defects (Park 2010). Among other things they note that a concurrency defect was responsible in part for the 2003 Northeastern US electricity blackout that left 10 million people without power (id. p. 245), and that such bugs are difficult to reproduce (id.). 

References:
  • Ball, Embedded Microprocessor Systems: Real World Design, Newnes, 2002.
  • Douglass, B. P., Real-Time Design Patterns: robust scalable architecture for real-time systems, Pearson Education, first printing, September 2002, copyright by Pearson in 2003.
  • Jalbert, RADBench: a concurrency bug benchmark suite, HotPar 2011, pp. 2-8.
  • MISRA, Report 3: Noise, EMC and Real-Time, February 1995.
  • Park, Falcon: fault localization in concurrent programs, ICSE 2010, pp. 245-254.
  • Sullivan & Chillarege, Software defects and their impact on system availability: a study of field failures in operating systems, Fault Tolerant Computing Symposium, 1991, pp 1-9.

Ensure Code Is Modular

Critical embedded software should be modular, and in particular should limit function size to no more than 1-2 pages of code including comments.

Consequences: Poor modularity can reasonably be expected to result in code that is more complex, more brittle, and in general has a higher defect rate than modular code. This is in part because code with poor modularity is harder to understand (and therefore harder to get right), and in part because it is more difficult to test than code with good modularity.

Accepted Practices:

  • A significant majority of functions, procedures, or subroutines must each be no longer than a defined limit in the range of 100-225 lines, nominally 120 lines, including comments. Longer functions should be rare, and their length should be specifically justified, taking into account whether they have low cyclomatic complexity or other mitigating factors.
  • Each module should have high cohesion, low coupling, and good information hiding. There are no generally accepted precise values for these metrics.

Discussion:

Code is considered modular if it is written in reasonably small chunks of code (modules) that work together in a way that maintains some key properties. Code that is merely in small chunks but that does not achieve these properties is not considered modular. A code “module” is classically defined as a piece of code that accomplishes a particular function. This may be either a single software subroutine (or equivalent) with associated variables, or may be a small set of collected subroutines that work together to accomplish a function or a set of closely related functions. Key properties of modular code include the following areas.

Small size: each function should be one or two pages of code when printed (up to perhaps 120 lines of code including comments at most, with most functions being less than one printed page of 60-75 lines including comments). Having a function larger than this makes it difficult for developers and reviewers to understand the entirety of the function at one time, decreasing the effectiveness of finding bugs. Large size also makes it harder to create effective unit tests due to the increased complexity of a large function. (This length limit is historically associated with a single function fitting on one printed page of about 65 lines.)

High cohesion: each module performs a single job and contains a set of closely related functions rather than a collection of unrelated functions. Having low cohesion results in code that mixes up unrelated functions, making it difficult to test, and difficult to maintain as changes need to be made. Low cohesion also normally leads to bugs due to related functions being scattered around many modules instead of in a single module.

Low coupling: modules should have a low dependency upon other module data. High coupling causes data dependencies among modules that makes testing difficult and tends to lead to subtle data sharing bugs. Good practice is to minimize coupling via avoiding global variables. High coupling can be reasonably expected to increase defect rates.

Information hiding: modules should hide as much about the details of their implementation as possible to reduce implicit dependencies that can cause other parts of the software to fail. For example, the exact algorithm or data format used for calculations internal to a module should not be discernible to any other module. Accepted practice is to use the “static” keyword on identifiers within a .c file to accomplish information hiding, and declare variables within a procedure, instead of globally, if they are only needed within that procedure.
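
As a sketch of what this looks like in C (the module and identifier names are made up for illustration), the module below keeps its state and helper function private to its .c file and exposes only a small interface:

    /* engine_speed.c: internal details hidden from other modules        */
    #include <stdint.h>

    static uint16_t filtered_rpm;               /* file scope: invisible
                                                   outside this .c file  */

    static uint16_t FilterSample(uint16_t raw)  /* private helper        */
    {
        return (uint16_t)(((uint32_t)filtered_rpm + raw) / 2u);
    }

    /* Only these two functions would be declared in the module header.  */
    void EngineSpeed_Update(uint16_t raw_rpm)
    {
        filtered_rpm = FilterSample(raw_rpm);
    }

    uint16_t EngineSpeed_Get(void)
    {
        return filtered_rpm;
    }

Other modules can use the engine speed value without knowing, or depending on, how it is filtered or stored internally.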

Selected Sources:

Fenton & Neil discuss the topic of software metrics (1999), and say that to really understand the problem you need to consider many factors at once rather than simple metrics. They give as an example a combination of problem complexity, use of IEC1508 (draft version of IEC 61508), code complexity, and operational usage, among others (Fenton, Fig. 3, p. 684), as a way to ensure good code. In other words, it is clear that these types of factors matter, even if there is no single optimal answer.

McConnell presents a survey of function length practices. A length limit of one to two printed pages is common in industry. McConnell says: “If you want to write routines longer than about 200 lines, be careful. (A line is a non-comment nonblank line of source code.)” He also says: “you’re bound to run into an upper limit of understandability as you pass 200 lines of code” (McConnell 1993, pp. 93-94). These comments are primarily in the context of desktop applications rather than safety critical embedded control software. In my opinion, a limit smaller than 200 lines of non-comment code is accepted practice for safety critical embedded systems. McConnell, chapter 5, talks more generally about modular programming in terms of information hiding, coupling, cohesion, and function length.


Selby & Basili found that software modules with high coupling and low cohesion had 7.0 times as many errors (per line of non-comment code) as modules with low coupling and high cohesion. (Selby & Basili 1991, abstract, p. 146).


Withrow found that the optimum size for minimizing defects was 225 lines of code per module, which is a bit larger than some guidelines (Withrow 1990), but still in the same general range being discussed.

The NASA “Power of Ten” rules for writing safety critical code recommend functions of at most 60 lines of text (Holzmann 2006, rule 4). They also recommend declaring data objects at the smallest possible level of scope (id., rule 6).

IEC 61508-3 highly recommends a module size limit (p. 93).

MISRA states that modularity requires high cohesion and low coupling (MISRA Report 5, pp. 9-10).

References:
  • Fenton, A critique of software defect prediction models, IEEE Trans. Software Engineering, Sep/Oct 1999, pp. 675-689.
  • IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems (E/E/PE, or E/E/PES), International Electrotechnical Commission, 1998. Parts 2,3,7. 
  • Holzmann, “The Power of Ten: Rules for Developing Safety Critical Code,” IEEE Computer, June 2006, pp. 93-95.
  • McConnell, Code Complete, Microsoft Press, 1993.
  • MISRA, Report 5: Software Metrics, February 1995.
  • Selby & Basili, Analyzing error-prone system structure, IEEE Trans. Software Engineering, Feb 1991, pp. 141-152.
  • Withrow, Error density and size in Ada software, IEEE Software, Jan. 1990, pp. 26-30.





Minimize Use of Global Variables

Critical embedded software should use the minimum practicable variable scope for each variable, and should minimize use of global variables. As one of the chapters in my book says, Global Variables Are Evil.  Over-use of globals can reasonably be expected to result in significantly increased defect rates and the presence of difficult-to-find software defects that are likely to render a system unsafe.

Consequences: Using too many global variables increases software complexity and can be reasonably expected to increase the number of bugs, as well as make the code difficult to maintain properly. Defining variables as globals that could instead be defined as locals can be reasonably expected to significantly increase the risk of the data being improperly used globally, as well as to make it more difficult to track and analyze the few variables that should legitimately be global. In short, excessive use of globals leads to an increased number of software defects.

Accepted Practices:
  • Only a small minority of variables (at most) should be global. Ideally zero variables should be global. (Special globals such as mathematical constants and configuration information might be excluded from this metric.) The exact number varies with the system, but an expected range would be from less than 1% to perhaps 10% of all statically allocated variables (even this is probably too high), with an extremely strong preference for the lower side of that range. Exceeding this range can reasonably be expected to lead to an increase in software defects.
  • The need for each global or category of globals should be specifically justified as required for effective software construction. Speed increases are generally not sufficient justification, nor is limited memory space.
  • In any system with more than one task (including systems that just have a main task plus interrupts), every non-constant global should be declared volatile, and every access to a global should be protected by a form of concurrency management (e.g., disabling interrupts or using a mutex). Failing to do either can normally be expected to result in concurrency defects somewhere in your code.
  • Each variable should be declared locally if possible. Variables used by a collection of functions in the same C programming language module should be declared as top-level “static” within the corresponding .c file to limit visibility to functions declared within that .c file. Variables used by only one function should be declared within that function and thus not be visible to any other function. (See the sketch after this list.)
  • Global variables must be declared consistently throughout the system, including having exactly the same type information (e.g., if a global is declared as “unsigned” in one place, it needs to be “unsigned” everywhere). 
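
The sketch below shows these three levels of scope in C (all names are hypothetical, and the hardware access function is assumed to exist elsewhere):

    #include <stdint.h>

    extern uint16_t ReadAdc(void);      /* hypothetical hardware access  */

    /* Global: visible everywhere. Kept to a justified minimum and
       declared volatile because another task or ISR may access it;
       accesses still need concurrency protection.                      */
    volatile uint16_t g_engine_rpm;

    /* File-scope static: visible only to functions in this .c file.    */
    static uint16_t conversion_count;

    void SampleSensor(void)
    {
        uint16_t raw;                   /* local: visible only here      */
        raw = ReadAdc();
        conversion_count++;
        g_engine_rpm = raw;             /* protect in a real system      */
    }
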
Discussion:

Global variables (often called “globals” for short) are programming language variables that are visible to the entirety of the program. In contrast, non-global variables (often called local variables, or locals for short), have a reduced scope that makes them visible only in the part of the program where they are used, and invisible to the rest of the program. Excessive use of global variables must be avoided in safety critical software due to the complexity introduced as well as the risk of concurrency hazards.

One reason to avoid globals is that use of many globals can be reasonably expected to lead to high coupling among many disparate portions of a program. Any variable that is globally visible might be read or written from anywhere in the code, increasing program complexity and thus the chance for software defects. While analysis might be performed to determine where globals actually are read and written, doing so for a large number of globals is time consuming, and must be re-done any time any substantive change is made to the code. (For example, it would be expected that such analysis would have to be re-done for each software release.)

Another reason to avoid globals is that they introduce concurrency hazards (see the discussion on concurrency in an upcoming post). Because it is difficult to keep track of what parts of the program are reading and writing a global, safe code must assume that other tasks can access the global and use full concurrency protection each time a global is referenced. This includes both locking access to the global when making a change and declaring the global “volatile” to ensure any changes propagate throughout the software.

Even if concurrency hazards are generally avoided, if more than one place in the program modifies a global it is easy to have unexpected software behavior due to two portions of the program modifying the global’s value in an unanticipated sequence. This can be reasonably expected to lead to infrequent and subtle (but potentially severe) concurrency defects.

As an analogy, consider if you keep your grocery shopping list on a whiteboard on your fridge. If you live alone and are careful, this may work out fine, and you will never accidentally have an item erased or added in error to your list. But if you live in a busy house (perhaps a dormitory), something you wrote on that whiteboard might be changed or accidentally erased, and you might have no idea that it happened or who did it. In this analogy, everything you write on that whiteboard is a global variable – anyone who enters the house can see it, act upon the information, and potentially modify it. As an extension of this analogy, if that whiteboard is a “to do” list, you are using it to communicate information to others in the house. If that list is modified, corrupted, or overwritten, your communication will fail, with that sort of problem becoming more likely as more and more people share the whiteboard for a diversity of purposes.

Using too many globals can be thought of as the data flow version of spaghetti code. With code “control” flow (conditional “if” statements and the like) that is tangled, it is difficult to follow the flow of control of the software. Similarly, with too many globals it is difficult to follow the flow of data through the program – you get “spaghetti data.” In both cases (tangled data flow and tangled control flow) it can reasonably be expected that the spaghetti software will have elevated levels of software defects. Excessive use of global variables makes unit testing difficult, because it requires identifying and setting specific values in all the globals referenced by a module that is being unit tested.

In some cases the risk of a global may seem only theoretical rather than practical, if a variable is only used in a single module but happens to be defined as a global. However, this still presents a latent risk of some other piece of the software modifying the variable intentionally (defeating the principle of information hiding), or by accident via a typographical error that was intended to refer to a similarly named variable defined elsewhere. Therefore, it is an important accepted practice to only define variables as global if it is essential to do so.

There are two related classes of globals that are an exception, and are generally considered acceptable for use. One class is constants that represent physical properties (e.g., the number of degrees in a circle being 360), system configuration information (e.g., the fact that a vehicle has a 6 cylinder engine or a 4-speed transmission), and other values that don’t change at run time. These are properly defined as constant values using the programming language keyword “const” in the code or other similar approach. 
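
For example (the values here are purely illustrative), such constants can be defined once using const so they cannot be modified at run time:

    #include <stdint.h>

    /* Physical and configuration constants: never change at run time,
       so making them widely visible carries little risk.               */
    const uint16_t DEGREES_PER_CIRCLE = 360u;
    const uint8_t  NUM_CYLINDERS      = 6u;    /* illustrative value    */
    const uint8_t  NUM_TRANS_GEARS    = 4u;    /* illustrative value    */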

Another class of generally acceptable globals is system state information that must be generally accessible across most software, such as engine speed in a vehicle. These system state variables should be written in exactly one place in the code, should not be duplicated (to avoid confusion and the chance for different copies of the variable to get out of synch), should be kept few in number, and ideally should be read via access functions rather than directly to enforce modularity. These are the sorts of globals that should be kept to a minimum and count toward the statement that having a few well-chosen globals might be OK. In distributed embedded systems these globals usually show up as broadcast bus values rather than as actual memory-based global variables.
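
A sketch of the access-function approach for such a state variable is shown below (the names are hypothetical, and the interrupt-masking macros are stand-ins for the concurrency protection discussed in the earlier section):

    /* vehicle_state.c: the one place engine speed is written            */
    #include <stdint.h>

    static volatile uint16_t engine_rpm;    /* not directly visible to
                                               other modules             */

    void VehicleState_SetEngineRpm(uint16_t rpm)   /* single writer      */
    {
        DISABLE_INTERRUPTS();               /* hypothetical macro        */
        engine_rpm = rpm;
        ENABLE_INTERRUPTS();
    }

    uint16_t VehicleState_GetEngineRpm(void)       /* readers use this   */
    {
        uint16_t snapshot;
        DISABLE_INTERRUPTS();
        snapshot = engine_rpm;
        ENABLE_INTERRUPTS();
        return snapshot;
    }

This keeps a single writer, and readers get a consistent snapshot without reaching into the module’s data directly.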

Entirely eliminating the use of non-constant globals is sometimes unachievable. But when complete elimination isn’t practical, globals should nonetheless be very few (handfuls, not thousands of them) and strategically selected. The globals that are used should be used carefully, and reviewed to ensure each is essential.

Selected Sources:

Wulf & Shaw in 1973 wrote a paper entitled: “Global variable considered harmful,” (Wulf 1973) describing it as the data version of a “goto” statement – which a previous paper by Dijkstra had famously declared “harmful.” After that publication, the concept of avoiding globals has mostly been a matter of clarifying the gray areas and educating new programmers.



McConnell says: “Global data is like a love letter between routines – it might go where you want it to go, or it might get lost in the mail.” (McConnell 1993, pg. 88). McConnell also says: “global-data coupling is undesirable because the connection between routines is neither intimate nor visible. The connection is so easy to miss that you could refer to it as information hiding's evil cousin – ‘information losing.’” (McConnell 1993, pg. 90).


Ganssle advises “Minimize global variables, both to reduce interaction between routines and to reduce the number of places where interrupts will cause problems.” (Ganssle 1992, pg. 186). A few years later, Ganssle and designers in general appreciated that global variables were even more of a problem than previously thought. He then wrote, resorting to humor to make his point: “One of the greatest evils in the universe, an evil in part responsible for global warming, ozone depletion, and male pattern baldness, is the use of global variables.” Humor aside, he goes on to confirm the problems with globals, and gives recommendations. “Every firmware standard – backed up by the rigorous checks of code inspections – must set rules about global use.” He finishes by saying: “I feel that defining a global is such a source of problems that the team leader should approve every one.” (Ganssle 2000, p. 38)

NASA recommends avoiding too many inter-component dependencies: “components should not depend on each other in a complex way,” which includes as a significant factor the use of globals to communicate data between components (NASA 2004, p. 93).

MISRA Software Guidelines recommend against using global variables (MISRA Report 5, p. 10).  IEC 61508-3 highly recommends a modular approach (p. 79), information hiding/encapsulation (id., p. 93), and a fully defined interface (id., p. 93), which sum up to recommending against the use of global variables.

Update: you can find an example of getting rid of a global variable here:
  http://betterembsw.blogspot.com/2013/09/getting-rid-of-global-variables.html

References:
  • Ganssle, J., The Art of Programming Embedded Systems, Academic Press, 1992.
  • Ganssle, J., The Art of Designing Embedded Systems, Newnes, 2000.
  • IEC 61508, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems (E/E/PE, or E/E/PES), International Electrotechnical Commission, 1998. Parts 2,3,7. 
  • McConnell, Code Complete, Microsoft Press, 1993.
  • MISRA, Development Guidelines for Vehicle Based Software, November 1994 (PDF version 1.1, January 2001).
  • MISRA, Report 5: Software Metrics, February 1995.
  • NASA-GB-8719.13, NASA Software Safety Guidebook, NASA Technical Standard, March 31, 2004.
  • Wulf & Shaw, Global variable considered harmful, SIGPLAN Notices, Feb 1973, pp. 28-34.

Avoid High Cyclomatic Complexity

Cyclomatic complexity is a measure of the number of independent paths through a particular piece of code (a higher number means the software is more complex). Critical software functions with high cyclomatic complexity are one type of “spaghetti” code that should be avoided. One can argue an exception for single-level switch statements, so long as there are no nested conditionals within the switch cases themselves.

Consequences: A high cyclomatic complexity for a particular function means that the function will be difficult to understand, and more difficult to test. That in turn means functions with a cyclomatic complexity above 10 or 15 can be reasonably expected to have more undetected defects than simpler functions. In practice, “spaghetti code” with a high cyclomatic complexity can reasonably be expected to have a poor design and poor unit test coverage, resulting in the reasonable expectation that it will have an elevated rate of software defects.

Accepted Practices:
  • Cyclomatic complexity should be limited for almost all software functions to a relatively small, predetermined threshold defined by the project software development plan. A range of 10 to 15 for a significant majority of functions in a project is typical.
  • Functions with a cyclomatic complexity above the defined threshold of 10 to 15 should be subject to special scrutiny, and simplified if practical. Functions with a complexity significantly above 15 (perhaps a predefined threshold in the range of 30-50 and above, depending on the particular system involved) should be strictly prohibited.
  • In some cases, functions with a single switch statement having many cases may be acceptable because of the regular structure of a switch statement, but should not contain nested switch statements, nor contain conditional statements within the first-level switch statement.
Discussion:

McCabe Cyclomatic Complexity (e.g., as discussed in NIST 500-235) is a measure of the complexity of a piece of software. Rather than counting the number of lines of code, cyclomatic complexity (roughly speaking) counts the number of decision points within the software. For example, a piece of software with 1000 lines of code but no “if” statements and no loops has a cyclomatic complexity of 1. But a piece of software 100 lines long might have a much higher cyclomatic complexity if it contains many levels of nested “if” statements and loops. Thus, cyclomatic complexity measures how complicated a piece of software is in terms of flow control structure rather than simply how big it is.
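
As a small, contrived illustration in C, complexity is driven by decision points rather than by line count:

    #include <stdint.h>

    /* Cyclomatic complexity 1: straight-line code, no decision points.  */
    int32_t Scale(int32_t x)
    {
        return (x * 3) + 7;
    }

    /* Cyclomatic complexity 4: the loop condition, the "if", and the
       "else if" each add a decision point (1 + 3 = 4).                  */
    int32_t Classify(const int32_t *samples, uint8_t count)
    {
        int32_t result = 0;
        uint8_t i;
        for (i = 0u; i < count; i++)            /* decision 1 */
        {
            if (samples[i] > 100)               /* decision 2 */
            {
                result += 2;
            }
            else if (samples[i] > 10)           /* decision 3 */
            {
                result += 1;
            }
        }
        return result;
    }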

An important use of cyclomatic complexity is in understanding how difficult a piece of software will be to test. A common assessment of unit testing thoroughness is the “branch coverage” of a set of tests. Branch coverage measures whether tests that have been run have exercised both sides of a branch. For example, if a piece of code has a single “if … else” construct, attaining 100% branch coverage requires one test case to exercise the “if” part, and another test case to exercise the “else” part, since a program can’t take both paths at once. But, if there are nested conditionals the number of tests required can increase dramatically due to all the different paths that might have to be taken to exercise all paths through the code. Cyclomatic complexity provides a way to understand the number of tests required for these paths. In particular, depending upon the specifics of the software, it can take up to M unit tests to achieve 100% branch coverage, where M is the McCabe Cyclomatic Complexity of the software under test. (The actual number depends upon the structure of the software, but for a higher complexity number one expects that the effort required to test the software will increase.)
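
As a concrete (made-up) example of how decision points drive test effort, the function below has a cyclomatic complexity of 3, and achieving 100% branch coverage requires tests that exercise both sides of each decision:

    #include <stdint.h>
    #include <stdbool.h>

    /* Branch coverage needs at least three test cases, e.g.:
         enabled == false               -> covers the outer false branch
         enabled == true,  temp >  90   -> covers both true branches
         enabled == true,  temp <= 90   -> covers the inner false branch */
    uint8_t SelectFanSpeed(bool enabled, int16_t temp)
    {
        uint8_t speed = 0u;
        if (enabled)
        {
            if (temp > 90)
            {
                speed = 2u;
            }
            else
            {
                speed = 1u;
            }
        }
        return speed;
    }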

Below are examples of program control flow graphs from NIST publication 500-235. The point of these figures is not the particulars of the program, but rather the visible increase in number of possible paths (from top to bottom in each graph) as the cyclomatic complexity increases. Ensuring that tests exercise each and every path through the graph is going to be a lot harder for a complexity of 17 or 28 than for a complexity of 5.


(Figure: cyclomatic complexity examples from NIST 500-235, 1996.)

A generally accepted practice is to limit cyclomatic complexity for each software function within a system to a range of 10-15 (NIST 500-235 pg. 25). Functions having a complexity above 10 or 15 require special attention and consideration as to why the deviation is required, and can reasonably be expected to be much more difficult to test thoroughly than functions with a complexity of 10 or below. This means that the effort required to achieve high branch coverage during unit test of each function is reduced with low cyclomatic complexity.

Code with a high cyclomatic complexity normally ends up as what is referred to as “spaghetti code,” because the control flow structure is a tangled mess like a plate of spaghetti. When you pull at one end of a piece of spaghetti on a dinner plate, you have no idea which of the other ends will also move. So it is with trying to understand the details of spaghetti software. Such code has the properties of being difficult to understand, difficult to test, and more prone to defects. (Historical note: "spaghetti code" classically referred to code with many goto statements, but more recently the term has come to encompass poorly structured code in general.)

The primary exception to this rule in practice is when a “switch” statement is used to deal with a large number of alternatives in a very regular pattern, especially if the switch is being used to dispatch different command bytes from a communication protocol parser or the like. But, test effort is still significant for a switch statement with many alternative possible paths, and some compilers have trouble with switch statements having more than 256 cases, so care should be taken with large switch statements. In any event, nested conditionals should not be buried inside cases of switch statements.
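
A sketch of the kind of regular switch structure this refers to (the command values and handler functions are hypothetical):

    #include <stdint.h>

    void HandleReset(void);        /* handlers assumed to exist elsewhere */
    void HandleReadStatus(void);
    void HandleSetSpeed(void);
    void HandleUnknownCommand(void);

    /* Command dispatcher for a hypothetical serial protocol: one flat
       switch with no nested conditionals inside the cases.              */
    void DispatchCommand(uint8_t cmd_byte)
    {
        switch (cmd_byte)
        {
            case 0x01u: HandleReset();          break;
            case 0x02u: HandleReadStatus();     break;
            case 0x03u: HandleSetSpeed();       break;
            default:    HandleUnknownCommand(); break;
        }
    }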

Selected Sources:

US National Institute of Standards and Technology (NIST) Special Publication 500-235 (Watson & McCabe 1996) explains the relationship between cyclomatic complexity and testing effort. Page 25 of that publication discusses a complexity limit of 10 being an accepted practice, and says that while limits as high as 15 have been successful, even higher levels of complexity require additional testing effort.

The Reliability Information Analysis Center (“RIAC” – formerly known as “RAC”) is the focus point of the US Department of Defense for reliability information of all types. They summarize the effects of McCabe Cyclomatic Complexity (MCC) as follows: “Empirically, numbers less than ten imply reasonable structure, numbers higher than 30 are of questionable structure. Very high cyclomatic numbers of more than 50 imply the application cannot be tested, while even higher numbers of more than 75 imply that every change may trigger a ‘bad fix’. This metric is widely used for Quality Assurance and test planning purposes.” (RAC 1996, p. 124) In context, the term “bad fix” refers in general to a situation in which every attempt to fix a software defect introduces one or more new software defects, making it impossible to remove all defects regardless of how much effort is expended.



Jack Ganssle, a respected expert on embedded systems, makes it clear that “spaghetti” code must be avoided. He writes “A sure sign of disaster is the programmer who doesn’t seem to have a firm grasp of the program’s data and control flow, or who writes spaghetti code.” (Ganssle 1992, pg. 64)

Walsh found that defect rates increased significantly for modules with a McCabe Cyclomatic Complexity value above 10 on an embedded computer system (Walsh 1979, pg. 766).

Scholarly data on code complexity metrics is mixed because of the difficulty of performing controlled studies across different application areas, different programming team skill levels, different languages, and other confounding factors. For example, different projects start “counting” defects at different points in the development process, and defect counts are responsive to whether testing is thorough or not (inadequate testing may not find bugs, but that doesn’t mean the software is truly defect-free). Safety critical system developers tend to be a conservative lot (for good reason), and prefer to have less complexity because it makes it easier to inspect and test systems to ensure low defect rates.

I can also draw on my own personal experience conducting more than 130 embedded software design reviews. In essentially all cases when I see a complexity warning sign, I can find a defect within an hour (usually just minutes) of looking at code. But, more importantly, it is relatively straightforward to make the code less complex and more understandable (and therefore less defect prone) with very little effort in almost every case of high complexity that I have encountered. And, usually, once I show the programming team how to simplify, they agree it is better and start the practice themselves on other code. So, in my own extensive consulting and teaching experience, what I have found is that code with complexity warning signs is indicative of a software team that needs to improve in some significant way to create acceptable software – which is an undesirable state for developers of safety critical systems. When I find code that is simple, understandable, and devoid of unnecessary complexity, I know that I have found programmers who in general know what they are doing.

The developers of safety critical software standards feel the same way. They make it clear that complexity must be kept low in safety critical systems. Simplicity is one of the tools that is essential for creating safe systems. All of MISRA Report 5 is devoted to the topic of Software Metrics. It describes MCC (MISRA Report 5, pp. 37-38, p. 60) and gives a maximum acceptable complexity of 15 for safety critical systems.

The FAA recommends using a complexity metric such as MCC (FAA 2000, pg. J-17, J-23). NASA recommends reducing complexity using MCC and design for modularity via having weak coupling (NASA 2004, p. 95).

There are many possible complexity metrics for code. I have found that cyclomatic complexity is a useful predictor of likely code quality.

References:
  • FAA, System Safety Handbook, Appendix J: Software Safety, Federal Aviation Administration, Dec. 2000
  • Ganssle, J., The Art of Programming Embedded Systems, Academic Press, 1992.
  • MISRA, Report 5: Software Metrics, February 1995.
  • NASA-GB-8719.13, NASA Software Safety Guidebook, NASA Technical Standard, March 31, 2004.
  • RAC, Introduction to Software Reliability: a state of the art review, Reliability Analysis Center, 1996.
  • Watson, A. & McCabe, T., (NIST 500-235), Structured Testing: a testing methodology using the cyclomatic complexity metric, Special Publication 500-235, National Institute of Standards and Technology, Sept. 1996