Using Software Measurements to Ensure Quality at Speed
This blog post outlines how Hiller Measurements is implementing (or, in some cases, planning to implement) “Quality at Speed” in the software organization, and describes exactly what that means. Some of these ideas have come from various books over the years; others are my own. I’m not going to attribute all of them individually, but let me know if you find something particularly brilliant and I’ll figure out which of the books it came from.
At Hiller, we have a driving statement: “Quality at Speed.” And, to be clear, it’s not just a statement – it’s an ethos. Now, we could spend the rest of this post explaining what exactly “Quality at Speed” means, describing Hiller Measurements Flow Control and discussing its benefits, but fortunately I’m not in marketing.
To define it generally, however, “We do not usurp process, but we expedite it and reduce cycle time across the development cycle.” For software, that means we always want to intelligently and intentionally define our processes such that each stage has a specific purpose and objective. By identifying those, we can minimize the amount of wasted effort and focus on accomplishing high-value tasks more quickly.
Let’s dive in to discuss the key features of our process that enable this.
Pillars of “Quality at Speed” As It Applies to Software
Let me introduce the term “waterscrumfall.” I like this term. A lot. To be fair, it’s not generally considered a positive thing, and I accept that. But what I’m about to describe works very well for us, and “waterscrumfall” seems to be the most accurate description I can come up with.
I firmly believe in agile development, but there’s a lot of misconception around what agile is and isn’t. Instead of listing those misconceptions, I’ll quote The Agile Manifesto.
The Agile Manifesto
We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.
© 2001-2019 Agile Manifesto Authors
This declaration may be freely copied in any form, but only in its entirety through this notice.
In our case, there is not only value in the items on the right in the manifesto; sometimes they are legally mandated. Does that make them more valuable? I’m not going to split that hair. What I will do is describe how we try to embrace the items on the left while still honoring those on the right.
The key to our process is understanding that for many projects, there are hard requirements and it is not easy (or sometimes even possible) to add/change/remove them mid-project. As such, we need rigid tools around ensuring requirements are met. Because of these requirements, we also have very specifically defined deliverables. Nearly every project has hundreds of checkboxes at the end to prove the requirements are all satisfied. There’s also a LOT of documentation involved.
The space in the middle, however, is generally ours (the software team’s) to use as needed. We choose to use an iterative approach that maintains high software quality while incrementally adding or completing functionality. We review it internally with the project manager, systems engineer, and others who are intimately familiar with the requirements to ensure we are implementing them as needed. When in doubt, we will not hesitate to reach out to the customer directly.
In the contract, we often have provisions that changing requirements will require formal actions such as issuing new bids and a myriad of other requirements, but we also do our best to take the customer’s preferences and concerns into account. These “soft requirements” that emerge during development are typically easy fixes or changes of direction before significant effort is invested, and because we have a relationship with the customer that is built on trust, there is generally no need to formally redefine the contract. Of course, if the discussion does produce a truly hard requirement, this provides a mechanism and justification to go through that process.
Stay Focused: Don’t Just Buy (or Sell) the Kitchen Sink When all You Need is a Water Fountain
Universal testers generally end up with lots of features that are not used, and they nearly always require some per-project customization anyway. Realistically a true universal tester is a pipe dream. Now, that doesn’t mean there aren’t complex systems with lots of interrelated parts. Sometimes you legitimately do need a kitchen sink. If that’s the case, we’ll help you ensure that it will fit the cutout in your countertop.
Shift Risk Left
Sometimes “risk” isn’t obvious. I was working on a system where I planned on using an established framework. It was freely available, and the licensing agreement allowed for commercial reuse without any issues. No worries there.
Then I started selecting a database. The plan of record was to use MySQL (it’s open source), but fortunately I realized early in the process that it is licensed under the GPL, which means there are limitations on how it can be packaged and/or redistributed. We reviewed the commercial license agreement and discovered it wouldn’t lead to a cost-effective solution in this scenario.
So, I dug some more and settled on PostgreSQL instead. No problems. Because we had been planning on MySQL, however, I decided to run my implementation change (not a requirements change) past the customer and include the other licenses we were using for good measure.
It’s a good thing I did because they had an issue with the free framework that I had settled on. It was a proprietary license agreement that had provisions regarding assumption of responsibility for IP protection that they were not comfortable with.
Because I had brought this up early, we were able to resolve the concern without any problems. We could have ended up in a scenario where we delivered a finished system that didn’t meet the customer’s needs. Now, would it have technically met requirements? Maybe, but I’m not willing to consider passing on a technicality a sufficient result.
Create and/or Use Standard Libraries
Plan for reuse. When all you have to do is assemble highly tested components in a standardized way, it greatly reduces the risk. This also facilitates the “stay focused” pillar because it makes it easy to choose exactly the components you want. That reduces effort (and therefore cost) to create a system. It also allows you to reduce risk even further because you can focus on building and testing the “new” elements first and just know the rest of it will come together.
Software Metrics Help Us Accomplish This
The above is nice, but my original intent when starting this post was to describe how we are ensuring everything I just described is true. So, without further ado:
Defining a Defect
There are four primary types of defects that I’m worried about.
- Variance from specification
- Variance from “what makes sense”
- Metrics falling outside an established range
- Anything that requires refactoring to continue additional iteration
Sometimes the first is in conflict with the second. We’ve all had those situations. Unfortunately, I don’t have an easy answer other than trying to catch it as early in the process as possible.
Regarding the fourth, refactoring is often considered part of iteration. Heck, even I (recently) used to think that was the goal. But it’s really not. We just want to be adding functionality, not going back and redoing what we already did. I consider any refactoring efforts to be correcting a defect.
First, let’s talk about the motivation for the measurements we care about.
- Ability to detect issues early in the process (shift risk left)
- Measuring progress
- Detecting “problem” areas
- Finding requirements errors or incompatibilities
Note that these aren’t really code metrics; they are project metrics. You’re here because you want to be better project managers, right? (All kidding aside, developers should learn to think like PMs. It’s served me incredibly well to do so.)
Let’s define the different mechanisms for finding defects:
- Reviewing design concept
- Reviewing proposed implementation
- Unit Tests
- Code Reviews
- System Tests
- Deployment Tests (state of software after 6 months, for example)
You should hopefully catch the majority of requirements errors or incompatibilities in the first two. I’m not going to discuss that here. The last two (or at least the last one) are likely past the scope of what the average developer has visibility into.
Regarding the middle two, the key things to consider as we go along are:
- Defects should go down over time (shifting risk left).
- No one module, class or other section of code should have significantly more defects than any other.
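As a rough illustration of those two checks, here is a minimal Python sketch. It is not part of our tooling; the function names, the sample module names, and the "three times the median" cutoff for "significantly more" are all assumptions made for the example.

```python
from statistics import median


def defects_trending_down(counts_per_iteration):
    """Return True if no iteration found more defects than the one before it."""
    pairs = zip(counts_per_iteration, counts_per_iteration[1:])
    return all(later <= earlier for earlier, later in pairs)


def outlier_modules(defects_by_module, factor=3.0):
    """Return modules with more than `factor` times the median defect count.

    `factor` is an assumed threshold for "significantly more"; tune it per project.
    """
    if not defects_by_module:
        return []
    limit = factor * median(defects_by_module.values())
    return sorted(name for name, count in defects_by_module.items() if count > limit)


if __name__ == "__main__":
    # Hypothetical defect counts, for illustration only.
    print(defects_trending_down([9, 6, 6, 2]))
    print(outlier_modules({"DMM": 2, "Scope": 3, "Switch": 2, "Logger": 3, "Parser": 15}))
```

The median is used instead of the mean so that one very bad module does not raise the bar it is being measured against.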
Measuring Your Code
If it requires manual analysis, I won’t use it as a driving metric. Not to say that all tests have to be fully automated, but there cannot be any subjective elements. (Note, this is NOT the same as having to manually interpret what the metrics mean. I’m fine with that.)
For example, some things like “Number of key classes relative to number of support classes” make a lot of sense. There are absolutely some good motivations for such an analysis. But the chances of me and my colleague arriving at the same results are nearly zero. That means either a) I have to personally do the analysis on every single project or b) I have to question the metrics generated. If I have to question the validity of the metric, then it’s not useful. I also don’t want to be the single point of failure for the process flow, so the first option isn’t viable either.
What follows is a list of measurements that I am either tracking or will be in the near future. Some of them I will elaborate on, others I will leave for future discussion.
Things to Consider
- I have listed various VI Analyzer tests at the end. Worth noting, I run VI Analyzer twice in my pipeline: once to make sure there are no defects that I refuse to let into the code base (generally things like broken VIs and breakpoints being present), and then a second time to do a general code quality analysis. I expect there to be “failures” in the latter, but there are quite a few tests in this bucket so I don’t mandate that all of them must pass every time. (Are your wire bends always perfect? Isn’t it nice to be reminded anyway?)
- Number of subVIs/nodes in a given VI or method. At one point, NI had related the number of nodes in a VI to Lines of Code in text-based terms. I don’t think anyone (NI included) really liked defining it as one node being equivalent to one line of code. That doesn’t mean that the number itself isn’t useful though. While having more, smaller VIs seems counterintuitive, I’ve found that they are easier to test, easier to debug, and generally drive you towards following the Single Responsibility Principle (SRP).
- Code complexity, but not necessarily on a VI level, as defined by LabVIEW. We need some form of measurement of our system. I’m honestly still working on this one.
- Usage of FGVs. The presence of a functional global variable isn’t inherently bad, but if you have a large number it’s probably a code smell. I’m going to define an FGV as any VI with an uninitialized shift register.
- Number of public methods in a class. It’s a good measure of responsibility, and the SRP is very important. When dealing with inheritance, consider including the parent’s public methods as well as any unique methods introduced by the child.
- Total number of methods in a class.
- Distribution of methods on a per-class basis relative to the average.
- Nesting level. Too shallow means you aren’t abstracting enough. Too deep means you’re probably not using the concept of “specialization” correctly when defining your child classes. (When done properly, the child is-a more specific form of the parent.) This metric isn’t going to be consistent between organizations if they’re using different frameworks, BUT it should be consistent within a development methodology. We’re trying to use standardized solutions as much as possible, so we want this to be relatively consistent.
- When we start using interfaces in the near future, I’m going to track the number of interfaces implemented by a given class. Let’s be clear, interfaces are amazing. I’m even going to argue they are the best possible way to implement a hardware and measurement abstraction layer (HAL/MAL). I also think they are a (potential) indicator of violating SRP: the whole reason they are fantastic for a HAL/MAL is that you can make a single hardware class expose as many different functions as possible. That means they enable the exact opposite of a single responsibility. Let’s be clear though, this is a hypothesis. For now, I will be tracking it and correlating it with the number of defects. I really hope I’m wrong. Did I mention I love interfaces?
- Number of overrides. A large number means you aren’t considering specialization correctly when defining child classes. This is especially true if the children don’t invoke a “Call parent method.”
- Number of inherited methods. The majority of your methods should come from the parent class.
- Number of methods added. This should be inversely proportional to nesting level.
- I’m going to measure number of calls to classmates relative to external calls. I will ignore anything that’s in vi.lib, but we also don’t generally install things to vi.lib. If you do, you might want to reconsider that exclusion.
- Number of terminals per connector pane. I think we can agree that huge conn panes are inherently bad.
- Percentage of function-oriented code. I’m going to define this as VIs that aren’t members of a class or in vi.lib. It also includes any clusters (even if in an .lvclass). This should not be 0% and it should not be 100%. My guess is that we’re going to end up at 60-70% but I’m not taking any actions based on this metric yet.
- Number of bugs found per class.
- Number of classes collaborated with and relative number of calls to each. If a given class calls another class significantly more often than all the others, those classes are probably coupled.
- Number of times a class has been reused across projects. The more times, the more I trust it.
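To make the collaboration metric from the list above a bit more concrete, here is a hedged Python sketch. It assumes the call counts have already been extracted from the LabVIEW project by some other tool (not shown), and the 60% share threshold, the function name, and the class names are hypothetical.

```python
from collections import defaultdict


def dominant_collaborators(call_counts, share_threshold=0.6):
    """Flag caller/callee pairs where one collaborator dominates the caller's calls.

    `call_counts` maps (caller_class, callee_class) -> number of calls.
    `share_threshold` is an assumed cutoff, not an established value.
    """
    per_caller = defaultdict(dict)
    for (caller, callee), count in call_counts.items():
        if caller != callee:  # calls to classmates are tracked separately
            per_caller[caller][callee] = count

    flagged = []
    for caller, callees in per_caller.items():
        total = sum(callees.values())
        if len(callees) < 2 or total == 0:
            continue  # nothing to compare against
        for callee, count in callees.items():
            share = count / total
            if share > share_threshold:
                flagged.append((caller, callee, round(share, 2)))
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical call counts, for illustration only.
    calls = {
        ("Scope", "VisaSession"): 14,
        ("Scope", "Logger"): 3,
        ("Scope", "Config"): 3,
        ("DMM", "VisaSession"): 4,
        ("DMM", "Logger"): 4,
    }
    print(dominant_collaborators(calls))
```

A flagged pair is only a prompt to look closer; as with the other metrics here, interpreting the number is still a manual step.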
Measuring Your Metrics
The following are the ways I’m trying to ensure my monitoring is at least somewhat accurate:
- Number of files thrown away. Wait, didn’t I previously say that refactoring is a defect? Yes, I did. But let’s be honest, you’re not perfect and if it looks like you are, you’re not measuring correctly.
- Ratio of defects reported to number of files. If you aren’t finding defects or getting bug reports, then you’re not looking hard enough. This is also a metric that should be neither 0% nor 100%.
- Number of the general VI Analyzer failures fixed in a given merge request relative to the number of new failures introduced.
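The fixed-versus-introduced comparison in the last item can be sketched in a few lines of Python. This is only an illustration of the idea, assuming each failure is identified by a (VI path, test name) pair; the function name and the sample VIs are made up for the example.

```python
def compare_failures(base_failures, head_failures):
    """Compare VI Analyzer failures before and after a merge request.

    Each failure is a (vi_path, test_name) tuple. Returns the number of
    failures fixed and introduced, plus the net change (negative is better).
    """
    base, head = set(base_failures), set(head_failures)
    fixed = base - head
    introduced = head - base
    return {
        "fixed": len(fixed),
        "introduced": len(introduced),
        "net": len(introduced) - len(fixed),
    }


if __name__ == "__main__":
    # Hypothetical failures on the target branch and on the merge request branch.
    base = [("A.vi", "Wire Bends"), ("B.vi", "Spell Check"), ("C.vi", "Wire Bends")]
    head = [("B.vi", "Spell Check"), ("D.vi", "Backwards Wires")]
    print(compare_failures(base, head))
```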
A custom action creates “Code Climate” reports to indicate exactly which VI Analyzer tests, in which VIs, have improved and which have gotten worse. GitLab then displays them natively in your merge request.
You may have noticed I snuck something in that last screenshot. If you head over to https://gitlab.com/hiller-measurements/ci-tools/code-quality-reporting you can find a LabVIEW CLI operation that I wrote which will generate all of the parameters in a Code Climate report that GitLab will parse. The code is all there and (hopefully) well documented. If you want to talk about it further, don’t hesitate to reach out! One thing to note… once you choose your tests for a given project, try not to add or remove them from the list. This will cause your numbers to float around a lot and make the data less useful.
You can also see an example of one of my CI pipelines here: https://gitlab.com/hiller-measurements/graph-tools.
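For readers who have not seen the report format, here is a small Python sketch of the kind of JSON GitLab parses for its code quality widget. It is not the LabVIEW CLI operation linked above, just an illustration of the report shape. The helper names are hypothetical, and because a VI has no line numbers, the sketch assumes a placeholder line of 1.

```python
import hashlib
import json


def to_code_quality_issue(vi_path, test_name, message, severity="minor"):
    """Build one issue entry from a single VI Analyzer failure.

    The fingerprint is a stable hash of the VI path and test name so the same
    failure can be matched between the target branch and the merge request.
    """
    fingerprint = hashlib.sha256(f"{vi_path}::{test_name}".encode("utf-8")).hexdigest()
    return {
        "description": message,
        "check_name": test_name,
        "fingerprint": fingerprint,
        "severity": severity,  # one of: info, minor, major, critical, blocker
        "location": {"path": vi_path, "lines": {"begin": 1}},  # placeholder line
    }


def build_report(failures):
    """Serialize a list of failure dicts as a JSON array of issues."""
    return json.dumps([to_code_quality_issue(**failure) for failure in failures], indent=2)


if __name__ == "__main__":
    # Hypothetical failures, for illustration only.
    print(build_report([
        {"vi_path": "Source/Read.vi", "test_name": "Wire Bends", "message": "Wire has too many bends."},
        {"vi_path": "Source/Init.vi", "test_name": "Spell Check", "message": "Misspelled word in description."},
    ]))
```

Keeping the fingerprint stable from run to run is what lets a failure be recognized as the same issue on both branches, which is one more reason not to rename or reshuffle tests mid-project.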
The primary tools that I am relying on, other than the one above, are:
- VI Analyzer
- Various GitLab CI pipelines and visualization tools
- The Caraya unit test framework, but anything that outputs JUnit is reasonable.
- A tool (still in development) that calculates many of the metrics defined above.
VI Analyzer Tests
As mentioned, I also have two tiers of VI Analyzer tests. My critical VI Analyzer tests are currently:
- Breakpoints detected
- Broken VI
- Separate Compiled Code marked as off
I’m not going to go down the full list of my “general” tests (see the previously referenced pipeline for an example), but I like the following categories in the shipping test set as a starting point.
- Block Diagram -> Style
- Block Diagram -> Warnings
- Documentation -> User -> Spell Check
- General -> Icon and Connector Pane
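The two tiers could be wired into a pipeline step along these lines. This is a minimal Python sketch under the assumption that VI Analyzer results have already been parsed into a list of dictionaries; the field names and the exact test labels are placeholders taken from the lists above, not VI Analyzer's own identifiers.

```python
# Labels copied from the critical list above; treat them as placeholders.
CRITICAL_TESTS = {
    "Breakpoints detected",
    "Broken VI",
    "Separate Compiled Code marked as off",
}


def gate(failures):
    """Split failures into tiers and decide whether the pipeline may continue.

    Any critical failure blocks the merge; general failures are only counted
    so they can be tracked over time.
    """
    critical = [f for f in failures if f["test"] in CRITICAL_TESTS]
    general = [f for f in failures if f["test"] not in CRITICAL_TESTS]
    return {"passed": not critical, "critical": critical, "general_count": len(general)}


if __name__ == "__main__":
    # Hypothetical parsed results, for illustration only.
    result = gate([
        {"vi": "Read.vi", "test": "Wire Bends"},
        {"vi": "Init.vi", "test": "Broken VI"},
    ])
    print(result["passed"], result["general_count"])
```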
There’s also a forum on ni.com devoted strictly to VI Analyzer. It’s a fantastic resource and I’m still finding new tests that I want to pull into my next project.
To be clear, a lot of what I’ve outlined above is still a work in progress. The tests are easy enough to write, but before I start running them, I want a mechanism to track changes over time, and display them in a useful manner. I have a tool in-progress that uses VI Analyzer and some manual inspection to get a lot of other metrics, and aggregates them for display, but it’s not done quite yet. More on that later…