VI. Reliability and testing
A. Overview and basic terminology
Presumably, careful application of the methodologies for requirements elicitation, analysis and
design has led us to a workable system that we have either implemented or are in the process
of implementing. In a perfect world, that would be it, and we could soon deliver a solid software
system to the customer. We all know better.
Undoubtedly, testing and debugging will be required.
Unless there was a major gaffe in our earlier development efforts, this should all be in the
nature of tweaking. Realistically, though, it is hard to predict whether performance requirements will be met. We might
well have failed to meet our goals in this regard, which might force a significant redesign, possibly
involving different off-the-shelf components. If the project is very complex, there
might well be a need to redesign due to missing functional requirements too. That said, our expectation
is that testing and debugging will probably lead to fairly minor redesigns, if any. (Otherwise,
what was the point of spending so much time on so much careful planning?!) In this case, debugging
will mainly be limited to correcting small implementation mistakes. (We hope.)
We will now focus on issues that lead to reliable software systems, particularly various types of testing.
Here is just some of the technical terminology needed in our discussion of Chapter 9 material.
Some of these terms can easily be misused.
- A failure is any event in the system's execution that is
contrary to the explicitly specified behavioral requirements or commonly understood and expected
behavior. (Anything that would or should cause us to say "oops", I suppose.)
- Failure rate ( = one minus reliability ) is basically the probability of a failure
under specified parameters concerning the system's usage, and related notions. There are a number
of conventional ways to measure this, appropriate to different environments. An example is
"mean time to failure".
- An error has occurred if the system is in a state that
might result in a failure.
- A fault ( = defect = bug ) is the cause in the code for an error or potential error.
- Fault detection means what it says. (See debugging and testing.)
- Fault avoidance means working towards minimizing the failure rate.
- Fault tolerance means including sufficient robustness
to allow for recovery from some types of faults.
- Verification is concerned with detecting faults without
actually running the system.
- Correctness debugging means detecting failures of
a functional nature, and then determining and correcting the fault(s) that produced the failure.
- Performance debugging means detecting failures of
a nonfunctional nature, and finding ways to resolve them.
- Testing means systematically exercising the system in a
deliberate effort to produce failures. (For this reason the language "successful test" is rather
misleading. Avoid it. Bragging about a "successful test" might just mean you have a lousy test!)
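To make the distinction between fault, error and failure concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the function `largest`, the helper `find_failures`, and the test cases. The fault is planted deliberately. The point is that a fault can sit in the code without producing a failure until a test is chosen that provokes it.

```python
def largest(values):
    """Intended behavior: return the largest element of a non-empty list."""
    best = 0  # FAULT (planted on purpose): should start from values[0], not 0
    for v in values:
        if v > best:
            best = v
    # For an all-negative list, 'best' still holds 0 here: the program is
    # in an erroneous state (an ERROR) that is about to become visible.
    return best


def find_failures(func, cases):
    """Run func on each (input, expected) pair and collect every FAILURE,
    i.e. every case where the observed result contradicts the requirement."""
    failures = []
    for argument, expected in cases:
        actual = func(argument)
        if actual != expected:
            failures.append((argument, expected, actual))
    return failures


# Test cases chosen in a deliberate effort to produce failures:
# the all-negative lists are aimed squarely at the initial value of 'best'.
cases = [
    ([3, 7, 2], 7),       # fault is present but masked: no failure
    ([0], 0),             # no failure
    ([-5, -2, -9], -2),   # failure: 0 is returned instead of -2
    ([-1], -1),           # failure: 0 is returned instead of -1
]

failures = find_failures(largest, cases)
for argument, expected, actual in failures:
    print(f"largest({argument}) returned {actual}, expected {expected}")
```

A test suite containing only the first two cases would report no failures at all, which says something about the suite rather than about the code.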
This is frankly just the start of a lot of jargon connected with reliability issues.
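As one small illustration of how these terms turn into numbers, the sketch below estimates failure rate, reliability and mean time to failure from observations. It rests on simplifying assumptions that are not part of the definitions above: failure rate is estimated as the fraction of observed runs that failed, and mean time to failure as a plain average of observed operating times. The function names and all figures are made up for the example.

```python
def failure_rate(num_runs, num_failures):
    """Estimate the probability that a single run fails,
    as the observed fraction of failed runs (an assumption of this sketch)."""
    if num_runs <= 0:
        raise ValueError("need at least one observed run")
    return num_failures / num_runs


def reliability(num_runs, num_failures):
    """Reliability is one minus the failure rate."""
    return 1.0 - failure_rate(num_runs, num_failures)


def mean_time_to_failure(times_to_failure):
    """Average the observed operating times (e.g. in hours) before each failure."""
    if not times_to_failure:
        raise ValueError("need at least one observed failure")
    return sum(times_to_failure) / len(times_to_failure)


# Hypothetical observations: 200 runs, 5 of which ended in a failure.
rate = failure_rate(200, 5)
rel = reliability(200, 5)

# Hypothetical operating hours before each of four observed failures.
mttf = mean_time_to_failure([120.0, 95.5, 143.0, 101.5])

print(f"failure rate = {rate:.3f}, reliability = {rel:.3f}, MTTF = {mttf:.1f} hours")
```

Which measure is appropriate depends on the environment: a per-run probability suits a batch program, while a time-based measure such as mean time to failure suits a system that runs continuously.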