Despite recognizing the near-impossibility of the task, electronics manufacturers strive to deliver perfect products. Inevitably, though, at some point some critical function fails to work as expected. Customers enamored with claims of what the product “does” feel justifiably frustrated when, for some reason, it doesn't. This was a prominent enough issue when computer systems were still simple hardware platforms (by today's standards) running software at a screaming 1 MHz, constrained by limited memory, storage and branching capability. But massive increases in the machines' software and hardware complexity have increased the scope of the challenge comparably.
Finding and correcting software (or hardware, for that matter) misbehavior requires first recreating the erroneous result and then duplicating the steps that produced it. Many years ago, customers of a $300,000 piece of software-driven electronic test equipment began to report that the systems crashed at seemingly random times during normal operation. Although the failures did not generally cause any serious damage, the crashing shut down production lines—at considerable cost—until the system restarted. The software engineer in charge of the project attempted to identify the cause of the problem by gathering as much information as possible about the conditions that preceded the failures. He even visited several customer sites to help in his investigation but to no avail. He simply could not collect enough data to establish a verifiable pattern. One gifted and somewhat obsessive customer resolved to pinpoint the cause. Six months later, he called the software engineer and told him basically, “You do this, and you do this, and you do this, and you do this, and it crashes.” Armed with that information, the engineer found the problem and fixed it in half a day. It turned out to be a hard failure—a single bit error buried deep in the code in a routine that was rarely executed. Without the customer’s diligence, the bug would likely have remained undiagnosed for considerably longer.
Around the same time, another customer called the software engineer to report that the newest release of the test-program development software would hiccup on occasion, deleting most of the file containing the program under development—a significant and costly inconvenience. A lengthy discussion between the engineer and several of his colleagues failed to pin down the conditions that caused the problem. The engineer traveled to the reporting customer’s site the following week to investigate further.
At the engineer's request, the technician who had discovered the problem demonstrated it, dispelling any doubt. One particular key combination triggered exactly that symptom. Yet in all of the tests to which the vendor's quality team had subjected the software, no one had ever hit on that particular (and somewhat illogical) keystroke combination. Again the engineer chased down and corrected the problem in short order. How many other issues remain undiscovered?
Today's computer-bound systems run more than 2,000 times faster and boast orders of magnitude more memory, address space, and storage than their counterparts of that earlier era. To keep pace, process control and improved test methods have eliminated many once-common hardware defects. Yet products whose batteries catch fire or explode still reach the market. Software verification techniques can exercise code far more quickly and comprehensively than their predecessors could, revealing many more bugs for removal. But dramatically increased software complexity has compounded the difficulty of finding and fixing the bugs that remain. How do you follow every branch in every module of every piece of sophisticated software? Customers may identify some anomalous behavior, contributing to the discovery of outstanding problems as in the cases above. But, like it or not, however many problems you find, you can rest assured that some remain hidden from your most diligent efforts.
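The question of following every branch has a simple quantitative backbone: the number of distinct execution paths grows exponentially with the number of independent branches, which is why exhaustive path testing is infeasible for any nontrivial program. A minimal back-of-the-envelope sketch (the example and numbers are this editor's illustration, not the author's):

```python
# Illustrative sketch: why exhaustive path testing is infeasible.
# A function containing n independent, sequential if/else decisions
# has 2**n distinct execution paths through it.

def path_count(num_branches: int) -> int:
    """Number of execution paths through code with the given number
    of independent two-way branches in sequence."""
    return 2 ** num_branches

if __name__ == "__main__":
    for n in (10, 33, 64):
        # Even a few dozen branches outstrip any practical test budget.
        print(f"{n} branches -> {path_count(n):,} paths")
```

Running a test case per path at one per microsecond, the 64-branch case alone would take hundreds of thousands of years, which is why testers settle for branch or statement coverage rather than path coverage.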
Bugs escape detection until someone encounters them, discovers their cause and corrects them. The 4th Law of Computing still applies: all software contains bugs. Its corollary applies as well: never buy rev. 1.0 of anything. Many of the latest proposed applications become less attractive once we admit that reality.
Much of the recent buzz surrounds a plethora of “unattended” applications: driverless cars, unattended manufacturing facilities, and the like. Driverless cars have undergone hundreds of thousands of miles of testing and have begun to appear in trials, such as a fledgling experiment with Uber cars in Pittsburgh. Lately, however, a couple of reports have surfaced of driverless cars causing accidents. In one case, a car hit the side of a bus because the angle of the sun had confused its imaging hardware, and the software had misinterpreted the resulting information. However creative the problem-solving algorithms the software invokes, situations will arise that lie outside its capabilities, situations in which (you hope) a human driver could have made a better decision.
These potential failures do not represent “deal-breakers.” They do, however, remind us that we should not be discussing whether problems will occur but how to respond when they do, especially when they cause economic loss, personal injury or even death. Regulators must analyze the known episodes to minimize their recurrence. At the same time, users of all computer systems should accept that sooner or later such bugs will manifest themselves. Hoping or pretending that they will not cannot protect you from the consequences.
About the Author
Steve Scheiber is an industry consultant, journalist, author and lecturer. As principal of ConsuLogic Consulting Services, he has spent more than 30 years covering electronics manufacturing and test issues at all levels. Among his projects, he served as senior technical editor for Test & Measurement World as well as editor of Test & Measurement Europe. His textbook, Building a Successful Board-Test Strategy, published by Butterworth-Heinemann, is now in its second edition. For more than 20 years, Steve has been writing and teaching seminars on the economics and cost justification of capital expenditures for engineers and managers. He holds bachelor's and master's degrees in engineering from Rensselaer Polytechnic Institute.