By Jennifer Bails

Edmund Clarke surveys his new suburban home in search of a quiet corner to plug in his computer terminal. With three sons underfoot--his youngest child is just a few months old--finding a place to think and work without interruption isn't easy. He decides to set up a makeshift office in the basement, where, between diaper changes and playtime, he escapes to pick up the research he left off at Harvard weeks earlier. He was recruited from his Ivy League post to join Carnegie Mellon's Computer Science Department, and the university even allowed him to forgo teaching for a semester so that he could get settled with his family in Pittsburgh and concentrate on his research.

Clarke's goal for these free months is to put into practice a new theory that could have a far-reaching impact on areas as diverse as healthcare, manufacturing, national security, and entertainment. Specifically, he is trying to develop an algorithm that will automatically detect errors in computer circuits and programs before they can produce faulty results.

It's early in the fall of 1982, and the computer revolution is well under way. For a quarter-century, huge and costly "mainframe" computers owned by the government and corporations have predicted the weather, processed tax returns, guided intercontinental missiles, and made space exploration possible.

But thanks to the transistor and the silicon chip, computers have been reduced so dramatically in size and price that they are bleeping and blipping their way into American homes, schools, and offices. IBM introduced its first personal computer, running the MS-DOS operating system, and Epson released the first laptop. Time magazine even named the personal computer "Machine of the Year," the first non-human ever chosen in place of its traditional Man of the Year, for its promise to transform the way people live and work.

For all their potential, though, computers are only as good as the imperfect designers who build them. Engineers have long searched for design errors in hardware and software by running simulations to check performance or by feeding systems with test data to see whether they behave as intended. Programmers also spend days upon days arduously poring over lines of computer code to hunt for bugs. Such hit-or-miss techniques, known as informal verification, worked reasonably well at the advent of computing.

Yet, as microchip technology advanced by the 1970s to include tens of thousands of transistors, and as software grew in complexity, it was no longer possible to simulate or test all possible behaviors of a system or manually spot every bug. Errors began to result in delays in moving new products to market, the breakdown of critical systems already in use, and expensive replacement of faulty hardware and patching of defective software.

While at Harvard, Clarke and his first graduate student, E. Allen Emerson, conceived of a new way to find and correct these quality problems. It is that approach--called Model Checking--that he brought to the basement of his home in the South Hills of Pittsburgh and then onto the Carnegie Mellon campus in his ongoing effort to make computers more reliable.

The importance of his work can't be overstated.

In 1994, a minor arithmetic bug in the Pentium microprocessor forced Intel Corp. to recall thousands of silicon chips and cost the company an estimated $500 million to fix. The Pentium bug is history's most infamous computer glitch but certainly not the only one. Software bugs are so prevalent and damaging that they cost the U.S. economy an estimated $59.5 billion a year, or about 0.6 percent of the gross domestic product, according to a 2002 National Institute of Standards and Technology study (the most recent data available). The report also found that more than a third of those costs--nearly $22.2 billion--could be eliminated by improved testing methods that identify and correct bugs sooner and more consistently.

Hidden errors also become a matter of life and death as consumers increasingly rely on computers to control critical functions of their cars, airplanes, medical devices, security protocols, and power plants.

A computer crash on the USS Yorktown several years ago shut down all operations aboard the Navy cruiser for almost three hours. Last year, a serious software problem crashed the computers on the new U.S. stealth fighter, the F-22 Raptor, causing pilots to lose all navigation and communications when they crossed the International Date Line.

For Clarke, there was no way he could have known while growing up what would become his life's calling. Born in 1945--the same year the computer was invented--he still has remnants of a Southern accent and a graciousness that derive from his upbringing in the rural town of Smithfield, Va. The son of a nurse and a salesman, he became the first in his working-class family to graduate from college. "My background is very humble," he says.

Clarke clearly remembers his awe-inspiring but frustrating encounters with the first computer to appear on the University of Virginia campus during his undergraduate days there. The crude machine was enclosed behind glass, and users had to line up to submit punched program cards to the computer operator. The job would generate a printout containing final results or, all too often, an abort notice with an attached error log. Sometimes it took a full day just to get back an output that revealed a mistake in the program.

"Maybe that led to my interest in finding errors and bugs," muses Clarke, tugging on the salt-and-pepper wisps of his neatly trimmed beard.

Clarke earned a master's degree in mathematics from Duke University in 1968 and then went on to complete his PhD in computer science at Cornell University, writing his dissertation on highly theoretical aspects of how to prove the correctness of computer programs.

"I enjoyed writing the thesis and doing the research, but I was a little disappointed that the results had no practical application whatsoever," he says.

He returned to Duke for two years and then moved to Harvard, where he and Emerson developed the theoretical underpinnings of Model Checking.

Model Checking is a type of formal verification that examines the logic underlying a design, much as a mathematician uses a proof to determine whether a theorem is correct. It works like this: First, the design of the chip or software being analyzed is converted into an abstract model. Next, the specifications the model must satisfy are expressed in a special notation called temporal logic, a language that describes how the system should behave over time. Then, true to its name, the Model Checker checks every possible state of the model and determines whether it is consistent with the specifications.
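
In code, the heart of that procedure can be surprisingly small. The sketch below, written in Python purely for illustration, handles only the simplest kind of temporal specification--an invariant, "property p holds in every reachable state"--by exhaustively visiting each state of a model exactly once:

```python
from collections import deque

def check_invariant(initial, successors, holds):
    """Explore every reachable state of a model and verify that the
    property `holds` is true in each one -- the temporal-logic claim
    'p is always true.'  Returns None if the design is correct, or a
    counterexample trace ending in the first violating state found."""
    parent = {s: None for s in initial}   # records how each state was reached
    queue = deque(initial)
    while queue:
        state = queue.popleft()
        if not holds(state):
            trace = [state]               # walk back to an initial state
            while parent[trace[-1]] is not None:
                trace.append(parent[trace[-1]])
            return list(reversed(trace))
        for nxt in successors(state):
            if nxt not in parent:         # visit each state exactly once
                parent[nxt] = state
                queue.append(nxt)
    return None                           # every reachable state satisfies p
```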

After arriving at Carnegie Mellon, Clarke decides to put this "practical application" to the test. He steals away to his basement office for a few hours every morning to write the computer code for the first Model Checker. Then he begins testing his algorithm on small example problems, trying to see whether it can automatically catch the mistakes he knows they contain. Time and again, he gets answers he thinks are incorrect and suspects a bug in his Model Checking code. By Christmas, however, Clarke begins to realize the bugs aren't in the Model Checker; they are errors in the examples themselves that he didn't know existed.

"That's when I knew this was going to be a really important idea," he recalls.

Unlike simulation and testing, Model Checking has the benefits of being fully automatic, fast, and exhaustive in its search for errors. Moreover, when a design fails to satisfy a desired specification, the method always identifies a counterexample that reveals the source of the problem and often provides important clues for how to fix it.
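
Running the earlier sketch on a deliberately broken toy design (the counter and its bound here are invented for illustration) shows both strengths at once: the search is exhaustive, and a failure comes back as a concrete trace pointing at the bug:

```python
# A buggy toy design: an 8-bit counter that is supposed to stay below
# 200, but whose increment rule lets it climb past the bound.
initial = [0]
successors = lambda c: [(c + 3) % 256]    # the design's transition rule
holds = lambda c: c < 200                 # the specification to verify

trace = check_invariant(initial, successors, holds)
print(trace)   # [0, 3, 6, ..., 198, 201] -- a step-by-step recipe for the bug
```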

But there is one serious drawback to the Model Checker.

As the complexity of a system increases, the number of possible states that must be analyzed multiplies exponentially, overwhelming the Model Checker. This conundrum--called the state explosion problem--seems to limit the applicability of the technique to small designs mainly of interest to academia, not industry.
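
The arithmetic behind the explosion is unforgiving: a design built from n independent on/off components has 2^n combined states, so every added component doubles the work. A few lines make the growth concrete:

```python
# Each independent boolean component doubles the number of global states.
for n in (10, 32, 64, 266):
    print(f"{n:>3} components -> {2**n:.3e} states")
# 10 components -> 1.024e+03 states
# 32 components -> 4.295e+09 states
# 64 components -> 1.845e+19 states
# 266 components -> 1.186e+80 states (roughly the atoms in the universe)
```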

"This is the problem that I've been struggling with for the past 26 years," says Clarke, now the FORE Systems University Professor of Computer Science and professor of Electrical and Computer Engineering at Carnegie Mellon.

In the fall of 1987, Clarke discusses the state explosion problem during a lecture in which computer science faculty introduce their research to incoming PhD candidates. Among those listening is first-year graduate student Kenneth McMillan.

McMillan also hears a presentation by Randal Bryant, now University Professor and dean of Carnegie Mellon's School of Computer Science, whose research focuses on formal verification of computer hardware. Bryant had invented a method, called binary decision diagrams, for encoding information symbolically.
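
The essence of Bryant's idea is that enormous truth tables usually contain massive internal repetition, and a decision diagram can store each repeated piece exactly once. A toy sketch of that sharing (deliberately minimal; real BDD packages are far more sophisticated):

```python
from functools import lru_cache

# A node is (variable_index, low_child, high_child); leaves are the booleans
# True and False.  The cache plays the role of the "unique table": identical
# subdiagrams are created only once and then shared.
@lru_cache(maxsize=None)
def node(var, low, high):
    return low if low == high else (var, low, high)   # drop redundant tests

def build(func, nvars, var=0, bits=()):
    """Build a reduced diagram for a boolean function by case-splitting on
    one variable at a time (Shannon expansion).  This naive version still
    walks the whole truth table; real packages avoid even that."""
    if var == nvars:
        return func(bits)
    return node(var,
                build(func, nvars, var + 1, bits + (False,)),
                build(func, nvars, var + 1, bits + (True,)))

def count(d, seen=None):
    """Count the distinct internal nodes reachable from diagram d."""
    seen = set() if seen is None else seen
    if isinstance(d, tuple) and d not in seen:
        seen.add(d)
        count(d[1], seen)
        count(d[2], seen)
    return len(seen)

# Even parity of 8 inputs: a 256-row truth table collapses to a diagram
# with just two distinct nodes per variable level.
parity = build(lambda bits: sum(bits) % 2 == 0, 8)
print(count(parity))   # 15 nodes instead of 256 rows
```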

Not long after the two lectures, McMillan stretches out on The Cut to catch some sunshine, his mind in full gear as Frisbees fly overhead.

"I started wondering if you could use [Bryant's] diagrams to model a system with a lot of components as a way to avoid the state explosion problem," says McMillan, who is now a researcher studying formal verification of computer hardware at Cadence Berkeley Labs in California.

Soon afterward, McMillan contacts Bryant, who shares some of the computer code for implementing his diagrams. Combining that code with information gleaned from Clarke, he cobbles together a prototype Model Checker that has the capacity to analyze larger and more complex systems than ever before.

This combination--symbolic Model Checking--eliminates the redundant calculations that occur when huge sets of states are handled one at a time, boosting the number of states that can be analyzed from about 10,000 to 100 billion billion (10^20).
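
Conceptually, the symbolic algorithm never lists states one by one; it grows the whole set of reachable states in bulk until the set stops changing (a fixed point). A sketch using ordinary Python sets as a stand-in for the binary decision diagrams that make this efficient at scale:

```python
def reachable(initial, image):
    """Grow the set of reachable states until it stops changing.
    `image(S)` returns every one-step successor of the states in S; in a
    symbolic Model Checker both the sets and the image computation are
    represented as binary decision diagrams, so a single operation can
    cover astronomically many states at once."""
    reached = set(initial)
    frontier = set(initial)
    while frontier:
        frontier = image(frontier) - reached   # only the genuinely new states
        reached |= frontier
    return reached

# The buggy counter from before, now explored a whole set at a time.
image = lambda states: {(c + 3) % 256 for c in states}
violations = {c for c in reachable({0}, image) if c >= 200}
print(sorted(violations))   # non-empty, so the invariant "counter < 200" fails
```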

Clarke and Bryant are impressed. "I often use this story to teach my students never to underestimate their ability to do new and interesting research," says Bryant.

Industry started to take notice. Intel Israel became the first technology firm to experiment with Model Checkers in the early 1990s, and Microsoft, IBM, Motorola, and other companies soon followed suit.

The continued research of Clarke and others in the field has led to further advances for verifying ever bigger and more elaborate circuits--and more recently, software. Today, Model Checkers can evaluate up to 10^120 states, a mind-boggling number equal to the numeral one followed by 120 zeroes.

This scalability has made Model Checking the most widely used automated method in the computer industry for detecting and diagnosing bugs. NASA, for instance, has used Model Checking to analyze software in the Mars Pathfinder spacecraft. Electronic design automation companies like Cadence and Mentor Graphics now market formal verification tools that rely on Model Checking to help engineers catch errors. And Microsoft uses Model Checking to ensure that drivers for external devices such as printers and mp3 players interact properly with the Windows operating system.

It's not unusual to find one of Carnegie Mellon's former graduate students spearheading these efforts. Aarti Gupta (CS'94) now heads the systems analysis and verification department at NEC Laboratories America, Inc. in Princeton, N.J. She and her colleagues conduct research on formal verification techniques based on Model Checking, which are integrated into tools for the software developers and computer designers who build digital systems controlling everything from cell phones to digital memory chips. "It's wonderful that we are able to use some of this advanced research in real practice," she says.

Gupta anticipates that demand for fail-safe verification methods will grow, especially as people increasingly put their lives in the hands of embedded software such as "drive-by-wire" cars that replace mechanical braking and steering with computerized controls. Last spring, Clarke taught a mini-course titled "Do You Trust Your Car?" about using Model Checking to help build dependable safety-critical systems--the next frontier for his research.

It's been almost three decades since Clarke holed himself away at home for several months to set an idea in motion. At the time, he couldn't have known the momentous significance of his research.

Last month, the Association for Computing Machinery (ACM) awarded what is often referred to as the Nobel Prize of computing to Clarke and Emerson (who is now at the University of Texas at Austin) for their pioneering work in Model Checking and to a French researcher, Joseph Sifakis, who independently developed a similar method. The 2007 A.M. Turing Award, named for British mathematician Alan M. Turing, carries a $250,000 prize.

Stuart Feldman, president of the ACM and vice president of Engineering for Google Inc., sums up what has become conventional wisdom in his industry. "Without the conceptual breakthrough pioneered by these researchers, we might still be stuck with chips that have many errors and would lack the power and speed of today's equipment," Feldman says. "This is a great example of an industry-transforming technology arising from highly theoretical research."

Clarke, clearly touched by the recognition, says, "I wish my parents could be alive now to see what I've done."

To celebrate his Turing Award, he treated himself to a new high-speed digital camera, which he likes to take to the nearby Phipps Conservatory and Botanical Gardens during his lunch hour to indulge in his hobby of amateur photography. Sometimes he is accompanied by Martha, his wife of 40 years, who coordinates graduate admissions for Carnegie Mellon's School of Computer Science. Their youngest son, Jeffrey, graduated from the Indiana University School of Medicine this year, and their two older sons have earned PhDs: James in physical chemistry from Harvard University and Jonathan in finance from the University of Pittsburgh. Clarke speaks proudly of the three more "Dr. Clarkes," calling them his biggest achievements of all.

Winning the most prestigious award in computer science still doesn't mean Clarke has all the answers. Keeping verification techniques in step with the ever-growing complexity of computers is an uphill battle, he says. "As they make more complicated software and hardware, it just exacerbates the state explosion problem, and we are always playing catch-up. In some sense, we'll never solve that problem, but it doesn't cease to generate good new research."

So at 63, Clarke has no plans to end his pursuit: "I would say that the experience of building something and then sticking with the problem has been even more rewarding than any of the subsequent awards that we received, including the Turing Award."

It's a lesson Clarke learned from Allen Newell, a Carnegie Mellon computer scientist and fellow Turing Award winner. In his 1992 farewell speech to the university, Newell, one of the pioneers of artificial intelligence, advised researchers to pursue a single scientific question that they care about deeply for the duration of their working lives. At the time, Clarke had been plugging away at Model Checking for eight years, so Newell's maxim rang true.

"People who jump from one problem to another, or always try to tackle the hottest new thing, they never seem to do anything really important," Clarke says. "You've got to decide on something and devote your life to it, and if you're lucky, then maybe someday you'll be recognized for it. I firmly believe that."


Jennifer Bails is a freelance writer and former award-winning newspaper reporter. Her work appears regularly in this magazine.