This case study presents system and software engineering issues relevant to the accidents associated with the Therac-25 medical linear accelerator that occurred between 1985 and 1987. The six accidents caused five deaths and serious injury to several patients. They were system accidents that resulted from complex interactions among hardware components, controlling software, and operator functions.
Medical linear accelerators accelerate electrons to create high-energy beams that can destroy tumors. Shallow tissue is treated with the accelerated electrons directly; to reach deeper tissue, the electron beam is converted into X-ray photons. An accident, in this context, is the delivery of an unsafe radiation dose to a patient.
A radiation therapy machine is controlled by software that monitors the machine status, accepts operator input about the treatment to be performed, and initializes the machine to perform that treatment. The software turns the electron beam on in response to an operator command and turns it off when the treatment is complete, when the operator requests beam shutdown, or when the hardware detects a machine malfunction. A radiation therapy machine is therefore a reactive system whose behavior is state dependent, and its safety depends on preventing entry into unsafe states. For example, the software controls the equipment that positions the patient and the beam. Positioning operations can take a minute or more to execute, and it is unsafe to activate the electron beam while a positioning operation is in progress.
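The state-dependent safety requirement described above can be made explicit as a small state machine whose transition table rejects unsafe commands. The following Python sketch is illustrative only: the state names, the command strings, and the `TreatmentController` and `UnsafeTransition` classes are hypothetical and are not drawn from the Therac-25 software, which was written in PDP-11 assembly language.

```python
from enum import Enum, auto


class MachineState(Enum):
    """Hypothetical, simplified states of a treatment machine."""
    IDLE = auto()
    POSITIONING = auto()  # patient/beam positioning in progress (may take a minute or more)
    READY = auto()        # positioning complete; beam may be turned on
    BEAM_ON = auto()


class UnsafeTransition(Exception):
    """Raised when a command would move the machine into an unsafe state."""


# Permitted transitions: command -> (required current state, resulting state).
TRANSITIONS = {
    "start_positioning": (MachineState.IDLE, MachineState.POSITIONING),
    "positioning_done": (MachineState.POSITIONING, MachineState.READY),
    "beam_on": (MachineState.READY, MachineState.BEAM_ON),
}


class TreatmentController:
    def __init__(self) -> None:
        self.state = MachineState.IDLE

    def handle(self, command: str) -> MachineState:
        # Shutdown is always allowed: treatment complete, operator request, or hardware fault.
        if command == "beam_off":
            self.state = MachineState.IDLE
            return self.state
        required, resulting = TRANSITIONS[command]
        if self.state is not required:
            raise UnsafeTransition(f"{command!r} rejected in state {self.state.name}")
        self.state = resulting
        return self.state


if __name__ == "__main__":
    ctl = TreatmentController()
    ctl.handle("start_positioning")
    try:
        ctl.handle("beam_on")  # positioning still in progress: must be refused
    except UnsafeTransition as err:
        print(err)
```

Because every command is checked against the current state, a `beam_on` request issued during positioning raises an error instead of energizing the beam. Keeping the permitted transitions in one table also makes them easier to review than checks scattered across several tasks.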
In the early 1980s, Atomic Energy of Canada Limited (AECL) developed the Therac-25, a dual-mode (X-ray or electron) linear accelerator that can deliver photons at 25 MeV or electrons at various energy levels. The Therac-25 superseded the Therac-20, a 20-MeV dual-mode accelerator with a history of successful clinical use. The Therac-20 used a DEC PDP-11 minicomputer for computer control and featured protective circuits for monitoring the electron beam, plus mechanical interlocks for policing the machine to ensure safe operation. For the Therac-25, AECL decided to increase the responsibility of the software for maintaining safety and eliminated most of the hardware safety mechanisms and interlocks. The software, written in PDP-11 assembly language, was partially reused from earlier products in the Therac product line. Eleven Therac-25s were installed at the time of the first radiation accident in June 1985.
The use of radiation therapy machines has increased rapidly since the Therac-25 accidents. There were approximately 1,000 medical radiation machines in the United States in 1985; by 2009 the number had increased to approximately 4,450. Some of the system problems found in the Therac-25 may also be present in medical radiation devices currently in use. References to more recent accidents are included below.
Analysis of Case Study
The Therac-25 accidents and their causes are well documented in materials from the U.S. and Canadian regulatory agencies, namely the U.S. Food and Drug Administration (FDA) and the Canadian Bureau of Radiation and Medical Devices, and in depositions associated with lawsuits brought against AECL. An article by Leveson and Turner (1993) provides the most comprehensive publicly available description of the accident investigations, the causes of the accidents, and the lessons learned that are relevant to developing systems in which computers control dangerous devices.
Case Study Description
The Therac-25 accidents are associated with the non-use or misuse of numerous system engineering practices, especially System Verification and Validation, Risk Management, and Assessment and Control. In addition, numerous software engineering practices were not followed, including design reviews, adequate documentation, and comprehensive software unit and integration testing.
The likelihood of radiation accidents increased when AECL made the system engineering decision to increase the responsibility of the Therac-25 software for maintaining safety and to eliminate most of the hardware safety mechanisms and interlocks. In retrospect, the software was not worthy of such trust. In 1983 AECL performed a safety assessment on the Therac-25. The resulting fault tree did include computer failure, but only in the form of hardware failures; software failures were not considered in the analysis.
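A fault tree combines basic events through OR gates (any input causes the output) and AND gates (all inputs are required). The Python sketch below assumes independent events and uses made-up probabilities, not figures from AECL's 1983 assessment. It illustrates two effects discussed in this case study: leaving the software event out of the tree makes the top event look far less likely than it is, and removing an independent hardware interlock turns the software into a single point of failure. The constant and function names are hypothetical.

```python
from math import prod

# Hypothetical per-treatment probabilities, invented for illustration only.
P_COMPUTER_HW_FAULT = 1e-9  # hardware failure in the control computer
P_SOFTWARE_FAULT = 1e-4     # software error (e.g. a race) sets up an unsafe beam
P_INTERLOCK_FAULT = 1e-3    # independent hardware interlock fails to trip


def or_gate(*probs: float) -> float:
    """Probability that at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def and_gate(*probs: float) -> float:
    """Probability that all independent input events occur."""
    return prod(probs)


def overdose_probability(include_software: bool, hardware_interlock: bool) -> float:
    """Top event of a two-level fault tree under the independence assumption."""
    causes = [P_COMPUTER_HW_FAULT]
    if include_software:
        causes.append(P_SOFTWARE_FAULT)
    unsafe_setup = or_gate(*causes)
    if hardware_interlock:
        # An overdose now also requires the interlock to fail.
        return and_gate(unsafe_setup, P_INTERLOCK_FAULT)
    # Without an interlock, the unsafe setup alone reaches the patient.
    return unsafe_setup


if __name__ == "__main__":
    for sw, hw in [(False, False), (True, False), (True, True)]:
        p = overdose_probability(sw, hw)
        print(f"software event={sw}, interlock={hw}: P(overdose) ~ {p:.1e}")
```

With these illustrative numbers, omitting the software event understates the risk of the interlock-free design by about five orders of magnitude (roughly 1e-9 versus 1e-4), while restoring the interlock lowers the estimate to roughly 1e-7. The independence assumption is itself optimistic for real systems, where a common cause can defeat several safeguards at once.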
The software was developed by a single individual using PDP-11 assembly language, and little software documentation was produced during development. An AECL response to the FDA indicated the lack of software specifications and of a software test plan. Testing relied almost exclusively on integrated system testing. Leveson and Turner describe the functions and design of the software and conclude that there were design errors in how concurrent processing was handled: race conditions resulting from the implementation of multitasking contributed to the accidents.
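A race condition arises when a result depends on how concurrently executing tasks happen to interleave. The Python sketch below uses two threads and `threading.Event` objects to force one bad interleaving deterministically: a setup task reads the prescription, the operator edits it, and the setup task then acts on the stale value. The `Prescription` class, its version counter, and the revalidation step are hypothetical illustrations of a generic check-then-act race and one common remedy, not a reconstruction of AECL's code.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass
class Prescription:
    """Hypothetical treatment data shared by a data-entry task and a setup task."""
    mode: str = "X"   # "X" = X-ray, "E" = electron
    version: int = 0  # incremented on every edit


def run(revalidate: bool) -> tuple[str, str]:
    """Force the interleaving read -> edit -> act and report the outcome.

    Returns (mode the hardware was configured for, mode currently prescribed)."""
    rx = Prescription()
    lock = threading.Lock()
    read_done, edit_done = threading.Event(), threading.Event()
    result: dict[str, str] = {}

    def setup_task() -> None:
        with lock:
            mode, seen = rx.mode, rx.version  # "check": snapshot the prescription
        read_done.set()
        edit_done.wait()                      # slow hardware motion; the edit lands here
        with lock:                            # commit under the lock so no edit slips in
            if revalidate and rx.version != seen:
                # Data changed: use the current value (a real system would redo the setup).
                mode = rx.mode
            result["hardware"] = mode         # "act": configure hardware for `mode`

    def operator_task() -> None:
        read_done.wait()
        with lock:
            rx.mode, rx.version = "E", rx.version + 1  # quick edit from X to E
        edit_done.set()

    threads = [threading.Thread(target=setup_task), threading.Thread(target=operator_task)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["hardware"], rx.mode


if __name__ == "__main__":
    print(run(revalidate=False))
    print(run(revalidate=True))
```

Calling `run(revalidate=False)` returns `("X", "E")`: the hardware is configured for one mode while the prescription shows another. With `revalidate=True`, the setup task notices that the version changed and uses the current data, returning `("E", "E")`. Note that the lock alone does not fix the problem; what matters is re-checking the shared data at the moment of commitment.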
AECL technical management did not believe that there were any conditions under which the Therac-25 could cause radiation overdoses, and this belief drove the company’s responses to accident reports. The first radiation overdose accident occurred in June 1985 at the Kennestone Regional Oncology Center in Marietta, Georgia, where the Therac-25 had been operating for about six months. The patient who suffered the overdose filed suit against the hospital and AECL in October 1985. AECL did not investigate the incident, and FDA investigators later found that AECL had no mechanism to follow up on reports of suspected accidents. Other Therac-25 users received no information that an accident had occurred.
Two more accidents occurred in 1985, including a radiation overdose at Yakima Valley Memorial Hospital in Yakima, Washington, that was reported to AECL. The AECL technical support supervisor responded to the hospital in early 1986: “After careful consideration, we are of the opinion that this damage could not have been produced by any malfunction of the Therac-25 or by any operator error…there have apparently been no other instances of similar damage to this or other patients.”
In early 1986, two accidents at the East Texas Cancer Center in Tyler, Texas, resulted in the deaths of two patients within a few months. The first massive overdose occurred on March 21; its extent was not realized at the time. The Therac-25 was shut down for testing the day after the accident. Two AECL engineers, one from the plant in Canada, spent a day running machine tests but could not reproduce a malfunction code the operator had observed at the time of the accident. The home office engineer explained that it was not possible for the Therac-25 to overdose a patient. The hospital physicist, who supervised the use of the machine, asked AECL whether there had been any other reports of radiation overexposure. The AECL quality assurance manager told him that AECL knew of no accidents involving the Therac-25.
On April 11, the same technician received the same malfunction code when a second overdose occurred. Three weeks later the patient died; an autopsy showed acute high-dose radiation injury to the right temporal lobe of the brain and to the brain stem. The hospital physicist was able to reproduce the steps the operator had performed and measured the high radiation dose delivered. He determined that data-entry speed during editing of the treatment prescription was the key factor in producing the malfunction code and the overdose. Examination of the portion of the code responsible for the Tyler accidents revealed major software design flaws. Leveson and Turner describe in detail how the race condition occurred and, in the absence of the hardware interlocks, caused the overdose. The first report of the Tyler accidents reached the FDA from the Texas Health Department; shortly thereafter, AECL submitted a medical device accident report to the FDA discussing the overdoses in Tyler.
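The timing dependence the hospital physicist found, and the role a hardware interlock can play, can be illustrated with a deliberately simplified model. Everything in the Python sketch below is a hypothetical assumption for illustration: the 8-second setup window, the rule that only edits arriving after setup completes are detected, and the two-field `Config` that pairs a beam current level with a turntable position. It is not AECL's logic, but it reproduces the pattern described above: only a fast edit leaves the machine in an inconsistent state, and an independent check of a physical invariant blocks the beam regardless of how the software reached that state.

```python
from dataclasses import dataclass

SETUP_SECONDS = 8.0  # hypothetical duration of the slow beam-setup step


@dataclass(frozen=True)
class Config:
    beam_level: str  # "X" = X-ray-level (high) beam current, "E" = electron-level current
    turntable: str   # "X" = X-ray target in the beam path, "E" = electron position (no target)


def flawed_setup(edit_delay: float) -> Config:
    """Hypothetical model: the operator first enters mode "X", then edits it to "E"
    `edit_delay` seconds after the slow beam setup starts.

    The beam setup uses the mode it read when it started, while the turntable follows
    the latest entry. An edit that arrives after setup has finished is assumed to be
    detected and to trigger a complete, consistent redo."""
    if edit_delay >= SETUP_SECONDS:
        return Config(beam_level="E", turntable="E")  # slow operator: edit detected
    return Config(beam_level="X", turntable="E")      # fast operator: stale beam level


def violates_invariant(cfg: Config) -> bool:
    """Physical safety invariant: X-ray-level current requires the target in the beam."""
    return cfg.beam_level == "X" and cfg.turntable != "X"


def fire(cfg: Config, with_interlock: bool) -> str:
    """Outcome of a beam-on command, with or without an independent interlock."""
    if violates_invariant(cfg):
        return "inhibited" if with_interlock else "overdose"
    return "treated"


if __name__ == "__main__":
    for delay in (3.0, 12.0):  # a fast and a slow edit, in seconds
        cfg = flawed_setup(delay)
        print(delay, cfg, fire(cfg, with_interlock=False), fire(cfg, with_interlock=True))
```

In this model, triggering the fault requires a fast operator, which may help explain why engineers running ordinary tests could not reproduce it. The interlock does not need to know why the configuration is inconsistent; it only checks the physical state, which is what makes it independent of the software.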
On May 2, 1986, the FDA declared the Therac-25 defective and required notification of all customers. AECL was required to submit to the FDA a corrective action plan addressing the causes of the radiation overdoses. After multiple revisions to satisfy the FDA, the final corrective action plan was accepted in the summer of 1987. The plan resulted in the distribution of software updates and hardware upgrades that reinstated most of the hardware interlocks that had been part of the Therac-20 design. AECL settled the lawsuits filed by injured patients and by the families of patients who died from the radiation overdoses; total compensation has been estimated at over $150 million.
Leveson and Turner summarize the factors that contributed to the Therac-25 accidents: “We must approach the problems of accidents in complex systems from a systems-engineering point of view and consider all contributing factors” (Leveson and Turner 1993). For the Therac-25 accidents, contributing factors included:
• Management inadequacies and lack of procedures for following through on all reported incidents,
• Overconfidence in the software and the removal of hardware interlocks (making the software into a single point of failure that could lead to an accident),
• Less-than-acceptable software engineering practices, and
• Unrealistic risk assessments along with overconfidence in the results of those assessments.
Recent Medical Radiation Experience
Between 2009 and 2011, The New York Times published an excellent series of articles by Walt Bogdanich, a three-time winner of the Pulitzer Prize, under the title “Radiation Boom” on the use of medical radiation (New York Times).
The following quotations are excerpted from that series:
“Increasingly complex, computer-controlled devices are fundamentally changing medical radiation, delivering higher doses in less time with greater precision than ever before. But patients often know little about the harm that can result when safety rules are violated and ever more powerful and technologically complex machines go awry. To better understand those risks, The New York Times examined thousands of pages of public and private records and interviewed physicians, medical physicists, researchers and government regulators. The Times found that while this new technology allows doctors to more accurately attack tumors and reduce certain mistakes, its complexity has created new avenues for error — through software flaws, faulty programming, poor safety procedures or inadequate staffing and training.”
“‘Linear accelerators and treatment planning are enormously more complex than 20 years ago,’ said Dr. Howard I. Amols, chief of clinical physics at Memorial Sloan-Kettering Cancer Center in New York. But hospitals, he said, are often too trusting of the new computer systems and software, relying on them as if they had been tested over time, when in fact they have not.”
“Hospitals complain that manufacturers sometimes release new equipment with software that is poorly designed, contains glitches or lacks fail-safe features, records show. Northwest Medical Physics Equipment in Everett, Wash., had to release seven software patches to fix its image-guided radiation treatments, according to a December 2007 warning letter from the F.D.A. Hospitals reported that the company’s flawed software caused several cancer patients to receive incorrect treatment, government records show.”
References
Leveson, N.G., and C.S. Turner. 1993. “An Investigation of the Therac-25 Accidents.” IEEE Computer 26 (7): 18–41.
New York Times. 2009–2011. Articles in the “Radiation Boom” series. http://topics.nytimes.com/top/news/us/series/radiation_boom.