System Hardware Assurance
Lead Author: Elizabeth McDaniel
This article describes the discipline of hardware assurance, especially as it relates to systems engineering. It is part of the SE and Quality Attributes Knowledge Area.
System hardware assurance is a set of system security engineering activities undertaken to improve the confidence that electronics function as intended and only as intended throughout their life cycle, and to manage identified risks. The term hardware refers to microelectronic components, sometimes called integrated circuits, and other electronic components. These hardware components, the products of complex processes involving multiple stages of design, manufacturing, and post-manufacturing, must function properly under all circumstances. Here the term assurance refers to the set of activities undertaken to improve the confidence that the hardware functions as intended, and only as intended. Hardware components alone, and when integrated into subcomponents, subsystems, and systems, may have weaknesses and vulnerabilities that offer avenues for exploitation. Hardware risks can be differentiated as weaknesses (flaws, bugs, or errors in design, architecture, code, or implementation) and vulnerabilities (weaknesses that are exploitable in the context of use).
Hardware assurance approaches are designed to lessen these risks. Consequences of risk that are not mitigated may include adversary exploitation and subversion of system functionality, counterfeit production of components, and loss of technology advantage for the military and private sector organizations.
The objective of hardware assurance is to prevent loss, damage, and other compromises to the intended functionality of components, both on their own and once they become integral to subsystems and systems. Depending on identified concerns and desired mitigations, hardware assurance offers a wide range of combinations of possible activities and processes. At the component level, hardware assurance focuses on the hardware itself, when and where it was designed and manufactured, and its supply chain to delivery. Hardware assurance also focuses on hardware components when they are integrated and connected to subcomponents, components, subsystems, and systems. When components are connected to other components that execute software and firmware and operate as part of systems, assurance of their security is essential.
The drivers of hardware assurance are growing concerns over the confidentiality, integrity, and availability of individual integrated circuits and their interconnections into circuits. These concerns have grown with the increasing sophistication and complexity of hardware architectures, integrated circuits, operating systems and application software, with considerations of supply chain risks, emergence of new attack surfaces, and reliance on globalized sources for some components and technologies. The root of trust of a system is typically contained in the processes, steps, and layers of hardware components and across the systems engineering development cycle. Consequences associated with hardware attacks may occur very high in the system in which the components are embedded. The consequences of hardware vulnerabilities have direct analogs to the consequences of cyber and software-based vulnerabilities that are well reported. Fundamentally, hardware assurance focuses on the hardware component itself as well as the component’s interconnections with software and firmware, and other components in order to have full assurance at the system level.
Hardware assurance activities can be applied to components throughout their life cycle to reduce the likelihood of risks to proper function or other potential compromises of the hardware. It is likely that new tools and techniques for hardware assurance will be developed in the future to strengthen designs, and new methods will enhance assurance during complex manufacturing, packaging and test, and deployment. The criticality of the component in its context of use, together with the criticality of the system in which it is used, dictates the degree of hardware assurance needed to address the possible operational risks. It is essential to periodically assess the risks associated with hardware components in all phases of the component’s and the system’s life cycles.
Life Cycle Concerns of Hardware Components
Aspects of hardware assurance may be applied at various stages of a component’s life cycle (see Figure 1) that extends from hardware concept development and design processes, through manufacturing and associated processes, testing and distribution channels, and finally throughout the use in the larger electronic system. The need for hardware assurance continues throughout its operational life including sustainment and disposal. The complexity of electronic components increases as semiconductor technology advances; therefore the need to “bake-in” assurance increases. Risks can be created during design, and it may not be possible to mitigate all of them externally during operation. Risks can also increase as components are incorporated into systems where interconnections among chips may provide new avenues of concern. Improving hardware assurance posture as early as possible in the life cycle also reduces cost and schedule impacts to “fix” components later in the life cycle.
A generalized and conceptualized overview of the typical hardware life cycle (Figure 1) illustrates the phases of the life cycle of components, as well as the subsystems and systems in which they operate. In each phase multiple parties and processes are involved, thereby contributing to a very large set of variables and corresponding attack surfaces. At every stage the potential exists for compromise of the hardware as well as subcomponents and systems in which they operate; therefore, matching mitigations must be identified and applied.
Figure 1. Overview of Hardware Life Cycle
The value of the hardware component increases at each stage of the life cycle, so it is important to identify and mitigate weaknesses as early as possible. In addition to saving cost, early correction and mitigation avoids delays in creating an operational system. Defects typically take longer to find and fix later in the life cycle, and replacing hardware with “corrected” designs can add considerable complexity.
Hardware assurance during sustainment is also a novel challenge given legacy hardware and designs with their associated supply chains and acquisition. In long-lived high reliability systems, hardware assurance issues are compounded by obsolescence and diminished sourcing. The risks of counterfeits and acquisitions from the gray market are among the concerns.
Function as Intended and Only as Intended
Exhaustive testing can be used to check system functions against specifications and expectations, but checking for unintended functions is problematic. Consumers of products have a reasonable expectation that a purchased product will perform as advertised/indicated and function properly (safely and securely, under specified conditions), but they rarely consider whether any additional functions are built into the product. For example, a laptop with a face-to-face web-conferencing capability comes with a webcam that will function properly when enabled. But what if it also functions when supposedly turned off, thereby violating expectations of privacy? Given that a state-of-the-art semiconductor die might have billions of transistors, it is theoretically possible that “hidden” functions might be exploited by adversaries. The statement “function as intended and only as intended” communicates the concept.
Hardware assurance typically involves stages and layers of activities in a manner similar to cybersecurity. Hardware specifications and information from the design phase are needed so components can later be validated to function properly for systems or missions. If an engineer creates specifications that support assurance and that flow down the system development process, the concept of function as intended can be validated for the system and mission through accepted verification and validation processes. Function only as intended is a consequence of capturing the requirements/specifications so the product is designed and developed without extra functionality. For example, an FPGA contains much programmable functionality so that it can perform in a highly flexible manner; however, the programmable circuitry might be susceptible to exploitation by knowledgeable persons. Given specifications of a hardware component, select tools and processes can be used to determine, with a high degree of confidence, that the component’s performance meets specifications. Research efforts are underway to develop robust methods to validate that a component does not have capabilities that threaten assurance and are not specified in the original design. While select tools and processes can test for known weaknesses, operational vulnerabilities, and deviations from expected performance characteristics/behavior, all states of possible anomalous behavior cannot be determined or predicted. It is possible to determine the presence of known weaknesses/vulnerabilities and to document recommended levels of monitoring, diagnostics, and mitigations, along with some degree/percentage of coverage or confidence level that there are likely no unintended functions in the hardware component.
Three entities can test the hardware component and provide data for such assurance consideration. First, the designer, developer, and provider community can provide a suitably complete description of the design, fabrication data and metadata, and verification and validation results, and can acquire data during design and manufacture. When the provider collects such data, the provider can create an assurance case for the acquirer. Second, the acquirer, consumer, and user community can conduct acceptance testing in both static and dynamic situations, in real or simulated operational environments (much like software testing and operational testing of systems). Third, the provider and/or acquirer can solicit third-party test/evaluation to collect independent data indicating that no unintended functionality was found in the component. Ideally more than one source of data informs this assurance/confidence level.
Risks to Hardware
Modern systems depend on complex microelectronics, but advances in hardware without attention to associated risks can expose critical systems, their information, and the people who rely on them. “Hardware is evolving rapidly, thus creating fundamentally new attack surfaces, many of which will never be entirely secured.” In this way hardware assurance mirrors cybersecurity, because both require mitigations and strategies that evolve as threats do. Hardware assurance methods seek to raise confidence in the hardware by mitigating known or expected weaknesses or vulnerabilities. Most hardware components are commercially designed, manufactured, and then inserted into larger assemblies by multi-national companies with global supply chains. Understanding the provenance of components and the participants in their complex global supply chains is fundamental to assessing the risks associated with them. Operational risks that derive from unintentional or intentional features are differentiated based on the source of the feature. Three basic operational risk areas relate to goods, products, or items: failure to meet quality standards, maliciously tainted goods, and counterfeit hardware. Counterfeits are usually offered as legitimate products, but they are not. They may be refurbished items, mock items made to appear as the originals, re-marked products, or overproduced or substandard items that the legitimate producer did not intend to go on the market. The impact of counterfeits include …. Failure to follow quality standards, including safety and security standards, especially in design, can result in unintentional features or flaws being inadvertently introduced through mistakes, omissions, or lack of understanding about features that future users might manipulate for nefarious purposes.
Features introduced intentionally into hardware for specific purposes can make the hardware susceptible to espionage or to being controlled at some point in its life cycle.
Improve the Confidence
One of the key technical challenges associated with hardware assurance is the development of quantifiable metrics and measurements for concepts such as trust and assurance. While quantification is challenging because of the complex interplay between human designers, manufacturing and supply chains, and adversarial intent, it is important so that hardware risks can be identified and managed within program budgets and timeframes. Quantification enables a determination of the required level of hardware assurance, and whether it is successfully achieved throughout the hardware’s lifecycle.
Quantification of hardware assurance begins with a system-level assessment and ranking of hardware by risks and consequences. Criteria for conducting the hardware risk assessment can be based on factors such as criticality of the hardware to system operation or consequence of technology loss by reverse engineering and intellectual property theft.
Current methods for quantifying hardware risk, trust, and assurance emerged from quality and reliability engineering, which rely on methods like Failure Mode and Effects Analysis (FMEA). FMEA, semi-quantitative in nature, relies on a combination of probabilistic data for hardware failure and input from subject matter experts. Adapting FMEA to quantify hardware assurance is hampered when assigning probabilities to human behaviors motivated by economic incentives, malicious intent, etc. Opinions of experts vary when assigning numeric values and weighting factors used in generating risk matrices and scores; consensus processes can help but are not always perfect.
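The semi-quantitative character of FMEA can be illustrated with the risk priority number (RPN), the product of severity, occurrence, and detection ratings commonly used in FMEA worksheets. The sketch below is only an illustration: the 1-to-10 scales, the component names, and every rating are hypothetical values of the kind subject matter experts would supply, and they are not drawn from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a hypothetical FMEA worksheet (all ratings are expert judgments)."""
    component: str
    description: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (remote) .. 10 (almost certain)
    detection: int   # 1 (almost certainly detected) .. 10 (practically undetectable)

    def __post_init__(self):
        # Reject ratings outside the assumed 1..10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical entries for illustration only.
MODES = [
    FailureMode("Power regulator", "substandard production lot", 4, 4, 2),
    FailureMode("FPGA", "unauthorized configuration update", 9, 3, 7),
    FailureMode("Memory IC", "re-marked counterfeit part", 6, 5, 4),
]

if __name__ == "__main__":
    for mode in rank_by_rpn(MODES):
        print(f"{mode.rpn:4d}  {mode.component}: {mode.description}")
```

Because each rating is an expert judgment, two assessors can produce different rankings from the same worksheet, which is the limitation noted above.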
The selection of specific components for use in subsystems and systems should be the outcome of performance-risk-cost-benefit trade-off assessments in their intended context of use. The goal of risk management and mitigation planning is to select mitigations with the best overall operational risk reduction and the lowest cost impact. During the life cycle of a system (architecture, design, code, or implementation), various types of problems can pose risks to the operational functionality of the hardware components provided. These include weaknesses (or defects) that are inadvertent (unintentional); counterfeits that are accidental (unintentional) or intentional, e.g., financially motivated; and malicious components designed to change functionality (intentional). The purpose of managing risk in the context of hardware assurance is to decrease the risk of exploitable weaknesses and of increases in the attack surface, while increasing confidence that an implementation resists exploitation. Ideally, risk management reduces risk and raises assurance to an acceptable level. Often, risks are considered in the context of the likelihood of consequences and the costs and effectiveness of mitigations. However, new operationally impactful risks are recognized continuously over the hardware life cycle and across the supply chains of components. Further, hardware weaknesses are often exploited through software or firmware. As such, to maximize assurance and minimize operationally impactful risks, mitigation in depth across all constituent components must be considered. An example of a mitigation to a hardware weakness is the use of programmable logic. When a new attack surface is identified, a new configuration can be loaded into the programmable logic to protect the hardware through configurability and adaptability. In this case, the programming functions must themselves be assured so that they cannot be exploited for unintended purposes.
In this case, a dynamic risk profile highlights the need for flexibility in hardware configuration to provide extensible mitigation. Specifically, it highlights the need to reduce the susceptibility of hardware to obsolescence-related risks and weaknesses over its life cycle. Such an extensible mitigation also provides the means to mitigate defects discovered post-fabrication. Just as with software patches and updates, applying the mitigation may expose new attack surfaces on the hardware, although these will likely take a long time to discover. In the example above, the programmable logic is updated with a new configuration to protect the hardware. In this context, access to hardware reconfiguration must be limited to authorized parties to prevent an unauthorized update that deliberately introduces weaknesses. While programmable logic may mitigate a specific attack surface or type of weakness, additional mitigations are needed to minimize risk more completely. This is mitigation-in-depth: multiple mitigations building upon one another. Throughout the entire supply chain, critical pieces of information can be inadvertently exposed, and the exposure of such information directly enables the creation and exploitation of new attack surfaces. Therefore, the supply chain infrastructure must also be aware of weaknesses and work to protect the creation, use, and maintenance of hardware components. The dynamic risk profile offers a framework to balance mitigations in the context of risk and cost throughout the complete hardware and system life cycles.
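Balancing mitigations against risk and cost can be pictured as a small selection problem: choose the set of mitigations that gives the largest estimated risk reduction within a fixed budget. The sketch below is one possible illustration, not a method prescribed by this article. The mitigation names, costs, and risk-reduction scores are hypothetical, and it assumes that risk reductions simply add, which real mitigations that build upon one another generally do not.

```python
from itertools import combinations

# Hypothetical candidates: (name, cost in arbitrary units, estimated risk reduction score).
CANDIDATES = [
    ("restrict reconfiguration access", 3, 40),
    ("third-party test and evaluation", 5, 45),
    ("supply chain provenance audit", 4, 30),
    ("runtime monitoring", 2, 20),
]


def select_mitigations(candidates, budget):
    """Exhaustively search all subsets for the highest total risk reduction
    whose total cost fits the budget; ties go to the cheaper subset.

    Returns (names, total_cost, total_reduction). Assumes additive reductions.
    """
    best_names, best_cost, best_reduction = (), 0, 0
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            cost = sum(c for _, c, _ in subset)
            if cost > budget:
                continue
            reduction = sum(r for _, _, r in subset)
            # Prefer more reduction, then lower cost.
            if (reduction, -cost) > (best_reduction, -best_cost):
                best_names = tuple(name for name, _, _ in subset)
                best_cost, best_reduction = cost, reduction
    return best_names, best_cost, best_reduction


if __name__ == "__main__":
    names, cost, reduction = select_mitigations(CANDIDATES, budget=9)
    print(f"cost={cost} reduction={reduction}")
    for name in names:
        print(" -", name)
```

Exhaustive search is practical only for a handful of candidates. Because the risk profile is dynamic, the scores would need to be re-estimated and the selection rerun as new weaknesses are recognized.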
Current efforts seek to move from compliance-based systems to risk-based systems that support mitigation-in-depth in situations where compromises are needed to address the increasing complexity of hardware components and the intellectual property of hardware interconnected with software and firmware. Promising approaches include game theory analysis, use of confidence intervals for detecting counterfeit defects, and application of distributed ledger technology to hardware manufacturing data to create an immutable record for component provenance and traceability. Efforts are underway to articulate new standards for hardware assurance and methods that leverage quantifiable data to inform critical system engineering trades.
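The provenance idea can be sketched with a hash chain, in which each manufacturing record carries a hash of the record before it, so that altering any earlier entry is detectable. This is a minimal, single-party illustration of tamper-evident linking only. It is not a distributed ledger (there is no replication or consensus), and the record fields, lot identifier, and site names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash, event):
    """SHA-256 over the previous hash and a canonical JSON form of the event."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(ledger, event):
    """Append an event, linking it to the hash of the last entry."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(ledger):
    """Recompute every link; return False if any entry was altered or reordered."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


def build_example():
    """Build a hypothetical three-stage provenance trail for one lot."""
    ledger = []
    append_event(ledger, {"lot": "LOT-001", "stage": "fabrication", "site": "Fab A"})
    append_event(ledger, {"lot": "LOT-001", "stage": "packaging and test", "site": "Site B"})
    append_event(ledger, {"lot": "LOT-001", "stage": "distribution", "site": "Distributor C"})
    return ledger


if __name__ == "__main__":
    ledger = build_example()
    print("intact:", verify(ledger))
    ledger[1]["event"]["site"] = "Unknown broker"  # simulate tampering
    print("after tampering:", verify(ledger))
```

A record of this kind can show that recorded data has not changed since it was written. It cannot show that the data was true when recorded, so it complements rather than replaces testing and supplier assurance.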