The center will pursue a research agenda in four core research areas:
- Secure CPS
- Safe and secure AI integration into CPS
- Safety and security of decentralized CPS
- Interpretability of complex AI-driven CPS
Cyber-physical systems (CPS) play a critical role in our day-to-day lives, with applications that include critical infrastructure such as the electric power grid, medical devices, and autonomous driving. Advances in AI technology, in turn, have opened up new opportunities for broader application, greater scale, and greater autonomy in CPS, particularly where AI is used in service of perception and planning.
The nature of many CPS applications necessitates a high degree of robustness and resilience. For example, failures in the electric power grid can lead to large-scale blackouts, while failures in autonomous driving systems can lead to congestion or crashes. While a great deal is understood about safety in core constituent parts of CPS, such as control, the increasing integration of AI into CPS poses numerous new scientific challenges that are not well understood. Indeed, even as research into AI vulnerabilities has gathered steam, it treats AI as a disembodied entity, focusing, for example, on vulnerabilities of deep neural networks in the context of image classification. Little is understood about vulnerabilities of end-to-end systems that leverage AI techniques in the loop. The goal of the proposed TRAC center is to create intellectual and scientific foundations for developing trustworthy CPS in which AI plays a core role. To this end, the TRAC center is a collaborative effort between experts in cyber-physical systems and control, AI, and information theory, across ESE and CSE. The vision of the center is to holistically study the safety and security of the full lifecycle of AI-driven cyber-physical systems, including their core building blocks, the AI algorithms that operate them, integrated collections of AI-driven CPS, and usability and explainability issues.
Research Area I: Secure CPS
Modern CPS, such as autonomous vehicles, rely on a complex network of digital devices for operation, including computing nodes (CPUs, GPUs), a CAN bus, wireless capabilities, and associated hardware. Reliable and secure operation of this hardware-software stack is critical to trustworthy autonomy, and is the central aim of this research area.
Cybersecurity Safety and security of CPS often rely on a collection of trusted components as well as their composition. Modern exploitation techniques for CPS often chain compromises of multiple subsystems into an attack chain that ultimately achieves arbitrary manipulation of the platform. The aim of the TRAC cybersecurity research agenda is to break this chain by building defenses at each layer of the attack chain, allowing the designer to employ defense-in-depth designs. First, at the external-facing subsystems, where the attack surface is relatively large, we can employ strong hardware-enforced execution protection to partition and isolate critical components, minimizing the impact of potential compromises. Next, to counter physical-level signal manipulations that can be catastrophic for a CPS, we can leverage the trusted sub-components to build a context invariance model that aids the collection and cross-validation of sensor inputs. Lastly, leveraging internal software-based and hardware-based attestation, we can measure the level of trust we place in the platform; when that trust degrades below a given level, a fail-safe system initiates, allowing the cyber-physical system to degrade gracefully and limiting the impact on human safety.
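As a minimal sketch of this last layer, the trust-degradation logic can be thought of as a monitor that maps attestation results to an operational mode. The component names, weights, and thresholds below are purely illustrative assumptions, not part of any proposed implementation:

```python
# Sketch of a trust monitor that triggers graceful degradation when
# attestation-derived trust falls below a threshold. Component names,
# weights, and thresholds are illustrative assumptions.

def platform_trust(attestations, weights):
    """Weighted aggregate of per-component attestation scores in [0, 1]."""
    total = sum(weights.values())
    return sum(attestations[c] * w for c, w in weights.items()) / total

def select_mode(trust, degrade_below=0.8, stop_below=0.5):
    """Map the aggregate trust level to a fail-safe operational mode."""
    if trust >= degrade_below:
        return "full_autonomy"
    if trust >= stop_below:
        return "reduced_speed"   # degrade gracefully, limit impact on safety
    return "safe_stop"

weights = {"sensor_stack": 2.0, "planner": 3.0, "can_bus": 1.0}
healthy = {"sensor_stack": 1.0, "planner": 1.0, "can_bus": 1.0}
compromised = {"sensor_stack": 0.2, "planner": 1.0, "can_bus": 0.4}

print(select_mode(platform_trust(healthy, weights)))      # full_autonomy
print(select_mode(platform_trust(compromised, weights)))  # reduced_speed
```

The point of the sketch is the structure, not the numbers: attestation feeds a continuously updated trust estimate, and crossing a threshold initiates the fail-safe path rather than abrupt shutdown.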
Hardware Security CPS often employ AI/deep-learning-based perception, planning, and control components that demand immense information processing capabilities and call for domain-specific designs of their computation engines and communication interfaces. Unlike the cloud computing paradigm, which is powered by homogeneous general-purpose computing platforms, CPS will be characterized by heterogeneous specialized hardware such as GPUs, FPGAs, and hardware accelerators, presenting many open-ended challenges in hardware reliability and security. First, resources among these heterogeneous blocks have to be scheduled and managed judiciously to account for the varying workload characteristics and real-time performance requirements driven by the diverse mission and task scenarios facing cyber-physical systems. Critical among these resources is the fixed energy capacity, often provided by battery banks, that must be shared by all components in the system, including sensing, computing, and actuation/locomotion. Adaptive and reliable power management is thus an important part of guaranteeing the safe and reliable operation of the system. Next, the multi-domain nature of CPS further complicates its supply chain and creates many opportunities for malicious actors to inject Trojans at both the software and hardware level. Yet the integrity of CPS is far more critical than that of typical computing systems, as even a small compromise could cause serious physical damage and even loss of human life. Therefore, a proactive detection and monitoring scheme that can be embedded into the hardware platform to catch anomalous behaviors and ensure the integrity of devices supplied by different vendors is of paramount importance in CPS.
Finally, CPS platforms such as self-driving cars and unmanned aerial vehicles invariably have to interface with the physical world, which is analog in nature, making analog and mixed-signal electronic components an indispensable part of the system. However, existing verification and certification methods are often based on a strictly digital formulation and are incompatible with modeling of analog-domain operations. There is therefore a great need for formal approaches to model and verify analog behaviors arising from both the sensor modules and the interfacing components in CPS, as well as the physical environment they interact with.
Research Area II: Safe and Secure AI Integration
The second core research area of the proposed center will be safe and secure AI integration in CPS, where AI approaches are used for perception and planning. While AI is increasingly used in this integrated fashion, the security and safety consequences are still poorly understood, particularly given that many AI methods, such as machine learning, typically provide relatively weak guarantees on both functionality and computational performance, and have known security vulnerabilities. The vision of the safe and secure AI integration research area is to significantly advance the science of robust autonomous planning and control that relies on AI-based perception.
Safe and Secure Integration of AI-based Perception and Planning Some of the most important fundamental problems lie at the intersection of AI-based perception and planning and control algorithms. In particular, planning approaches typically take as given (a) the location of the system (e.g., an autonomous car), (b) the locations of static obstacles (e.g., buildings, road construction), and (c) the locations and anticipated motion of dynamic obstacles (e.g., other cars). In practice, all of these are inferred, imperfectly, using perceptual tools and sensing modalities that include GPS, IMU, lidar, radar, and camera, among others. The effectiveness of planning and control hinges fundamentally on the accuracy of location and dynamics estimation of the self and surrounding objects. While modern AI techniques, such as deep learning, play a critical role in these tasks, they also exhibit stunning vulnerabilities, with malicious parties able to readily hijack predictions by unsuspicious manipulations of the surroundings (e.g., stickers on stop signs). The center will study the implications of such vulnerabilities for CPS, for example, as AI is used in localization (to compensate for imperfect GPS), sensor fusion (e.g., lidar and camera inputs used jointly for situational awareness), and motion planning.
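A toy sketch conveys the kind of cross-modality consistency check such sensor fusion might employ. The labels, confidence values, and threshold below are hypothetical assumptions for illustration only:

```python
# Illustrative cross-modality consistency check: flag an object whose
# camera-based classification disagrees with a second modality (e.g., an
# HD-map prior or a lidar shape match). All values are made-up assumptions.

def cross_validate(camera_label, camera_conf, prior_label, prior_conf,
                   disagreement_threshold=0.6):
    """Return (label, flagged): accept agreement, flag confident conflicts."""
    if camera_label == prior_label:
        return camera_label, False
    # Modalities disagree; if both are confident, treat the scene as suspicious
    flagged = min(camera_conf, prior_conf) >= disagreement_threshold
    # Fall back to the more confident source either way
    label = camera_label if camera_conf > prior_conf else prior_label
    return label, flagged

# A stop sign with adversarial stickers: the camera confidently reports
# "speed_limit", but a map prior confidently places a stop sign here.
label, suspicious = cross_validate("speed_limit", 0.9, "stop_sign", 0.95)
print(label, suspicious)   # stop_sign True
```

A real system would cross-validate richer geometric and temporal evidence, but even this skeleton shows why redundant modalities make single-sensor hijacking harder.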
System Design and Architecture Safe and effective integration of AI within CPS must address a fundamentally new embodiment of AI algorithms within the cyber-physical semantics of those systems. This embodiment in turn introduces significant new tensions among:
- the limits and scheduling of platform computational and physical resources;
- concurrent demands for those resources from AI algorithms, controllers, on-board simulation, and other computational tasks; and
- the timing and sequencing of sensing and actuation actions relative to when decisions are made by the AI algorithms.
Specifically, safety properties (nothing bad happens) and resiliency properties (how large a disturbance a system can handle) depend strongly on complex and diverse interactions among the physical environment, the platform architecture, real-time resource scheduling, control, AI, and the applications running within CPS. New system architectures, frameworks, and configuration tools are needed that can:
- tractably and practically instantiate policies and mechanisms based on formal models spanning scheduling, control, the physical environment, and AI algorithms;
- enforce rigorous and principled coordination dynamically and adaptively across decisions made in each of those domains to maintain safety and resiliency at run-time; and
- be configured and customized at fine granularity for a wide variety of combinations of applications, control and scheduling models, and AI techniques.
Secure integration of AI within CPS must also consider this embodiment, taking into account the ability of an adversary to profile vulnerabilities along multiple attack surfaces spanning the cyber and physical aspects of the system. For example, timing attacks based on perceptual inputs may be used to delay necessary actions (e.g., failing to steer into a turn, resulting in an autonomous vehicle departing the roadway) or to cause decisions to be made and enacted early or unnecessarily (e.g., following a curve in the road before actually reaching it, or swerving off the road to avoid a fictitious obstacle). The cross-cutting nature of these attacks on cyber-physically embodied AI necessitates (1) new formal models of the timing, sequencing, and resource demands of AI algorithms and the consequences of their decisions, (2) characterization of possible attacks based on those semantics (potentially automatically, off-line or even on-line, using AI methods), and (3) development of new defense strategies, policies, and mechanisms to maintain safety and resiliency in the face of such attacks.
Research Area III: Safety and Security of Decentralized CPS
The emergence of autonomous vehicles will likely be followed by the emergence of autonomous vehicle fleets (indeed, Waymo already manages such a fleet). Analogously, as drone technology matures, applications such as drone-based delivery services will likely lead to large fleets of delivery drones. The resulting ecosystems will likely comprise multiple fleets of autonomous agents managed by competing fleet managers, along with systems (autonomous or not) owned by individuals. Despite the clear indicators of the emergence of this ecosystem, virtually nothing is understood about its implications for safety and security. Research Area III will aim to develop foundations of safety and security in such decentralized CPS, involving collections of autonomous and/or non-autonomous agents broadly. Intended areas of application include smart cities, which combine smart infrastructure, smart (autonomous or semi-autonomous) CPS devices, and IoT. The core challenges lie in the interplay between large-scale decentralized coordination of such devices in a complex system and the strategic considerations among self-interested sub-systems, such as vehicle fleets or IoT providers.
Correlated Failures and System-Level Safety and Security The perceptual architecture (the combination of sensors and the AI backend, which includes vision and sensor fusion elements, among others) will likely be similar across vehicles in each fleet, and perhaps even across fleets. This gives rise to highly correlated failure modalities. Given the emergence of adversarial example attacks on computer vision systems that involve unsuspicious modifications of the physical environment being perceived (say, a stop sign with stickers on it), what is the implication of such attacks for multiagent autonomous vehicle systems with correlated perceptual vulnerabilities? The answer is not a trivial inference from single-AI vulnerabilities, for two reasons: 1) AI vulnerabilities have typically been studied in ways disembodied from the end-to-end planning architectures that use vision as only one of a number of perceptual inputs, and 2) multiple vehicles have multiple vantage points, and also respond to one another's perceived movements (e.g., to avoid a crash). These issues motivate a series of fundamental research questions in the context of AI-based decentralized CPS about how to design such systems in a way that ensures system-level (and not merely individual) robustness against random and maliciously engineered failure modalities.
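The stakes of correlated failure modalities can be seen in a back-of-the-envelope comparison, assuming (purely for illustration) that each vehicle's perception fails on a given adversarial scene with probability p:

```python
# Contrast between independent and fully correlated perceptual failures
# across a fleet. The failure probability p is an illustrative assumption.

def p_all_fail_independent(p, n):
    """All n vehicles must fail separately (diverse perception stacks)."""
    return p ** n

def p_all_fail_shared_stack(p):
    """One shared flaw fails every vehicle at once (identical stacks)."""
    return p

p, n = 0.1, 5
print(f"independent stacks: {p_all_fail_independent(p, n):.0e}")  # 1e-05
print(f"shared stack:       {p_all_fail_shared_stack(p):.0e}")    # 1e-01
```

The two idealized extremes bracket reality: shared perception stacks push a fleet toward the single-flaw regime, which is four orders of magnitude worse in this toy setting.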
Coordination of Vehicle Fleets A core capability for decentralized CPS is to communicate and interact with each other to perform coordinated tasks. These tasks range from high-level tasks, such as cooperatively planning routes to minimize congestion, to low-level tasks, such as cooperatively sharing sensor information and predictions (for example, predictions by autonomous vehicles about traffic signs and possible pedestrians on the road). However, as with any distributed system, the interactions of the different autonomous agents can introduce failure modes that are unintentional (e.g., if one vehicle wrongly recognizes a traffic sign and shares that information with other nearby vehicles, are all the other vehicles now more likely to also wrongly recognize the sign?) or even adversarial (e.g., can a vehicle that is compromised by an adversary corrupt the whole network of vehicles?).
A key part of the center's research agenda is therefore to develop decentralized coordination algorithms that can perform the various coordination tasks relevant to CPS while remaining sufficiently robust to these failure modes. To do so, each autonomous agent will need to reason about its environment, make plans to achieve its goals, and safely execute those plans, all in collaboration with other agents, sharing information and inferring whether the information shared by other autonomous agents is trustworthy.
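One simple robustness primitive in this direction is bounding the influence of any single compromised reporter when fusing shared information. The sketch below applies a trimmed mean to hypothetical confidence reports; the numbers are invented for illustration:

```python
# Sketch of robust fusion of confidence values shared across a fleet: a
# trimmed mean bounds the influence of any single compromised reporter.

def trimmed_mean(values, trim=1):
    """Drop the `trim` lowest and `trim` highest reports, average the rest."""
    s = sorted(values)
    kept = s[trim:len(s) - trim]
    return sum(kept) / len(kept)

# Five vehicles report confidence that a pedestrian is present; one
# compromised vehicle reports 0.0 to try to suppress the detection.
reports = [0.9, 0.85, 0.95, 0.88, 0.0]
print(sum(reports) / len(reports))   # naive mean dragged down by the outlier
print(trimmed_mean(reports))         # trimmed mean ignores it (about 0.88)
```

Trimming tolerates only a fixed number of bad reporters and is just one point in the design space; the research questions above concern coordination schemes that remain robust under richer adversarial and unintentional failure modes.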
Strategic Interactions Among Fleets Insofar as fleet managers make self-interested choices, these choices have consequences for the overall state of the multi-agent system, including its safety properties as well as its market composition (which in turn has consequences for system safety). For example, while much discussion has centered on the eventual demise of non-autonomous vehicles, it is entirely possible that a market equilibrium always features non-autonomous cars, as well as individually owned autonomous cars. Moreover, fleet managers compete on performance and safety criteria, and trade these off in ways that maximize their respective profits. The center will therefore engage a series of associated research questions: What consequences do strategic interactions among fleets of CPS have for overall system safety and security? Does a safer environment warrant explicit regulation, and if so, what form should such regulation take?
Network Safety and Resiliency While CPS are by definition capable of making decisions without relying on external inputs, they commonly operate as part of an ecosystem that relies on extensive exchanges of information between its components. This is very much the case for the vehicle fleets discussed above, but it is also present in "smart" environments such as smart cities, smart manufacturing, and smart grids, where the distributed intelligence of individual components is both enabled and supplemented by the information they exchange. The timeliness, and more generally the temporal characteristics, of such exchanges are often critical to the correct and efficient decisions of these systems. Hence, the ability to enforce timing guarantees in the communication infrastructure (network) that connects them, e.g., as embodied in specifications such as the IEEE Time-Sensitive Networking (TSN) standards, is a vital aspect of their safety and security.
The need to secure the communication infrastructure of a distributed system is obviously not new. However, the time-sensitive nature of the information flows on which many cyber-physical systems rely creates a new family of attack surfaces predicated on perturbing the timing of information delivery rather than precluding or corrupting it, as with traditional attack vectors. For example, delaying notification of congestion on selected street segments can easily result in distributed decisions among vehicles in a fleet that worsen the problem, possibly preventing the arrival of emergency vehicles where they are needed. Subtler, and therefore harder to detect, attacks are also feasible by exploiting specific timing dependencies present in individual systems. For example, a smart grid that relies on predictive control models for power distribution decisions could be destabilized, leading to equipment damage or failure, by delaying the communication of planned sequences of future control actions between neighboring agents. The center will focus on assessing the sensitivity of the control loops present in CPS to timing attacks that can destabilize them, as well as on developing application- and network-level solutions for hardening them against such attacks. The effort will include the development of certification procedures aimed at providing formal guarantees on the systems' tolerance to delays in access to information from neighboring systems.
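To see why delayed information alone can destabilize a control loop, consider a toy scalar plant stabilized by feedback on a possibly stale measurement. The plant gain, feedback gain, and delay below are illustrative assumptions, not a model of any particular system:

```python
# Toy delay-sensitivity demonstration: x_{k+1} = a*x_k - k_fb*x_{k-d}.
# With a fresh measurement (d = 0) the closed loop converges; when an
# attacker delays the measurement by two steps, the same loop diverges.

def simulate(delay, a=1.5, k_fb=1.0, steps=60):
    """Return the state trajectory under feedback delayed by `delay` steps."""
    history = [1.0] * (delay + 1)           # initial state (and stale copies)
    for _ in range(steps):
        x = history[-1]                     # true current state
        x_meas = history[-(delay + 1)]      # attacker-delayed measurement
        history.append(a * x - k_fb * x_meas)
    return history

print(abs(simulate(delay=0)[-1]) < 1e-3)              # True: converges
print(max(abs(v) for v in simulate(delay=2)) > 1e3)   # True: diverges
```

Nothing in the loop is corrupted; only the age of the information changes, which is exactly the new attack surface the timing-perturbation discussion identifies.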
Research Area IV: Interpretability of AI-driven CPS
As we construct progressively more sophisticated AI models for CPS, the dynamic interplay between human and machine grows ever more critical. Moreover, the requirement that these systems be trustworthy necessitates techniques, such as collecting audit trails, for subsequent analysis of failures that facilitates improved functionality and safety. The challenge of building, training, validating, and maintaining such systems has the potential to become an expensive obstacle, and could introduce a number of human-centric problems. Comprehensibility is also an essential factor. Algorithms that efficiently make accurate predictions, classifications, or summaries become more trustworthy and valuable when the various stakeholders can understand the relevant aspects of their construction and use. It is, therefore, essential to consider human-in-the-loop techniques that explain decisions made by the AI, both in the context of a single CPS and in decentralized collections of them. Research Area IV will establish foundations to address the comprehensibility and interpretability of AI techniques for autonomous agents.
The Visualization and Human-Computer Interaction fields provide a natural foundation for fostering such human-machine interplay. Numerous examples from the existing literature leverage the explanatory power of visualization tools to transform complex data into comprehensible formats. For single AI-driven CPS scenarios, we will investigate visualization techniques to communicate decision rationales to the user, as well as ways of collecting audit trails that best promote interpretability. The project will also explore methods for inferring or eliciting preferences from natural behavioral tendencies. For decentralized AI-driven CPS, we will develop visualization tools for evaluating and maintaining the AI algorithms.