How to Efficiently Achieve ASIL-D Compliance Using NoC Technology
Sponsored content
As more automotive safety-critical functions come under electronic control, system-on-chip (SoC) designers are increasingly being tasked with ensuring that their on-chip networks are also ISO 26262 ASIL-D Functional Safety compliant, down to the level of logic gates and interconnect wires. With the emergence of advanced driver assistance systems (ADAS) for higher levels of vehicle autonomy, the complexity of SoCs being designed for automobiles continues to rise, and the interconnect between blocks of these SoCs is becoming an increasing share of the total design. This added complexity is making it difficult to cost-effectively meet the ASIL-D Functional Safety requirements for a new automotive SoC, while also meeting the application’s time-to-market, power and performance demands.
The time-to-market demands themselves are also tightening, even as the global automotive electronic control unit (ECU) market is expected to grow at a CAGR of six percent from 2018 to 2023, to an estimated $58.4 billion. The three main drivers of this growth are more stringent government regulations for passenger safety, increasing vehicle production and, most importantly, dramatically growing electronic content per vehicle. While demand for automotive functional safety compliant SoCs is increasing, the expertise needed to ensure that chips meet ASIL-D requirements is in short supply, especially with new automotive SoC developers entering the market. These new contenders are looking for the easiest-to-implement, market-ready solutions possible.
A closer look at ASIL-D compliance options, including redundancy, path diversity and error correction codes (ECC), is required to fully optimize an on-chip network to meet rapidly evolving technology and customer demands.
The Role of ASIL-D Functional Safety
Increasing layers of electronics and software are being added to vehicles, from infotainment to engine, brakes, and various sensors for ADAS and autonomous driving.
To address the increasing use of electronics in automobiles, the ISO 26262 Functional Safety for Road Vehicles standard was developed. This standard is intended to ensure that the electronics are designed to specified levels in order to minimize the chance of failure, that they fail safe when incidents do occur, and that errors can be traced through documentation. Defined under ISO 26262, ASIL-D (Automotive Safety Integrity Level D) is a risk classification established by performing a risk analysis of a potential hazard. The risk analysis reviews the severity, exposure and controllability of the vehicle operation scenario in the presence of the potential hazard. The safety goal for that hazard drives the ASIL requirements.
There are four ASILs identified under the standard, A through D, with ASIL-D representing the highest level of integrity (Table 1). Under ASIL-D, the standard’s single-point fault metric must be at least 99%; in other words, no more than 1% of single-point faults may escape the chip’s safety mechanisms.
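The risk analysis described above can be sketched in a few lines. The snippet below assumes the widely cited shorthand for the ISO 26262 classification table, in which the class numbers for severity (S1 to S3), exposure (E1 to E4) and controllability (C1 to C3) are summed. The function name `asil_level` is hypothetical, and the table in the standard itself remains the authority.

```python
def asil_level(severity: int, exposure: int, controllability: int) -> str:
    """Return the ASIL for S1-S3, E1-E4 and C1-C3 class numbers.

    Assumption: uses the common "sum of class numbers" shorthand rather
    than the lookup table printed in the standard.
    """
    if not (1 <= severity <= 3 and 1 <= exposure <= 4 and 1 <= controllability <= 3):
        raise ValueError("expected S1-S3, E1-E4, C1-C3")
    total = severity + exposure + controllability
    # 10 -> D, 9 -> C, 8 -> B, 7 -> A, anything lower -> QM (quality managed)
    return {10: "D", 9: "C", 8: "B", 7: "A"}.get(total, "QM")


# Worst case on all three axes lands in ASIL-D.
level = asil_level(severity=3, exposure=4, controllability=3)
```

Only the combination of the highest severity, exposure and controllability classes yields ASIL-D, which is why it applies to a comparatively small set of hazards.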
While many semiconductor suppliers have already met ASIL-B and ASIL-C levels, meeting ASIL-D is proving to be more difficult and costly. Designers need an inexpensive and efficient means to ensure that the logic and wires in the chip are protected from failures. This means that failures must be detected and corrected in real time, such that the system can continue to execute in as many scenarios as possible. The SoC’s network-on-chip (NoC) interconnect is a critical part of the chip, which poses unique challenges for meeting ASIL-D requirements.
The Three Paths to ASIL-D Compliance for NoCs
The first option, the default path to functional safety in aerospace and aeronautics, is complete replication of critical-path elements, with up to three copies of the same SoC. This means the NoC and everything else inside the SoC is replicated three times. If one SoC goes bad, it sends out an alert to blink an LED, but otherwise everything can continue to work, and the faulty IC can be replaced. This approach, while straightforward and complete, adds too much cost, weight and power consumption to be feasible for high-volume automotive applications.
The second option for ensuring ASIL-D compliance in on-chip networks is “path diversity” with dynamic routing. With path diversity there is more than one way to go from point A to point B (Figure 2).
However, the downside to path diversity is that the designer must ensure that all possible paths in the NoC have a backup. This backup overhead could be almost as expensive as replicating the entire network. In addition, reprogramming the routing tables upon detection of a path failure will result in disruption of service, and the availability of backup connections must be verified with simulation and fault injection, which can be very time consuming and expensive.
Complicating the issue is that the performance degradation that results from traffic re-routing around a failing node is difficult to model. Architects need to simulate all possible failures and re-routing around the failure to understand how performance will change (Figure 3).
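To give a feel for the failure sweep this implies, here is a minimal sketch that removes one link at a time from a small topology and recomputes the hop count between two endpoints. The topology, the router names and the use of hop count as a stand-in for performance are all illustrative assumptions; a real analysis would use cycle-accurate traffic simulation and fault injection.

```python
from collections import deque


def hop_counts(links, source):
    """Breadth-first search: hops from source to every reachable node."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def single_link_failure_report(links, source, target):
    """Return (baseline hops, {failed link: new hops or None if unreachable})."""
    baseline = hop_counts(links, source).get(target)
    report = {}
    for failed in links:
        remaining = [link for link in links if link != failed]
        report[failed] = hop_counts(remaining, source).get(target)
    return baseline, report


# Hypothetical 2x2 mesh: four routers connected in a ring.
MESH = [("r0", "r1"), ("r1", "r3"), ("r3", "r2"), ("r2", "r0")]
baseline, report = single_link_failure_report(MESH, "r0", "r1")
```

Even in this toy case, losing the direct link triples the hop count between the two routers. The number of failure cases to enumerate grows with every link, and with every combination of links if multiple faults are considered.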
Another factor designers must consider is that re-programming the NoC routing requires software intervention in the field. This adds further complexity and risk of failure to the process due to potential bugs in the software and in the customer update flow. It also adds a large verification effort to ensure that software can re-program the NoC correctly for any component that might fail. Because this becomes a customer-visible feature, it must be fully supported with documentation and traceability that meet the ISO 26262 standard’s requirements, and the customer must then propagate these along its supply chain to the automotive company.
The third approach to functional safety is to use ECC error detection and correction logic on the interconnect wires, while replicating only those logic functions that absolutely require it (Figure 4). ECC is a standardized and well-understood mechanism, commonly used in the industry.
This strategy avoids path diversity and dynamic routing altogether, while reducing replication to only a few key elements of the NoC.
ECC is used to detect single-bit errors and correct them in real time without any system interruption, while detecting and flagging double-bit errors for the system to respond to as required, per ASIL-D (Figure 5). While the control blocks are replicated, these are a relatively small percentage of the total interconnect logic area, as the data paths tend to dominate.
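The correct-one, detect-two behavior can be illustrated with a textbook extended Hamming code. The article does not say which ECC scheme is used on the links, so the code choice, the function names `secded_encode` and `secded_decode`, and the bit layout below are assumptions made for illustration only.

```python
def secded_encode(data_bits):
    """Encode a list of 0/1 data bits into an extended Hamming codeword.

    Layout (an assumption of this sketch): index 0 holds the overall parity
    bit, power-of-two indices hold Hamming parity bits, the rest hold data.
    """
    k = len(data_bits)
    r = 0
    while (1 << r) < k + r + 1:      # smallest r with 2**r >= k + r + 1
        r += 1
    n = k + r
    code = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity             # makes each parity group even
    code[0] = sum(code) % 2          # makes the whole codeword even
    return code


def secded_decode(code):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions holding a 1
    odd_parity = sum(code) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity and syndrome <= n:
        code[syndrome] ^= 1          # single-bit error: flip it back
        status = "corrected"
    else:
        return "uncorrectable", None # e.g. a double-bit error: flag only
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


word = [int(b) for b in format(0xDEADBEEFCAFEF00D, "064b")]
codeword = secded_encode(word)       # 64 data bits plus 8 check bits
codeword[17] ^= 1                    # inject a single-bit fault
status, recovered = secded_decode(codeword)
```

In hardware the same syndrome computation is a shallow tree of XOR gates, which is what allows single-bit errors to be corrected on the fly rather than by software.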
The actual implementation requires putting wrappers around the blocks that control the data flow through the NoC, then putting down two or three copies of them and bringing all their outputs together. Each copy of the control blocks gets the same inputs, and the outputs of the replicated blocks are compared on every cycle; if they diverge, a problem is quickly identified. This closely resembles what critical systems for aerospace have always done in terms of replicating systems.
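The compare-every-cycle idea can be modeled in software as follows. This is a behavioral sketch only: the round-robin arbiter standing in for a NoC control block, the injected stuck-at fault, and the names `run_lockstep` and `make_arbiter` are all hypothetical, and real implementations do this comparison in hardware.

```python
def run_lockstep(blocks, stimulus):
    """Drive replicated blocks with identical inputs and compare every cycle.

    Returns the first cycle at which the outputs diverge, or None if the
    copies agree for the whole stimulus.
    """
    for cycle, value in enumerate(stimulus):
        outputs = [block(value) for block in blocks]
        if any(out != outputs[0] for out in outputs[1:]):
            return cycle
    return None


def make_arbiter(stuck_at_cycle=None):
    """Hypothetical round-robin arbiter; optionally faulty from a given cycle."""
    state = {"cycle": 0, "last": -1}

    def step(requests):
        n = len(requests)
        grant = None
        for offset in range(1, n + 1):
            idx = (state["last"] + offset) % n
            if requests[idx]:
                grant = idx
                break
        if grant is not None:
            state["last"] = grant
        if stuck_at_cycle is not None and state["cycle"] >= stuck_at_cycle:
            grant = 0                # injected fault: grant stuck at port 0
        state["cycle"] += 1
        return grant

    return step


STIMULUS = [(1, 1, 0), (1, 1, 0), (0, 1, 1), (1, 0, 1)]
healthy = run_lockstep([make_arbiter(), make_arbiter()], STIMULUS)
faulty = run_lockstep([make_arbiter(), make_arbiter(stuck_at_cycle=1)], STIMULUS)
```

With two copies a mismatch can only be detected; with three, a majority vote could additionally identify which copy is wrong.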
The combination of ECC and control-block replication has been shown to be more efficient than duplicating the network to support path diversity: for instance, adding 8 bits of ECC to a 64-bit link increases the size of the network by 12.5%, whereas duplication can double the network size. This strategy also avoids the need for software intervention, as ECC kicks in automatically to correct errors. There is no degradation in performance, though the safety level may be affected. If a link does go down completely, which is very unlikely, other logic will also likely be affected, which would make a “path diversity” approach useless. When it comes to verification, this third approach to functional safety is also simpler, and it relies only on existing tools and techniques. Fault injection can be used to measure fault coverage with third-party tools, such as Z01X from Synopsys and solutions from Austemper.
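The 12.5% figure follows directly from the check-bit count. As a worked example, the sketch below assumes a single-error-correcting, double-error-detecting (SECDED) extended Hamming code, which is one common choice and is not confirmed by the article. It computes the number of check bits for a given link width; the helper names are hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for an extended Hamming (SECDED) code over data_bits."""
    r = 0
    while (1 << r) < data_bits + r + 1:   # smallest r with 2**r >= k + r + 1
        r += 1
    return r + 1                          # +1 for the overall parity bit


def overhead_percent(data_bits: int) -> float:
    """Extra wires as a percentage of the unprotected link width."""
    return 100.0 * secded_check_bits(data_bits) / data_bits


ecc_overhead_64 = overhead_percent(64)    # 8 check bits on 64 data bits
duplication_overhead = 100.0              # a full second copy of the link
```

Under the same assumption, the relative overhead shrinks as links get wider: a 128-bit link needs 9 check bits, or roughly 7%.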
Conclusion
Compared with using path diversity and dynamic routing to handle failures in the NoC, Arteris’ functional safety approach to protecting the network with error correction and minimal replication has a number of clear advantages. It is more efficient to implement, takes fewer gates and wires, uses simpler switches that are faster (with no programmable routing tables), does not require software intervention, and is simpler to verify.
All of these benefits are available from Arteris; visit their website today.