How to Efficiently Achieve ASIL-D Compliance Using NoC Technology

The Role of ASIL-D Functional Safety

Increasing layers of electronics and software are being added to vehicles, from infotainment to engine, brakes, and various sensors for ADAS and autonomous driving.

Figure 1. As electronics become part of critical functions, it is necessary to ensure their functional operation is adequately monitored. Source: Arteris IP

To address the increasing use of electronics in automobiles, the ISO 26262 Functional Safety for Road Vehicles standard was developed. This standard is intended to ensure that electronics are designed to specified integrity levels in order to minimize the chance of failure, that they fail safe when incidents do occur, and that errors can be traced through documentation. Defined under ISO 26262, ASIL-D (Automotive Safety Integrity Level D) is a risk classification established by performing a risk analysis of a potential hazard. The risk analysis reviews the severity, exposure and controllability of the vehicle operation scenario in the presence of the potential hazard. The safety goal for that hazard drives the ASIL requirements.
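The way severity, exposure and controllability combine into an ASIL can be sketched in a few lines. The sketch below uses a commonly cited shortcut that reproduces the ISO 26262-3 determination table by summing the three class numbers; the function name `determine_asil` is hypothetical, the S0/E0/C0 classes are left out, and the table in the standard remains the authoritative source.

```python
def determine_asil(severity, exposure, controllability):
    """Return "QM", "A", "B", "C" or "D" for classes S1-S3, E1-E4, C1-C3.

    Assumption: the classes are passed as plain integers (S3 -> 3, E4 -> 4, ...).
    The sum rule is a shortcut for the lookup table in ISO 26262-3.
    """
    if not (1 <= severity <= 3 and 1 <= exposure <= 4 and 1 <= controllability <= 3):
        raise ValueError("expected S in 1..3, E in 1..4, C in 1..3")
    total = severity + exposure + controllability
    # Only the worst combination (S3, E4, C3) sums to 10 and yields ASIL-D.
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(total, "QM")


print(determine_asil(3, 4, 3))  # life-threatening, frequent, hard to control
print(determine_asil(3, 4, 2))  # same hazard, but normally controllable
```

Lowering any one of the three classes by a single step drops the result by one ASIL, which is why ASIL-D applies only to hazards that are severe, frequently encountered and difficult for the driver to control.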

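There are four ASILs identified under the standard, A through D, with ASIL-D representing the highest level of integrity (Table 1). Under ASIL-D, the single-point fault metric must be at least 99%; in other words, no more than 1% of the safety-related hardware failure rate may come from single-point or residual faults.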

Table 1. The ASIL levels and metrics.
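As a rough illustration of how the single-point fault metric (SPFM) is evaluated, the sketch below computes it as one minus the share of the total failure rate that escapes the safety mechanisms. It is a simplification that treats every fault as potentially safety-relevant; the `Element` type, the element names and all failure rates (in FIT) are hypothetical values chosen for the example, not measured data. The targets in `SPFM_TARGET` are the ISO 26262-5 values for ASIL-B, C and D.

```python
from dataclasses import dataclass

# SPFM targets from ISO 26262-5 (ASIL-A has no target).
SPFM_TARGET = {"B": 0.90, "C": 0.97, "D": 0.99}


@dataclass
class Element:
    name: str
    fit: float                  # failure rate in FIT (hypothetical)
    diagnostic_coverage: float  # fraction of faults detected or controlled, 0..1


def spfm(elements):
    """Single-point fault metric: 1 - (uncovered failure rate / total failure rate)."""
    total = sum(e.fit for e in elements)
    uncovered = sum(e.fit * (1.0 - e.diagnostic_coverage) for e in elements)
    return 1.0 - uncovered / total


def meets_target(elements, asil):
    return spfm(elements) >= SPFM_TARGET[asil]


# Hypothetical NoC breakdown: wide data links and a small control block.
noc = [
    Element("data_links", fit=100.0, diagnostic_coverage=0.99),
    Element("control_logic", fit=20.0, diagnostic_coverage=0.999),
]
print(f"SPFM = {spfm(noc):.4f}, ASIL-D target met: {meets_target(noc, 'D')}")
```

With these made-up numbers the uncovered rate is 1.02 FIT out of 120 FIT, so the metric lands just above the 99% threshold; dropping the link coverage to 98% would push it below.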

While many semiconductor suppliers have already met ASIL-B and ASIL-C levels, meeting ASIL-D is proving to be more difficult and costly. Designers need an inexpensive and efficient means to ensure that the logic and wires in the chip are protected from failures. This means that failures must be detected and corrected in real time, such that the system can continue to execute in as many scenarios as possible. The SoC’s network-on-chip (NoC) interconnect is a critical part of the chip, which poses unique challenges for meeting ASIL-D requirements.

The Three Paths to ASIL-D Compliance for NoCs

In aerospace and aeronautics, the default path to functional safety for systems is a complete and total replication of critical-path elements, with up to three copies of the same SoC. This means the NoC and everything else inside the SoC is replicated three times. If one SoC goes bad, it sends out an alert to blink an LED, but otherwise everything can continue to work, and the faulty IC can be replaced. This approach, while straightforward and complete, adds too much cost, weight and power consumption to be feasible for high-volume automotive applications.

The second option for ensuring ASIL-D compliance in on-chip networks is “path diversity” with dynamic routing. With path diversity there is more than one way to go from point A to point B (Figure 2).

Figure 2: Path diversity provides redundancy in a SoC NoC by doubling many paths, making it an expensive option as all possible paths in the NoC must have a backup. (Source: Arteris IP)

However, the downside to path diversity is that the designer must ensure that all possible paths in the NoC have a backup. This backup overhead could be almost as expensive as replicating the entire network. In addition, reprogramming the routing tables upon detection of a path failure disrupts service, and the availability of backup connections must be verified with simulation and fault injection, which can be time-consuming and expensive.

Figure 3: Performance degrades when re-routing traffic in a NoC with path diversity. (Source: Arteris IP)

Complicating the issue is that the performance degradation that results from traffic re-routing around a failing node is difficult to model. Architects need to simulate all possible failures and re-routing around the failure to understand how performance will change (Figure 3).
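To give a feel for the analysis this implies, the following sketch fails each link of a small topology in turn and checks whether a given source can still reach its destination, and at what hop count. The topology, the node names and the use of hop count as a stand-in for latency are all assumptions made for illustration; a real study would use cycle-accurate simulation and cover every traffic flow.

```python
from collections import deque


def hops(links, src, dst):
    """Breadth-first search: hop count from src to dst, or None if unreachable."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def single_link_failure_report(links, src, dst):
    """Return (baseline hops, {failed link: hops after re-routing or None})."""
    baseline = hops(links, src, dst)
    report = {}
    for failed in links:
        remaining = [link for link in links if link != failed]
        report[failed] = hops(remaining, src, dst)
    return baseline, report


# Hypothetical ring of four switches with a CPU and a memory controller attached.
LINKS = [("cpu", "s0"), ("s0", "s1"), ("s1", "s2"),
         ("s2", "s3"), ("s3", "s0"), ("s1", "mem")]

baseline, report = single_link_failure_report(LINKS, "cpu", "mem")
for link, result in report.items():
    state = "NO BACKUP PATH" if result is None else f"{result} hops (baseline {baseline})"
    print(link, "->", state)
```

In this toy ring the switch-to-switch links are covered, at the cost of a longer detour when the direct link fails, while the two endpoint links have no backup at all. That gap is the reason every path, including the attachment links, has to be doubled before path diversity delivers full redundancy.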

Another factor designers must consider is that re-programming the NoC routing requires software intervention in the field. This adds complexity and risk of failure because of potential bugs in the software and in the customer update flow. It also adds a large verification effort to ensure that software can re-program the NoC correctly for any component that might fail. Because this becomes a customer-visible feature, it must be fully supported with documentation and traceability that meet the ISO 26262 requirements, and the customer must in turn propagate that material along its supply chain to the automotive company.

The third approach to functional safety is to protect the interconnect wires with error-correcting code (ECC) logic, which detects and corrects errors, while replicating only those logic functions that absolutely require it (Figure 4). ECC is a standardized and well-understood mechanism, commonly used in the industry.

Figure 4: A more cost-effective and elegant approach to functional safety employs ECC, while replicating only the logic functions that absolutely require it. (Source: Arteris IP)

This strategy avoids path diversity and dynamic routing altogether, while reducing replication to only a few key elements of the NoC.

ECC is used to detect single-bit errors and correct them in real time without any system interruption, while detecting and flagging double-bit errors for the system to respond to as required, per ASIL-D (Figure 5). While the control blocks are replicated, these are a relatively small percentage of the total interconnect logic area, as the data paths tend to dominate.

Figure 5: ECC avoids any performance degradation associated with one-bit errors by fixing them in real time. (Source: Arteris IP)
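The single-error-correct, double-error-detect behavior described above can be sketched with a textbook extended Hamming code over a 64-bit word. This is an illustrative assumption: the article does not say which code is actually used on the links, and the function names and bit layout below are chosen only for the example.

```python
# Textbook extended Hamming (72,64) SECDED code; an illustrative assumption,
# not the encoding of any particular NoC product.
CODE_LEN = 72                                # bit 0 = overall parity, bits 1..71 = Hamming code
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32, 64]  # the 7 Hamming check bits
DATA_POSITIONS = [p for p in range(1, CODE_LEN) if p not in PARITY_POSITIONS]  # 64 data bits


def encode(data):
    """Encode a 64-bit integer into a list of 72 bits."""
    bits = [0] * CODE_LEN
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (data >> i) & 1
    for p in PARITY_POSITIONS:
        # Each check bit makes the parity of all positions containing bit p even.
        bits[p] = sum(bits[pos] for pos in range(1, CODE_LEN) if pos & p) % 2
    bits[0] = sum(bits) % 2  # overall parity makes the whole word even
    return bits


def decode(received):
    """Return (data, status); data is None when the word cannot be trusted."""
    bits = list(received)
    syndrome = 0
    for pos in range(1, CODE_LEN):
        if bits[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    odd_parity = sum(bits) % 2 == 1
    if odd_parity:
        # Odd number of flips: treat as a single error located by the syndrome
        # (syndrome 0 means the overall parity bit itself was hit).
        if syndrome >= CODE_LEN:
            return None, "uncorrectable"
        bits[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even parity but nonzero syndrome: two bits flipped, flag it.
        return None, "double_error"
    else:
        status = "ok"
    data = sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return data, status


word = 0xDEADBEEFCAFEF00D
sent = encode(word)
one_flip = list(sent)
one_flip[17] ^= 1
two_flips = list(one_flip)
two_flips[42] ^= 1
print(decode(sent)[1], decode(one_flip)[1], decode(two_flips)[1])
```

The single flip is repaired inside `decode` with no retry or re-routing, which mirrors the "no system interruption" property, while the double flip is only reported so that the system can decide how to respond.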

The actual implementation requires putting wrappers around the blocks that control the data flow through the NoC, then putting down two or three copies of them and bringing all their outputs together. Each copy of the control blocks gets the same inputs, and the outputs of the replicated blocks are compared on every cycle; if they diverge, a problem is quickly identified. This closely resembles what critical systems for aerospace have always done in terms of replicating systems.
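A software model of such a wrapper might look like the sketch below: every replica receives the same inputs, and the outputs are compared once per cycle. The `arbiter` control block, the `stuck_arbiter` fault model and the function `checked_step` are hypothetical stand-ins; real hardware would do this with comparators or voters on registered outputs.

```python
from collections import Counter


def checked_step(replicas, inputs):
    """Drive every replica with the same inputs and compare outputs for one cycle.

    Returns (output, fault_detected). With two copies a mismatch can only be
    detected, so output is None; with three copies the majority value is used.
    """
    outputs = [replica(inputs) for replica in replicas]
    value, votes = Counter(outputs).most_common(1)[0]
    fault_detected = votes != len(outputs)
    if votes * 2 <= len(outputs):
        return None, True  # no strict majority: flag the fault, trust nothing
    return value, fault_detected


def arbiter(request_mask):
    """Hypothetical control block: grant the lowest-numbered requesting port."""
    return (request_mask & -request_mask).bit_length() - 1


def stuck_arbiter(request_mask):
    """The same block with an injected fault: the grant is stuck at port 0."""
    return 0


print(checked_step([arbiter, arbiter], 0b0100))                 # copies agree
print(checked_step([arbiter, stuck_arbiter], 0b0100))           # duplication: detect only
print(checked_step([arbiter, stuck_arbiter, arbiter], 0b0100))  # triplication: out-vote the fault
```

The choice between two and three copies is a trade-off between area and availability: duplication detects a diverging copy, whereas triplication can also keep operating on the majority result.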

The combination of ECC and control-block replication is more efficient than duplicating the network to support path diversity: for instance, adding 8 bits of ECC to a 64-bit link widens that link by 12.5%, whereas duplication can double the network size. This strategy also avoids the need for software intervention, as ECC kicks in automatically to correct errors. There is no degradation in performance, though the safety level may be affected. If a link does go down completely, which is very unlikely, other logic will probably be affected as well, in which case a “path diversity” approach would be useless too. When it comes to verification, this third approach to functional safety is also simpler, and it requires only existing tools and techniques. Fault injection can be used to measure fault coverage with third-party tools, such as Z01X from Synopsys and solutions from Austemper.
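The 12.5% figure follows from the number of check bits a single-error-correct, double-error-detect code needs. A minimal sketch, assuming an extended Hamming code (the smallest r with 2^r >= m + r + 1, plus one overall parity bit) and counting link wires only, not the replicated control logic:

```python
def secded_check_bits(data_bits):
    """Check bits for an extended Hamming SECDED code over data_bits of payload."""
    r = 1
    while 2 ** r < data_bits + r + 1:  # Hamming bound for single-error correction
        r += 1
    return r + 1                       # one extra overall parity bit for double-error detection


def wire_overhead(data_bits):
    """Extra link width relative to the unprotected link."""
    return secded_check_bits(data_bits) / data_bits


for width in (32, 64, 128):
    print(f"{width}-bit link: {secded_check_bits(width)} check bits, "
          f"{wire_overhead(width):.1%} wider")
```

Because the number of check bits grows only logarithmically with the payload, wider links pay proportionally less for protection, while full duplication costs 100% regardless of width.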

Conclusion

Compared with using path diversity and dynamic routing to handle failures in the NoC, Arteris’ functional safety approach to protecting the network with error correction and minimal replication has a number of clear advantages. It is more efficient to implement, takes fewer gates and wires, uses simpler switches that are faster (with no programmable routing tables), does not require software intervention, and is simpler to verify.

All of these benefits are available from Arteris; visit their website today.
