Should you use a coprocessor architecture in embedded systems?

Those who have lived and worked with technology since the days of large desktop computers have probably opened up a computer case and looked at the many chips populating the motherboard. Since the early days of personal computing, PCs have implemented a coprocessor architecture, where some of the main processing tasks are offloaded onto a dedicated chip. Math coprocessors (FPUs) and GPUs are classic examples, and the same coprocessor architecture can be found in typical desktops, laptops and servers, and even in supercomputers.

Embedded designers often need the additional processing power provided by a coprocessor architecture, but they might not have the board space, budget or power headroom to implement one. Newer processors and chipset options are changing the landscape for embedded computing, both in small devices and in larger systems. Newer components are also enabling more advanced applications like AI and edge computing, either through a coprocessor architecture or with a dedicated, highly integrated SoC.

Embedded computing and chipset architectures

Across the landscape of embedded processors, designers have plenty of options for the host controller:

MCU: These embedded workhorses are available from just about every semiconductor manufacturer, each with a huge range of features and interfaces. Typical interface options range up to PCIe Gen2, high-speed Ethernet, and USB 2.
FPGA: These are the most flexible option and can be the most powerful, yet they also have the highest barrier to entry in terms of programmability. You’re essentially designing the chip’s logic alongside the physical layout of the PCB.
MPU: A microprocessor can run a full embedded OS, like you might find in a computer-on-module or a single-board computer.
SoC: Systems-on-chip (SoCs) vary in how application-specific they are. For example, mobile SoCs are vendor- or phone-specific, while other SoC products simply integrate several disparate components and modules into a single package.

Any of these could be paired with another processor, with the two operating together to implement an embedded application. As long as the interfaces between them are compatible and the external processor provides the level of compute the application needs, this type of architecture can be implemented comfortably in an embedded platform. An example is shown below.

This is the typical path for an MCU-based architecture that uses a set of ASIC peripherals; a coprocessor architecture simply scales that approach up to applications demanding more compute. Such an architecture is present in most embedded systems that do not require high compute, or that only need external ASICs to provide specialized interfaces not present on the MCU die. It should be no surprise, then, that AI compute accelerators like Google’s Coral have come onto the market. Accelerator ASICs fit neatly into the traditional MCU/MPU + peripheral architecture, and there is no sign of this changing.
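Whatever the pairing, the two processors must agree on more than the physical link (SPI, UART, PCIe and so on); they also need a shared convention for framing commands and results. The Python sketch below assumes a hypothetical frame layout of a 1-byte command, a 2-byte big-endian length, the payload and a CRC-32 trailer. The layout and the names `encode_frame` and `decode_frame` are illustrative only, not part of any standard or vendor API.

```python
import struct
import zlib

# Hypothetical frame layout (an assumption, not a standard):
# | cmd (1 byte) | length (2 bytes, big-endian) | payload | CRC-32 (4 bytes, big-endian) |
HEADER = struct.Struct(">BH")
CRC = struct.Struct(">I")


def encode_frame(cmd: int, payload: bytes) -> bytes:
    """Build a frame that either processor can send across the shared link."""
    body = HEADER.pack(cmd, len(payload)) + payload
    # The CRC covers the header and payload so the receiver can detect corruption.
    return body + CRC.pack(zlib.crc32(body))


def decode_frame(frame: bytes) -> tuple[int, bytes]:
    """Validate and unpack a frame; raise ValueError on any mismatch."""
    if len(frame) < HEADER.size + CRC.size:
        raise ValueError("frame too short")
    cmd, length = HEADER.unpack_from(frame, 0)
    end = HEADER.size + length
    if len(frame) != end + CRC.size:
        raise ValueError("length field does not match frame size")
    (crc,) = CRC.unpack_from(frame, end)
    if crc != zlib.crc32(frame[:end]):
        raise ValueError("CRC mismatch")
    return cmd, frame[HEADER.size:end]
```

For example, `encode_frame(0x10, b"\x01\x02\x03")` yields a 10-byte frame that `decode_frame` turns back into `(0x10, b"\x01\x02\x03")`, while a single corrupted byte raises `ValueError`. In real firmware the same logic would more likely be written in C on the MCU and in HDL or C on the coprocessor.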

Considerations in using a coprocessor architecture

In short, the value of a coprocessor is to offload work from the primary processing unit so that specific tasks run on hardware designed to accelerate and streamline them. The result can be a net increase in computational speed and capability and, potentially, lower development cost and shorter development time. Space communications systems are perhaps one of the most compelling areas for these benefits.
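Whether offloading produces a net speedup depends on how much of the workload actually moves and what it costs to ship data across the interface. A back-of-the-envelope model in the spirit of Amdahl’s law, extended with a data-transfer term, makes this concrete. The function and parameter names below are our own, and the model assumes the host waits for the coprocessor (no overlap between host and coprocessor work).

```python
def offload_speedup(t_host: float, f: float, accel: float, t_transfer: float) -> float:
    """Estimate net speedup from moving a fraction f of a task to a coprocessor.

    t_host:     time for the whole task on the host alone
    f:          fraction of t_host spent in the offloadable kernel (0..1)
    accel:      how many times faster the coprocessor runs that kernel
    t_transfer: extra time spent moving data to and from the coprocessor
    """
    if not 0.0 <= f <= 1.0 or accel <= 0:
        raise ValueError("f must be in [0, 1] and accel must be positive")
    # Remaining host work + accelerated kernel + interface overhead (assumed serial).
    t_new = (1.0 - f) * t_host + f * t_host / accel + t_transfer
    return t_host / t_new
```

With 80% of a 100 ms task offloaded to a coprocessor that runs it 10 times faster and 2 ms of transfer overhead, the estimate is 100 / (20 + 8 + 2), or about 3.3 times faster. Offloading only 10% of the same task with 20 ms of overhead gives 100 / 111, a net slowdown.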

How much compute does the host controller need?

One of these chips needs to be responsible for orchestrating data movement throughout the system, and there isn’t always a single right choice. It’s tempting to make the highest-compute processor the centerpiece, but the additional processor may be the one that needs more compute to deliver its specialized capabilities. It’s not uncommon to implement specialized processing in an FPGA and simply return the results to a host controller MCU, which then uses them in its main application.
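As a rough model of that split, the sketch below has a host controller hand blocks of samples to a coprocessor, collect the processed results and make the application-level decision itself. A Python thread and two queues stand in for the coprocessor and the bus, and a direct-form FIR filter stands in for the specialized kernel an FPGA might run. All of these choices, along with the names `Coprocessor` and `host_application`, are illustrative assumptions.

```python
import queue
import threading


def fir(samples, taps):
    """Stand-in for the FPGA's specialized kernel: a direct-form FIR filter."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


class Coprocessor(threading.Thread):
    """Hypothetical coprocessor model: pulls jobs, runs the kernel, posts results."""

    def __init__(self, taps):
        super().__init__(daemon=True)
        self.taps = taps
        self.jobs = queue.Queue()     # host -> coprocessor
        self.results = queue.Queue()  # coprocessor -> host

    def run(self):
        while True:
            job = self.jobs.get()
            if job is None:  # shutdown request from the host
                break
            job_id, samples = job
            self.results.put((job_id, fir(samples, self.taps)))


def host_application(blocks, taps, threshold):
    """Host controller: moves data, then acts on the results it gets back."""
    cop = Coprocessor(taps)
    cop.start()
    for job_id, block in enumerate(blocks):
        cop.jobs.put((job_id, block))
    alarms = []
    for _ in blocks:
        job_id, filtered = cop.results.get(timeout=1.0)
        # The application decision stays on the host.
        if max(filtered) > threshold:
            alarms.append(job_id)
    cop.jobs.put(None)
    cop.join()
    return sorted(alarms)
```

Keeping the threshold decision on the host mirrors the division of labor described above: the coprocessor only computes, and the host owns the application logic.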

Which processor acquires data and signals?

The coprocessor could provide the main interface where signals and data are collected from external systems, sensors or components. If these data and signals need immediate processing, it makes sense to perform it on the coprocessor rather than the host controller. A small FPGA might make sense as the coprocessor with an MCU as the host controller: the MCU just runs the application and orchestrates peripherals, while the FPGA provides targeted compute that is not available in most MCUs.
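To illustrate acquiring and processing data on the coprocessor, the sketch below reduces raw ADC samples to a peak and RMS value per fixed-size window, so the host receives one compact record per window instead of every sample. The `WindowSummary` format, the window size and the RMS limit are assumptions for illustration; on real hardware this reduction would run in FPGA logic rather than Python.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowSummary:
    """Compact record the coprocessor forwards to the host (hypothetical format)."""
    index: int
    peak: int
    rms: float


def coprocessor_front_end(raw_samples, window):
    """Process raw samples where they are acquired; emit one summary per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    summaries = []
    for i in range(len(raw_samples) // window):  # a trailing partial window is dropped
        chunk = raw_samples[i * window:(i + 1) * window]
        peak = max(abs(s) for s in chunk)
        rms = math.sqrt(sum(s * s for s in chunk) / window)
        summaries.append(WindowSummary(i, peak, rms))
    return summaries


def host_consume(summaries, rms_limit):
    """The host only sees summaries and flags the windows that need attention."""
    return [s.index for s in summaries if s.rms > rms_limit]
```

With a 1,000-sample window, the link to the host carries one summary record in place of 1,000 raw samples, which can let a modest MCU keep up with a fast sensor front end.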

Is the coprocessor replacing one or more ASICs?

If the answer here is “yes,” then the coprocessor could be implemented in an FPGA to consolidate multiple devices into a single component. The reverse is possible as well: the host controller and its peripherals could be implemented together in an FPGA, while an external MCU runs low-level logic tasks in parallel. This is less common, as those low-level logic functions could usually be implemented in the FPGA too.
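When an FPGA absorbs several discrete ASICs, the host often sees the consolidated functions as separate register blocks behind one bus interface. The sketch below models that with a hypothetical address map; the block names and address ranges are invented for illustration and do not correspond to any particular device.

```python
# Hypothetical address map: each entry is a function block that used to be a discrete chip.
# Ranges are half-open [start, end) byte offsets within the FPGA's host-facing bus window.
BLOCK_MAP = [
    ("uart_bridge", 0x0000, 0x0100),
    ("adc_interface", 0x0100, 0x0200),
    ("pwm_controller", 0x0200, 0x0400),
]


def check_map(block_map):
    """Reject overlapping ranges, which would make two blocks answer the same address."""
    ordered = sorted(block_map, key=lambda b: b[1])
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        if start_b < end_a:
            raise ValueError(f"{name_a} overlaps {name_b}")


def decode_address(addr, block_map=BLOCK_MAP):
    """Route a host bus address to (block name, local register offset)."""
    for name, start, end in block_map:
        if start <= addr < end:
            return name, addr - start
    raise ValueError(f"unmapped address 0x{addr:04X}")
```

Here `decode_address(0x0104)` resolves to register offset 4 of the ADC interface block. Keeping the ranges non-overlapping matters because two blocks decoding the same address would make reads ambiguous.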

Coprocessors on a single die

Newer chip architectures for embedded systems are busy integrating everything onto a single die and package, including coprocessor architectures involving multiple processors. This is not necessarily the same thing as a multi-core system; rather, it integrates two formerly standalone processor blocks into a single package.

Take the recent release of the Zynq SoC from Xilinx as an example: it integrates an ARM Cortex processor with FPGA fabric on the same die. Two other examples are the SmartFusion and PolarFire platforms from Microsemi, both of which offer a mid-range option for embedded systems development. Other companies are following suit with their own FPGA and SoC product lines.

So which architecture will win the day? It seems we’ve gone from disaggregated coprocessor architectures to SoCs over the recent past, and now we may be returning to where we started. A discrete coprocessor arguably makes the most sense when highly specific compute needs to be offloaded to an external processor and implemented in hardware. FPGAs are ideal for this whenever an ASIC is unavailable or unable to provide the compute needed for the task.

To contact the author of this article, email GlobalSpecEditors@globalspec.com