Multi-Core MCUs: Are There More Attractive Alternatives?

30 July 2015

The evolution of new MCU features has, in part, come from borrowing innovations from their powerful CPU predecessors. This has allowed MCUs to leverage innovations such as bus matrix interconnects to improve on-chip bandwidth, cache memories to improve memory efficiency, multi-stage pipelines to improve operating frequency and a host of other capabilities. One of the most recent innovations to make its way into some MCU architectures is the use of multi-core processors to improve power and performance efficiency.

CPU architects found that continuously increasing performance came with ever higher power and die size requirements. Sure, you could double the frequency of the processor and get twice the performance, but the power requirement increases too, usually by more than double. The faster processor also needed supporting hardware: fast code access, fast RAM access, larger and multi-level caches, multiple and more complex high speed busses to ‘feed’ it, and faster off-chip memory accesses. The price in power dissipation and extra die size that these sub-systems required just became too much. It was easier to double the number of processors to double the performance. Today, CPUs commonly have several processor cores. Intel’s i5 and AMD’s Phenom II X4 both have four cores, and both Intel and AMD also offer CPUs with as many as eight cores.
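
The trade-off can be illustrated with the standard first-order model for CMOS dynamic power, P = C × V² × f. The sketch below is a rough illustration only: the capacitance, voltage, and the assumption that doubling the clock needs a 20% higher supply voltage are made-up numbers, not figures for any real CPU, and the model ignores leakage and the shared hardware a second core needs.

```python
def dynamic_power(c_eff, v_supply, f_clock):
    """First-order CMOS dynamic power: P = C * V^2 * f (watts)."""
    return c_eff * v_supply ** 2 * f_clock

# Hypothetical baseline core: 1 nF effective switched capacitance, 1.0 V, 1 GHz.
C_EFF, V_BASE, F_BASE = 1e-9, 1.0, 1e9

base = dynamic_power(C_EFF, V_BASE, F_BASE)

# Option A: one core at twice the clock, assuming the supply must rise by 20%.
fast_single = dynamic_power(C_EFF, V_BASE * 1.2, F_BASE * 2)

# Option B: two cores at the original clock and voltage.
dual_core = 2 * dynamic_power(C_EFF, V_BASE, F_BASE)

print(f"baseline:     {base:.2f} W")
print(f"2x frequency: {fast_single:.2f} W ({fast_single / base:.2f}x)")
print(f"2 cores:      {dual_core:.2f} W ({dual_core / base:.2f}x)")
```

Under these assumed numbers the single fast core costs more power than two slower cores because the supply voltage enters the formula squared.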

Several challenging application requirements seemed to call for MCU architects to adopt multi-core processors as well. MCUs, primarily in embedded applications, needed more processing power too, while keeping power dissipation low. With the increasing use of high speed network interfaces such as Ethernet, advanced video requirements for Human Machine Interfaces (HMI), and new requirements for encryption to secure transmissions and data storage, MCUs were finding it difficult to keep up.

[Image: Intel and AMD multi-core CPUs]

The use of multiple processor cores on MCUs seems like a possible solution. Often an MCU application has more than one key processing requirement. Managing the high speed bus interfaces, for example, could be done with a dedicated processor, leaving another processor to manage the user interface and algorithmic portions of the design. If interface traffic hit a lull, the traffic processor could even be put in a low power state, dramatically reducing power dissipation.

Algorithms with high processing requirements could utilize both processors to speed execution time. Algorithms with a parallel structure (where separate data streams can be processed simultaneously) are easily decomposed: just run the same code on both processors and double the processing throughput. Even algorithms without obvious parallelism can be separated so that one processor works on the first few stages of the algorithm and a second processor works on the remaining stages. Digital signal processing algorithms with multiple filter stages are a good example of this type of staged algorithm.
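
The staged approach can be sketched as a two-stage pipeline joined by a small FIFO. This is only a structural illustration: the two threads stand in for the two cores (Python threads do not actually run in parallel), and the filter stages, function names, and queue depth are invented for the example.

```python
import queue
import threading

def stage_one(samples, out_q):
    """First 'core': a simple 2-tap averaging filter."""
    prev = 0.0
    for x in samples:
        out_q.put((x + prev) / 2)
        prev = x
    out_q.put(None)  # sentinel: no more data

def stage_two(in_q, results):
    """Second 'core': scales each filtered sample as it arrives."""
    while True:
        y = in_q.get()
        if y is None:
            break
        results.append(y * 2)

def run_pipeline(samples):
    fifo = queue.Queue(maxsize=4)  # small FIFO between the two stages
    results = []
    t1 = threading.Thread(target=stage_one, args=(samples, fifo))
    t2 = threading.Thread(target=stage_two, args=(fifo, results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results

print(run_pipeline([1.0, 3.0, 5.0, 7.0]))
```

Once the FIFO holds data, the second stage can work on one sample while the first stage is already filtering the next, which is where a real dual-core part would gain throughput.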

Alternatives to Multi-Core Implementations

MCU manufacturers have been innovating in other ways to improve power and performance efficiency. These innovations are not just ‘hand-me-downs’ from their CPU predecessors, however, and may ultimately provide better efficiency than a multi-core approach. The traffic control application using a multi-core solution described previously, for example, can be approached by distributing intelligence to the peripherals, instead of centralizing control using a second CPU. Several MCU manufacturers have been gradually increasing the intelligence of their peripherals so they can operate, even in fairly complex modes, independent of the CPU.

Some common intelligent peripheral functions utilize interrupts and Direct Memory Access (DMA) capabilities to operate independently from the processor. Ethernet peripherals, for example, often have a dedicated DMA controller that can move packets from the Ethernet port to an SRAM buffer without processor involvement. Once the packet is completely received, the processor can be alerted, using an interrupt, and then packet processing can begin. If two buffers are used, data reception and data processing can operate in parallel, similar to the parallel processing approach used by a dual core solution. Intelligent peripherals, however, use just a fraction of the power and die size a second processor would require.
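
The two-buffer scheme can be modeled in a few lines. The class below is a toy software model, not a driver for any real Ethernet peripheral; the class name, the callback standing in for the receive interrupt, and the packet contents are all assumptions made for illustration.

```python
class PingPongReceiver:
    """Toy model of a DMA engine filling two SRAM buffers alternately."""

    def __init__(self, on_packet):
        self.buffers = [None, None]
        self.fill_index = 0          # buffer the "DMA" writes next
        self.on_packet = on_packet   # stands in for the receive interrupt handler

    def dma_receive(self, packet):
        """Store a complete packet, swap buffers, then raise the 'interrupt'."""
        ready = self.fill_index
        self.buffers[ready] = bytes(packet)
        self.fill_index ^= 1         # the next packet lands in the other buffer
        self.on_packet(ready, self.buffers[ready])

processed = []

def rx_interrupt(buffer_index, data):
    # The processor works on the ready buffer while the DMA owns the other one.
    processed.append((buffer_index, len(data)))

rx = PingPongReceiver(rx_interrupt)
for pkt in (b"\x01\x02\x03", b"\x04\x05", b"\x06"):
    rx.dma_receive(pkt)

print(processed)
```

The point of the swap is ownership: at any moment one buffer belongs to the DMA and the other to the processor, so neither has to wait for the other.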

Other approaches can be used to add ‘smarts’ to MCU peripherals. Some peripherals can be put into special modes where incoming data is examined and the processor is notified only if the data is of interest to the application. For example, some Analog to Digital Converters (ADCs) have a windowing function that compares each converted value to an upper and a lower threshold. If the value is within the thresholds, all is good. If it falls outside the defined ‘window’, the processor can be notified and corrective action can be taken. Some serial communications peripherals have a similar function where part of the received data is examined and, if it matches predefined values, the processor is notified. This is particularly useful when some messages have higher priority than others, such as low power or high temperature system warnings.
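
The windowing behavior amounts to a pair of comparisons applied to every conversion. The function below is a minimal software sketch of that idea; the function name, the 12-bit sample values, and the thresholds are hypothetical, and in a real part the comparison is done in hardware.

```python
def window_monitor(samples, low, high):
    """Return (index, value) for each sample outside the [low, high] window.

    Mimics the hardware comparator: in-window samples raise no event at all.
    """
    return [(i, s) for i, s in enumerate(samples) if s < low or s > high]

# Hypothetical 12-bit readings with a window of 1000..3000 counts.
readings = [1500, 2900, 3001, 2000, 999, 1000]
events = window_monitor(readings, low=1000, high=3000)
print(events)
```

Of the six readings, only the two that leave the window would generate a notification; the processor never sees the other four.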

[Image: Event routing example: Atmel SAM4L MCU Peripheral Event System. Image credit: Atmel]

One feature used by multiple MCUs to further leverage intelligent peripherals is an event routing sub-system. Different manufacturers use different names for this feature, but the function is very similar. An event routing sub-system allows peripherals to communicate directly with each other, using dedicated hardware. This allows many complex peripheral tasks to take place autonomously from the central processor. As an example, an ADC can take readings from an external sensor, perhaps controlled by a timer unit that defines the rate at which conversions are captured. Conversions can be stored in SRAM and sent out over an SPI port for data logging in an SPI Flash memory by using a Direct Memory Access Controller (DMAC). The processor never needs to be involved in any of these steps. If needed, it can be notified when the data logging function is completed, or if any errors occur along the way.
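
The timer, ADC and DMA chain described above can be mimicked with a tiny event router. This is a conceptual model only: the `EventSystem` class, the event names, and the sample values are invented for illustration and do not correspond to any vendor's event system API.

```python
from collections import defaultdict

class EventSystem:
    """Toy event router: connects a named event source to peripheral actions."""

    def __init__(self):
        self.routes = defaultdict(list)

    def connect(self, event, action):
        self.routes[event].append(action)

    def emit(self, event, *args):
        for action in self.routes[event]:
            action(*args)

events = EventSystem()
sensor = iter([512, 530, 498, 505])   # pretend sensor readings
log = []                              # stands in for the SPI flash data log
cpu_wakeups = []

def adc_convert():
    events.emit("adc_done", next(sensor))

def dma_to_spi(sample):
    log.append(sample)
    if len(log) == 4:                 # block complete: only now involve the CPU
        events.emit("dma_complete")

events.connect("timer_tick", adc_convert)
events.connect("adc_done", dma_to_spi)
events.connect("dma_complete", lambda: cpu_wakeups.append("logging finished"))

for _ in range(4):                    # four timer periods elapse
    events.emit("timer_tick")

print(log, cpu_wakeups)
```

Four samples are captured and logged, yet the list of processor wake-ups contains a single entry, recorded only when the whole block is done.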

Another key advantage that autonomous peripherals provide is the opportunity to put the central processor in a low power mode. If most peripheral functions can be handled without processor involvement, it need not be ‘idle’ while waiting for something to do. It can transition into a low power ‘sleep’ mode until there is something important to do. If there is something important for the processor to handle, it has plenty of bandwidth available, since all the peripheral ‘housekeeping’ is done independently. This results in a power efficient implementation.
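
The power benefit of sleeping between tasks can be estimated with a time-weighted average. The current figures below are hypothetical placeholders chosen to show the arithmetic, not datasheet values for any MCU, and the calculation ignores wake-up time and peripheral current.

```python
def average_current_ma(active_ma, sleep_ma, active_fraction):
    """Time-weighted average current for a processor that sleeps between tasks."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_ma * active_fraction + sleep_ma * (1.0 - active_fraction)

# Hypothetical numbers: 10 mA when running, 0.01 mA asleep.
always_on = average_current_ma(10.0, 0.01, 1.0)
mostly_asleep = average_current_ma(10.0, 0.01, 0.02)   # awake 2% of the time
print(f"{always_on:.3f} mA vs {mostly_asleep:.3f} mA")
```

With these assumed figures, a processor that is awake 2% of the time draws roughly one fiftieth of the always-on current, so the active fraction dominates the result.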

Distributed intelligence in MCU peripherals and a method to interconnect peripherals are powerful approaches to creating more than autonomous peripherals; let’s call them autonomous functions. MCU architects are adding a new capability that expands this concept even further by borrowing an idea from programmable logic devices. When programmable logic capabilities are included as an element for implementing autonomous functions, more of the processing associated with peripherals can be distributed. Simple logic functions can be defined by the designer and configured on the MCU. These logic functions can often operate on a combination of input pins, interrupt signals, timer outputs and other peripheral signals to create Boolean functions and simple state machines. For example, the pattern matching logic on the NXP LPC5410x MCU can be used to detect a variety of input combinations and transitions. Eight different ‘slices’ of configurable logic can be combined to create a fairly complex function used to drive the interrupt system. MCUs that have both programmable logic and event routing can create higher level autonomous functions, encroaching further into the realm of multi-core implementations.
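
The idea of combining slices into one Boolean function can be sketched in software. The model below is loosely inspired by the description above and is heavily simplified: it is not the LPC5410x register interface, and the condition names, pin numbering, and helper functions are assumptions made for the example.

```python
def make_slice(pin, condition):
    """One configurable 'slice': tests a single input pin for a condition."""
    def evaluate(prev, curr):
        if condition == "high":
            return curr[pin] == 1
        if condition == "low":
            return curr[pin] == 0
        if condition == "rising":
            return prev[pin] == 0 and curr[pin] == 1
        raise ValueError(f"unknown condition: {condition}")
    return evaluate

def pattern_match(slices, prev, curr):
    """AND the slices together; True means 'raise an interrupt'."""
    return all(s(prev, curr) for s in slices)

# Hypothetical rule: rising edge on pin 0 while pin 1 is high and pin 2 is low.
rule = [make_slice(0, "rising"), make_slice(1, "high"), make_slice(2, "low")]

print(pattern_match(rule, prev=[0, 1, 0], curr=[1, 1, 0]))  # all conditions hold
print(pattern_match(rule, prev=[1, 1, 0], curr=[1, 1, 0]))  # pin 0 already high: no edge
```

In hardware the equivalent logic is evaluated continuously without processor cycles, so the interrupt fires only when the whole pattern is present.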

[Image: Example of on-chip programmable logic: NXP’s Pattern Match Logic (one of eight slices) on the LPC5410x MCU. Image credit: NXP]

Multi-core MCUs may continue to find a home in the highest performance applications, but do not overlook some of the other approaches MCU architects are taking to improve performance without dramatically increasing die size and power dissipation. If your MCU has many peripherals that can be organized as several autonomous functions, you may find you only need a single processor, one that can ‘sleep in’ while intelligent peripherals stay awake, doing all the real work.
