ARM’s big.LITTLE is a processor technology that you are going to hear a lot about as Samsung, Huawei and others promote their 8-core to 10-core (or 12-core!) chips used in smartphones, tablets and possibly Chromebooks.
There is no doubt that marketing teams will crank up the rhetoric because you know, 8 is better than 4, which is better than 2… But these chips are not always X-core in the way that most people think about X-cores, which means X cores that work together on a computing task.
Instead, big.LITTLE X-core chips often (but not always) have two sets of four cores, some big, some small (hence the big.LITTLE name), that take turns to execute a task with the most efficient power envelope. Let’s take a closer look:
big.LITTLE overview
big.LITTLE comes from a simple observation: in order to get faster, processor cores need to get bigger to host more execution units, instruction decoders, cache memory, etc. But as cores get bigger, they invariably have more transistors and require more power to maintain or switch state.
ARM’s newer Cortex A15 gets the ARM architecture one step closer to laptop performance, but during peak utilization, it draws more power than its predecessors. Unfortunately, it also draws more power at the lowest utilization level, and that is the real issue when it comes to battery life. Most of the time, your smart device is doing little to nothing in your pocket or purse, and that’s when you want it to maintain its charge. This is similar to human metabolism where a tall and athletic person will burn more calories than a smaller average person, even during their sleep.
The idea of big.LITTLE is to pair a tiny and ultra-low power ARM Cortex A7 core with a fast and modern Cortex A15 core so that when a menial task is required, the A7 can take care of it without requiring the muscles (and energy) of the A15 core. Both cores are architecturally compatible and can run the same software. Tasks can be migrated on the fly from one to the other, which is the beauty of the system. The Cortex A7 core at peak utilization still draws less power than the A15 core at its lowest operating point, but it can still get a number of small tasks done.
That is the end-game of big.LITTLE.
Both A15 and A7 cores access the same memory sub-system, but they each retain their internal cache memory. This allows the cores to be “hot-swappable” and ARM says that it takes about 20,000 cycles to switch from one to the other. In computing terms, it is not negligible, but given that 1GHz represents one billion cycles per second, this is really fast in human terms.
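To put the 20,000-cycle figure in human terms, here is a quick back-of-the-envelope conversion. The 1GHz clock is simply the round number used above, not the measured clock speed of any particular chip, and the function name is made up for this sketch.

```python
def switch_latency_seconds(cycles: int, clock_hz: float) -> float:
    """Time it takes to spend `cycles` clock cycles at `clock_hz` cycles per second."""
    return cycles / clock_hz

# ARM's quoted switch cost at an illustrative 1GHz clock
latency = switch_latency_seconds(20_000, 1e9)
print(f"{latency * 1e6:.0f} microseconds")  # 20 microseconds
```

At a higher clock, the same number of cycles takes proportionally less time.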
From the application’s perspective, an 8-core big.LITTLE chip shows up as a 4-core. Tasks are sent to the A7 or the A15 cores depending on their intensity (often measured by looking at the current core power consumption). The two cores in an A7/A15 duo usually do not work at the same time (ARM left the door open for that if the manufacturer wants to…). Only one or the other can take on the task at hand. Note that having an asymmetric number of cores, such as pairing four A7 cores with two A15 ones, is also possible in theory, but not as convenient as having an equal number.
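The “one duo, one visible core” idea can be sketched in a few lines. This is only an illustration, not ARM’s or any vendor’s actual algorithm: the class name, the use of a 0.0 to 1.0 load figure as the measure of intensity, and both thresholds are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class VirtualCore:
    """One A7/A15 duo, seen by software as a single core (hypothetical model)."""
    active: str = "A7"            # start on the low-power core
    up_threshold: float = 0.8     # assumed: load above this migrates work to the A15
    down_threshold: float = 0.3   # assumed: load below this migrates work back to the A7

    def update(self, load: float) -> str:
        """Return the physical core that handles the current load (0.0 to 1.0)."""
        if self.active == "A7" and load > self.up_threshold:
            self.active = "A15"
        elif self.active == "A15" and load < self.down_threshold:
            self.active = "A7"
        return self.active

core = VirtualCore()
trace = [core.update(load) for load in (0.1, 0.9, 0.5, 0.2)]
print(trace)  # ['A7', 'A15', 'A15', 'A7']
```

The gap between the two thresholds is a design choice of this sketch: it keeps a task that hovers around a single cut-off from bouncing between cores and paying the switch cost each time.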
Other chip architectures such as the NVIDIA Tegra 3 and Tegra 4 have commercialized variants of the same idea. Both Tegra chips contain a low-power “companion” core that handles low-intensity workload. This is a very different implementation, but the basic principle is the same: send low-intensity tasks to the low-power core, and shut down the high-performance ones.
big.LITTLE power and performance benefits
After analyzing popular apps, it has become clear to ARM that many tasks could easily be handled by a tiny low power core, including things once deemed as “intensive” like 1080p video decode, which is now mainly performed by dedicated video hardware.
In ARM’s own tests, the company has measured over 50% in energy savings for popular activities such as web browsing and music playback. For web browsing, the A7/A15 duo achieves the same level of performance as the Cortex A15 alone, but hits that 50% energy savings target that ARM often refers to. Note that this is measured without using a graphics processor (GPU), which is another huge block of silicon in the chip. I mention this because desktop web browsers now use GPUs to accelerate some graphics workloads.
On the other end of the spectrum, if you want to execute something very intensive like a photo/video edit, the A15 core can get the task done quicker. In the real world, actual results will vary, but in principle, this is arguably an effective method to reduce power and expand performance at the same time. It is the “wake up, get the job done and go back to sleep” method that has worked very well for Intel in the desktop world.
In conclusion to the power efficiency segment, keep in mind that ARM’s design should provide much higher peak performance and much lower idle state power consumption than before. However, sustained peak performance (gaming?) will deplete the battery much faster than before because the A15 core draws much more power. What’s important is that with the current usage pattern where your phone sits around doing nothing most of the day, big.LITTLE should help the average battery life – or at least not make it worse.
big.LITTLE commercial implementations
Samsung was the first company to show working silicon using big.LITTLE at CES 2013. Their implementation pairs four Cortex A15 cores with four Cortex A7 cores and seems to use the classic Dynamic Voltage and Frequency Scaling (DVFS) to choose when to migrate tasks.
We suspect that Huawei’s octo-core and other “8-core” chips to be announced in 2013 will follow the same model, which is the one that requires the least amount of software changes at the app level.
Chip makers can choose to enable all the cores (A7+A15) at once. This is certainly a radical change from the 2013 software stack, but it would allow for maximum throughput in certain cases (= benchmarking). Most likely, this model will be avoided in the short term for consumer electronics, but may be used for Enterprise applications.
There are effectively three ways of running heterogeneous clusters of big.LITTLE CPUs:
- Clustered switching: switching between the big and LITTLE clusters. Only one cluster can be active at once.
- In-kernel switcher: pairing a big and a LITTLE core as one virtual core seen by the app. The OS chooses which physical core in each pair is active and most efficient, without the app knowing. Again, only half of the cores are active at once, but big and LITTLE cores can be active at the same time (in different pairs).
- Heterogeneous multi-processing (global task scheduling): ALL physical cores are active and contributing. This could be very useful for large computing tasks, but it’s extremely unlikely that a commercial app can make use of this mode in a power-efficient way.
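As a rough way to compare the three modes, the sketch below counts the maximum number of physical cores that can run at once in each of them. The mode labels and the function name are made up for this example, and the 4+4 default simply mirrors the “8-core” layout discussed in this article.

```python
def max_active_cores(mode: str, big: int = 4, little: int = 4) -> int:
    """Upper bound on physical cores running at once in each mode (illustrative)."""
    if mode == "cluster":      # clustered switching: one whole cluster at a time
        return max(big, little)
    if mode == "in_kernel":    # in-kernel switcher: one core per big/LITTLE pair
        return min(big, little)
    if mode == "gts":          # global task scheduling: every core can be used
        return big + little
    raise ValueError(f"unknown mode: {mode}")

for mode in ("cluster", "in_kernel", "gts"):
    print(mode, max_active_cores(mode))
# cluster 4
# in_kernel 4
# gts 8
```

Only the last mode lets an “8-core” chip actually run eight cores at the same time.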
You can get more details about these 3 modes on the big.LITTLE Wikipedia page. The official ARM big.LITTLE page also offers interesting answers about which cores get activated and why. A short ARM overview (PDF) also shows how different modes behave during a task migration.
big.LITTLE unavoidable marketing slippery slope
We’re already seeing it: big.LITTLE will be used and abused by marketing departments worldwide with the term “8-core”. While factually correct, a set of four Cortex A7/A15 duos should not be called “8-core” when only four of the cores can be activated at once.
Using 8-core when only four are active at a time is like calling soccer a 44-player sport because there are 22 on the field and 22 on the bench. In effective terms, those “8-core” processors are really “quad-core” chips when it comes down to real workload.
Newer implementations do allow all cores to be fired up at once (global task scheduling), and the usefulness depends on the task at hand.
“Megapixel” still resonates with “better image quality” for all the wrong reasons, so you can expect that “8-core” (or even 10-core now) will be one of the central marketing messages going forward. Now, you know.