Picture this: The networking stack on the main pump controller has crashed, and you need to reboot it -- but it's 20 meters underground, on another continent -- and there's no-one on site to hit 'reset'.
Or you're bowling along the highway and one of the processor cores in your self-driving car gets zapped by a cosmic ray (yes, this could actually happen). The software can't tell whether the resulting error is a transient glitch or a hardware fault, so it limits you to 50 kilometers per hour for safety: No fun with a monster truck hurtling up behind you.
Chip designers such as ARM and Imagination Technologies are applying industrial safety design techniques to their processor cores so that they can get themselves out of situations like this. You could soon feel the benefit even if you don't run a subterranean pumping station in Azerbaijan, nor yet have a self-driving car in your garage.
We place a lot of faith in the processors in our internet of things, counting on the software they run to perform to spec in all circumstances. But there are problems that no amount of bounds checking, input sanitization or exception handling can fix.
That's why manufacturers in many industries seek to make their products functionally safe -- that is, ensuring that they remain in a safe state and respond as expected, regardless of environment, inputs or hardware failures. There are even standards for how to do so: The generic IEC 61508 has variants for specific industries, including ISO 26262 for automotive manufacturing.
It's one thing building such a system when you're designing or specifying every detail of every component yourself. Relying on an external supplier for complex subsystems -- or your supplier's supplier in the case of microprocessors designed by one company and built by another -- is something else entirely.
The standards describe how to incorporate such components -- known as safety elements out of context (SEooCs) -- into functionally safe systems, and companies such as ARM and Imagination Technologies are applying them to their processor core designs.
ARM has offered functionally safe variants of some of its Cortex-R series processor designs for a couple of years now. These are processor cores designed for hard real-time applications, where a response must come within a fixed window of time. The blazing fast cores that you will find in computer vision applications or the latest flagship smartphones, though, are more likely to belong to the Cortex-A line, none of which are available in functionally safe variants.
Imagination has a competing line of low-power core designs based on the MIPS architecture, which have also found their place in computer vision applications -- although sadly for Imagination, not in smartphones.
Mobileye uses MIPS cores in the EyeQ 4 system-on-chip (SoC) devices it develops for auto manufacturers to provide automated driver assistance systems such as lane keeping or adaptive cruise control.
Last year, Mobileye said it would use Imagination's MIPS I6500 core in its EyeQ 5 SoCs intended to support autonomous driving. The I6500 is a 64-bit multicore design in which the cores can run at different speeds ("heterogeneous inside," as Imagination puts it) and which can easily connect to GPUs and other application-specific accelerators ("heterogeneous outside").
There was just one hitch: There was no guarantee that it would be functionally safe, a must in the safety-conscious automobile industry.
Now, though, Imagination has overhauled the design. A new version, the I6500-F, contains additional transistors to flag errors in data transmission and storage, and to perform regular self-test operations on processor cores in a way that doesn't affect operation.
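To illustrate how hardware can flag an error in stored or transmitted data, here is a minimal Python sketch of a single even-parity bit. It is an illustration of the general technique only, not a description of Imagination's circuitry, and the function names `parity_bit`, `store` and `check` are invented for this example.

```python
def parity_bit(word):
    """Return 1 if the word has an odd number of 1 bits, else 0 (even parity)."""
    return bin(word).count("1") % 2


def store(word):
    """Keep the word together with the parity bit computed when it was written."""
    return word, parity_bit(word)


def check(word, stored_parity):
    """Recompute parity on read; a mismatch means a bit has changed."""
    return parity_bit(word) == stored_parity


stored_word, stored_parity = store(0b10110010)

# Simulate a random bit flip, such as one caused by background radiation.
flipped = stored_word ^ (1 << 3)

intact_ok = check(stored_word, stored_parity)   # parity still matches
flip_caught = not check(flipped, stored_parity)  # mismatch exposes the flip
```

A single parity bit catches any odd number of flipped bits but misses an even number, and it can only report the error, not correct it.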
Chip designers that are able to support functionally safe design methodologies can help device makers get their products on the market quicker.
"There are some changes to the 6500 to make it the F, but these changes don't change the functionality of the design," said Tim Mace, Business Development Manager for MIPS and Imagination. "It runs the same software and integrates with accelerators in the same way."
The main benefit of the new version, he said, is the level of compliance with ISO 26262, but there's also some additional logic, largely transparent to the user, that is able to report errors when it finds them.
"Even if your silicon does exactly what you intended it to do, it can still change over time. For example, it can just wear out. Or you can get some random errors, for example from solar radiation or background radiation. A bit could change. You need some resilience to spot these errors when they occur and some mechanism to make sure it fails safe or recovers gracefully."
That resilience comes from the incorporation of parity checks and logic for built-in self-testing. In a multi-core processor design like the I6500, that could involve repeatedly moving processes to a known-good core, and self-testing the now-vacant core to be sure it too is safe.
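The rotation described above can be sketched in a few lines of Python. This is a simplified software model under stated assumptions, not Imagination's implementation: the `assignments` mapping, the `self_test` callback and the task names are all hypothetical, and the real checks run in hardware.

```python
def rotate_self_test(assignments, self_test):
    """Vacate each core in turn, self-test it, and return the cores that failed.

    assignments: dict mapping core id -> list of task names (hypothetical model).
    self_test:   callable taking a core id and returning True if the core passes.
    """
    failed = set()
    for core in list(assignments):
        # Candidate destinations: any other core not already known to be bad.
        spares = [c for c in assignments if c != core and c not in failed]
        if not spares:
            break  # nowhere to move the work; a real system would enter a safe state
        # Move the work to the least-loaded candidate, leaving this core vacant.
        target = min(spares, key=lambda c: len(assignments[c]))
        assignments[target].extend(assignments[core])
        assignments[core] = []
        # The vacant core can now be tested without disturbing running tasks.
        if not self_test(core):
            failed.add(core)
    return failed


# Three cores, one task each; core 1 is simulated as faulty.
assignments = {0: ["brake"], 1: ["vision"], 2: ["radar"]}
failed = rotate_self_test(assignments, lambda core: core != 1)
```

In this run the faulty core is identified and ends up with no work assigned. On the first pass no core has been verified yet, so work may briefly land on a core that later fails its test; repeating the pass continuously is what narrows that window.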
"In automotive there is a requirement that if an error occurs, you need to detect it within a certain period of time. We need to repeat these checks in the background to continuously check if an error has crept in," said Mace.
The extra logic increases the die size of the chip by less than 10 percent, and will entail a slight increase in cost, but that could be a small price to pay for a "thing" that stays on the internet -- or the road -- even when the unexpected happens.