All Systems Go 2017: Updating Embedded Systems
At the All Systems Go conference in Berlin, Michael Obrich reported at 2017-10-22 about "Updating Embedded Systems - Putting it all together", based on the experiences of the Pengutronix integration team.
While some years ago embedded systems have almost never been updated, the recent IoT security incidents started to make the device manufacturers think about how to bring new software to their embedded devices. Many Linux systems implement an "A/B" scheme, consisting of two system software slots. One is currently running and an update is written to the currently-unused slot. This mechanism can be implemented in an "atomar" way, so even a failure in the update process cannot brick the system.
Besides writing the actual update (being more or less generic), there is another important task: the functionality of the updated system needs to be checked against possible errors. In case something went wrong, interactive devices have the possibility to show a related message on the screen, giving the user the option to switch back to the old software. The same concept works for non-interactive systems which are controlled by a rollout server. Completely unattended systems are a more difficult case: there is the possibility to use the watchdog to supervise the startup procedure; after the kernel has booted, systemd can be used as the watchdog handler. While in normal desktop usecases systemd falls back to a shell in case of an error, non-interactive embedded usecases need to take additional effort to supervise its applications correctly:
- The system can try to boot a 2nd time in case of an error, counting the boot attempts in the bootloader (i.e. barebox). When the reboot counter reaches a limit, the system can do an automatic fallback to the old version.
- In case of transient errors, this mechanism can run in a loop.
- For more safety critical systems, the device can be brought into a safe state and shutdown in case of an error.
Usually the exact setup is highly project specific.
The infrastructure for writing updates and taking care of the software slots is often implemented with RAUC in our projects; RAUC is a sophisticated tool that can check cryptographical signatures or update the boot slots while transferring minimal data by using the new casync mechanism.