Showcase: Fail-Safe (OTA) Field Updating
Being able to robustly and securely update embedded systems and IoT devices in the field is a key requirement of every product today. The update framework RAUC is the basis for a modern and future-proof solution. In this showcase we present the basic principles of a fail-safe update system and how Pengutronix can support you in implementing it for your platform.
The basic requirements for an update in the field are clearly defined: The update must be installed in a fail-safe manner (reliability), and the device must remain protected from unauthorized access (security). Depending on the field of application, the update comes from a local medium (e.g. a USB stick) or is rolled out fully automatically via the network (deployment).
In addition to these obvious requirements, there is a whole range of other considerations and decisions that should be made in the project phase as early as possible and that have a major impact on the success or failure of the project. More on that later…
Atomic Updates (and Fallback)
The magic word for fail-safe updating of systems is atomicity. This means: It must be ensured that an update has been successfully installed in full before it is enabled for use.
To ensure this, redundant boot partitions are used (also called the dual-copy or A+B approach). Two completely equivalent system partitions are used to update each other: the system running from the active partition writes the update to the inactive one, which becomes the new active partition only after the write process has completed successfully. Switching between the partitions takes place in the boot loader. This also offers an advantage in terms of availability: With this mechanism, the update can easily be carried out in the background of the running application without interrupting it.
Optionally, the redundant partitions also make it possible to fall back to the other system if the active system fails, which significantly improves the availability of the platform.
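The A+B principle described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the names `Slot`, `Device`, `install` and `boot` are hypothetical and do not correspond to RAUC or boot loader APIs, and a real system keeps this state in boot loader variables rather than in Python objects.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    image: str         # identifier of the installed system image
    good: bool = True  # whether the boot loader may boot this slot


@dataclass
class Device:
    slots: dict        # slot name -> Slot
    active: str        # name of the slot the boot loader prefers

    def inactive(self) -> str:
        return next(name for name in self.slots if name != self.active)

    def install(self, image: str, write_ok: bool = True) -> bool:
        target = self.slots[self.inactive()]
        target.good = False        # target is unbootable while being written
        if not write_ok:           # simulated power cut or write error
            return False           # the active slot was never touched
        target.image = image
        target.good = True
        self.active = target.name  # switching is the single, final step
        return True

    def boot(self) -> str:
        # Boot loader logic: prefer the active slot, else fall back.
        if self.slots[self.active].good:
            return self.active
        fallback = self.inactive()
        if self.slots[fallback].good:
            return fallback
        raise RuntimeError("no bootable slot")


dev = Device({"A": Slot("A", "v1"), "B": Slot("B", "v1")}, active="A")
dev.install("v2", write_ok=False)  # interrupted update
print(dev.boot())                  # still boots the untouched slot
dev.install("v2")                  # complete update
print(dev.boot())                  # boots the freshly written slot
```

The important design point is the ordering: the switch to the new slot happens only after the write has finished, so an interruption at any earlier point leaves the running system bootable.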
Cryptographically Secured Updates
Since the update system enables the system software to be exchanged, this is a particularly critical area. In order to deny unauthorized persons access to the system, the update must be cryptographically signed when it is created and verified by the target system before installation. Asymmetric methods are ideal for this, as only the key for signing has to be kept secret, while the key for verification can be stored on the device without any further effort.
The use of common standards such as X.509 also allows more complex hierarchies with development keys, per-device keys, multiple signatures, etc.
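The asymmetric principle can be shown with a deliberately tiny example. The following sketch uses textbook RSA with toy parameters and the Python standard library only; it is not how RAUC signs bundles (which relies on X.509 and established crypto libraries) and must never be used for real security. The function names are hypothetical.

```python
import hashlib

# Toy RSA parameters (p = 61, q = 53); far too small for real use.
N = 3233  # public modulus
E = 17    # public exponent: may be stored on the device
D = 2753  # private exponent: stays with whoever signs the update


def digest(bundle: bytes) -> int:
    # Hash the bundle and reduce it into the toy modulus range.
    return int.from_bytes(hashlib.sha256(bundle).digest(), "big") % N


def sign(bundle: bytes) -> int:
    # Done when the update is created; requires the private key D.
    return pow(digest(bundle), D, N)


def verify(bundle: bytes, signature: int) -> bool:
    # Done on the target before installation; needs only N and E.
    return pow(signature, E, N) == digest(bundle)


bundle = b"firmware v2"
signature = sign(bundle)
print(verify(bundle, signature))            # genuine signature
print(verify(bundle, (signature + 1) % N))  # forged signature
```

Only `sign` needs the secret value `D`; the device carries just the public values, which is why the verification key can be stored on the device without having to be kept secret.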
As so often, the devil is in the details when developing an update strategy. Technical constraints such as processor type and memory technology must be taken into account, as must the requirements of the application and of the ecosystem into which the device is integrated.
Typical further questions that then arise:
- How do you separate data and logs from the operating system?
- How do you migrate application data after an update?
- What does a sensible partitioning of the memory look like?
- Is it possible to safely update the bootloader?
- How should the device behave in the event of an error?
- How does the device reliably recognize that there is an error?
How Can We Support You?
Together with you, our integration team sheds light on the obvious and less obvious questions and requirements for the update system and first develops a basic update concept that can be refined in the further course of the project.
With RAUC, an update framework maintained by Pengutronix and fully licensed as open source is already available as a starting point. This allows you to focus on the essentials without having to reinvent the wheel. Nevertheless, tailoring the overall system to the specific requirements of the customer and the operational environment is anything but trivial and includes the configuration of various components that have to be closely interlinked.
On the basis of your Board Support Package (BSP), our integration team implements a redundant boot setup, configures all necessary system components and prepares everything so that update artifacts can be generated and installed.
If you need additional functionality that is not or not fully covered by the existing components, we will be happy to extend them for you. Where possible, we do this in such a way that new features flow back directly into the main development branch of the respective projects and do not accumulate as technical debt in your project.
As part of clarifying the requirements and setting up the basic update and boot loader configuration for your specific project, Pengutronix supports you with, among other things:
- Adaptation of the boot loader for redundant boot partitions (barebox, U-Boot, GRUB, UEFI)
- Initial configuration of RAUC in the BSP (Yocto, PTXdist, Buildroot)
- Coordination of the watchdog behavior
- Clarification of security / verification behavior
- Clarification of related issues such as configuration management, data migration, etc.
- Integration in customer application / connection to deployment infrastructure
- Support during the further course of the project
The biggest mistakes in the design of a redundant boot system are often made in the early phase of the project. Drawing on many years of experience, Pengutronix will be happy to evaluate for you right from the start whether the selected hardware, in particular the storage technology, the power management or the processor, poses risks.
And why RAUC?
RAUC is a modern open source update framework that offers a lean and easy-to-maintain code base through efficient use of established libraries such as GLib, OpenSSL and curl. RAUC is divided into a service that runs on the target platform, where it verifies the update artifacts and installs them atomically, and a host tool with which the update artifacts (bundles) are generated.
With the reduction to the essentials and the provision of both a command line tool and a D-Bus API, RAUC can be easily integrated into existing customer applications.
One of the essential philosophies of RAUC is that the redundant boot behavior is described in a configuration file directly on the device. This not only allows the system to be introspected, but also allows generic update artifacts to be created.
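To give an impression of what such a description looks like, the following sketch embeds an INI-style configuration in the shape of a RAUC system configuration and reads the slot layout from it with Python's standard `configparser`. The concrete values (compatible string, device paths, boot names) are made up for illustration, and the helper `list_slots` is hypothetical; consult the RAUC documentation for the authoritative set of options.

```python
import configparser

# Illustrative configuration; all values are assumptions for this sketch.
SYSTEM_CONF = """
[system]
compatible=example-board
bootloader=barebox

[keyring]
path=/etc/rauc/keyring.pem

[slot.rootfs.0]
device=/dev/mmcblk0p1
type=ext4
bootname=system0

[slot.rootfs.1]
device=/dev/mmcblk0p2
type=ext4
bootname=system1
"""


def list_slots(conf_text: str) -> dict:
    """Return a mapping like 'rootfs.0' -> options of that slot section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    slots = {}
    for section in parser.sections():
        if section.startswith("slot."):
            _, slot_class, index = section.split(".")
            slots[f"{slot_class}.{index}"] = dict(parser[section])
    return slots


for name, options in list_slots(SYSTEM_CONF).items():
    print(name, options["device"], options["bootname"])
```

Because the slot layout lives on the device and not in the update artifact, the same bundle can in principle be installed on devices whose partitioning differs, as long as they declare compatible slots.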
Additional features such as PKCS#11 support for signing bundles, options for re-signing bundles or accommodating intermediate certificates in bundles make it possible to meet the security requirements in a professional environment.
Boot loader updates are often underestimated, yet often urgently required later in the field. For many use cases, RAUC offers the option of performing them completely atomically.
How Can OTA Updates be Implemented?
If the company already has a deployment infrastructure or a hosted solution is to be used, RAUC can be triggered with the help of a simple service that accepts requests from the infrastructure and calls RAUC via D-Bus or the command line.
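A minimal sketch of the command line variant of such a glue service might look as follows. It assumes that the `rauc` binary is available on the target and that the bundle has already been made accessible at a local path or URL; the function names and the injectable `run` parameter are hypothetical and exist only to keep the example testable. How requests from the infrastructure arrive is left out entirely.

```python
import subprocess


def build_install_command(bundle: str) -> list:
    # "rauc install <bundle>" asks RAUC to verify and install the bundle.
    return ["rauc", "install", bundle]


def trigger_update(bundle: str, run=subprocess.run) -> bool:
    """Call RAUC for the given bundle and report whether it succeeded.

    `run` defaults to subprocess.run and can be replaced in tests.
    """
    result = run(build_install_command(bundle), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Example bundle path; purely illustrative.
    print(build_install_command("/data/update.raucb"))
```

A production service would additionally report progress and errors back to the infrastructure, for which the D-Bus API is usually the better fit.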
The open source project hawkBit is ideal for OTA updates in your own (self-hosted) infrastructure. This offers a highly configurable solution for device management, deployment scheduling and feedback.
The connection to RAUC is quite easy to implement thanks to the existing clients in the RAUC project. With the rauc-hawkbit-updater, a solid client component written in C is available that interacts with the Device (DDI) API from hawkBit via REST and with RAUC via D-Bus.
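The polling side of such a client can be sketched roughly as follows. This is not code from rauc-hawkbit-updater: the URL layout and the JSON field names (`config.polling.sleep`, `_links.deploymentBase.href`) reflect our understanding of the hawkBit DDI API and should be checked against the hawkBit documentation; the sample response, host name and helper names are made up for illustration.

```python
def poll_url(base_url: str, tenant: str, controller_id: str) -> str:
    # Assumed DDI root resource of a device: /{tenant}/controller/v1/{id}
    return f"{base_url.rstrip('/')}/{tenant}/controller/v1/{controller_id}"


def parse_poll_response(doc: dict):
    """Return (polling interval in seconds or None, deployment link or None)."""
    sleep = doc.get("config", {}).get("polling", {}).get("sleep")
    link = doc.get("_links", {}).get("deploymentBase", {}).get("href")
    seconds = None
    if sleep:
        hours, minutes, secs = (int(part) for part in sleep.split(":"))
        seconds = hours * 3600 + minutes * 60 + secs
    return seconds, link


# Made-up sample of what a poll response could look like.
sample = {
    "config": {"polling": {"sleep": "00:05:00"}},
    "_links": {
        "deploymentBase": {
            "href": "https://hawkbit.example.com/DEFAULT/controller/v1/dev01/deploymentBase/3"
        }
    },
}

print(poll_url("https://hawkbit.example.com/", "DEFAULT", "dev01"))
print(parse_poll_response(sample))
```

When a deployment link is present, the client would fetch it, download the referenced bundle, hand it to RAUC and report the result back to hawkBit.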