umpf - Git on a New Level

Holger Assmann, Michael Olbrich | 2023-08-29 | kernel, linux, git, opensource, umpf

Modern software development is commonly accompanied by a version control system like Git in order to have changes to the source code being documented in a traceable manner. Furthermore, any version state should be easily reproducible at any time. However, for work on complex projects such as the BSP ("Board Support Package") of an embedded system with its multiple development aspects, the simple stacking of the individual changes on top of each other does not scale.

Oftentimes, different functionalities for a BSP are developed in parallel based on a common version state. Then there might be back ports to be integrated, and specific work intended for upstream has to be managed separately. This separation is commonly handled by introducing individual Git branches.

However, this model becomes complicated once the subsequent merging of more than two branches is due - e.g. in order to be able to test the entire project, or to publish a release. Manual merging of the individual branches is as time-consuming as it is error-prone and it collides with the short test cycles of iterative development models.

Merging Branches Reasonably

To simplify this last mentioned step and to be able to generate a reproducible patch stack from a selection of topic branches, we have developed the Universal Magic Patch Functionator (umpf) as an open source extension for the local Git installation.

With umpf, a linear patch series can be generated within the regular Git infrastructure and across multiple Git branches ("topic branches"), with their individual dependencies being preserved at the same time.

The Idea Behind umpf — umpf combines several parallel topic branches into a serialized patch stack. By referencing the respective HEADs, the underlying branches can always be restored to their state at the time of tagging.

This series of commits is described by a regular Git tag, which is supplemented by umpf with references to the contained topic branches. Such a utag thus combines several properties:

The marker includes a timestamp, which makes it uniquely referable.
A linear patch stack can be exported as a series.
The state of the underlying branches is preserved at the time of the umpf. Later changes of a branch (e.g. force pushes after bug fixes) have no influence on the resulting utag. Consequently, the actual development state at the time of tagging can be easily retraced and restored.
By archiving the contained branches, their dependencies among each other can be traced with normal Git tooling.
The project type is recognized heuristically and a suitable timestamp gets integrated - for example as EXTRAVERSION in a Makefile. This allows for a clear identification later, even of the already compiled and delivered software.

The generating of an utag and its linear patch series is relatively straight-forward and only requires an existing Git project. umpf itself is a bash script and uses the regular tools that Git provides. Therefore, tags and merges generated by umpf can later be processed by any normal Git installation. An utag is therefore also regularly added to the existing repository via git push and can be re-called again via git checkout.

In order to create an utag, umpf needs three parameters, which can either be provided by using a useries text file, or by calling umpf init:

base: The starting point on which the subsequent patch series is to be built. The umpf-base is usually an upstream commit, such as a release tag (e.g. for Linux: "v6.3").
name: The actual name of the umpf, which later also becomes part of the utag. It is recommended to derive it from the base, e.g. "6.3/release-name".
topic: The actual Git branches (topic branches) are each preceded by # umpf-topic:. They should themselves be built on the base commit to avoid complications. It is therefore recommended to re-base branches with a different starting point than the umpf-base first.

Backport Integration

Especially when working with upstream projects, it happens frequently that bug fixes or functional enhancements provided by third parties have to be integrated. The strength of umpf is that such changes can be maintained in a separate branch and thus do not interfere with your own work on the project.

However, this branch integration by umpf only works reliably when the underlying basis is comparable: Topic branches for v5.15 and v5.16, for example, can usually be processed together by umpf without any problems. On the other hand, a further branch based e.g. on v6.2, will lead to complications during umpfing due to the extensive differences in the overall code.

In order to integrate bugfixes and other backports, it is thus recommended to re-base those within their own topic branch on the umpf-base first.

Handling Merge Conflicts

umpf cannot resolve merge conflicts on its own. Manual conflict resolution during umpfing is possible and sometimes unavoidable, e.g. when working with device trees, but it should be kept to a minimum.

However, the solutions to individual conflicts can be recorded by using git-rerere, which automatically resolves known conflicts in case of recurrence. The branches to be integrated should nevertheless have already been restructured and cleaned up appropriately prior to an umpf.

Generating a Linear Series

A useries created as described above can now be used to derive an utag and thus unify all registered development branches into a common release tag:

~/epic-project $  cat ./useries

# umpf-base: v6.3
# umpf-name: 6.3/special-customer-release
# umpf-topic: v6.3/topic/bugfix-branch
# umpf-topic: v6.3/topic/more-fixes
# umpf-end

~/epic-project $  umpf tag ./useries

umpf will now stack all branches in the order they are mentioned in the useries onto the specified umpf-base. An autosquash is also performed so that fixup commits are resolved and do not end up in the final patchstack.

An example of how such an utag is represented in Git can be seen in the figure below. This is a so-called qualified umpf, since it contains all the necessary information for complete reconstruction:

Complete umpf — A "qualified" *umpf* tag (*utag*) as seen by Git.

umpf-base: The base commit, commonly an upstream tag.
Patch Stack: Commits of the topic branches stacked on the umpf-base. The order of the branches is the same as they were passed to umpf.
References: The state of the integrated branches at the time of the umpf. This state is referenced in the utag's commit via the branches' HEADs and thus preserved. Consequently, even a subsequent change to a branch has no effect if the utag is checked out again at a later time. The referencing also causes the Git Garbage Collection to leave the supposedly orphaned branch states in the repository. This way, that very constellation represented by the utag can be re-visited at any point in the future.
utag: A regular Git tag containing references to the underlying branches and their latest commit IDs (HEADs) at the time of umpfing.
Release Tag: Another regular Git tag. This has to be referenced for generating a patch series. The commit may perform a modification to the project code in order to include a timestamp.

The final export of the patch stack can be done with the umpf format-patch command. The export can be varied with parameters: An appended -bb creates a patch series compatible to Yocto projects, -p specifies the target path, and with -u the overwriting of an already existing series can be initiated. Further hints are provided by umpf --help.

umerge

Instead of an utag, we can also build an umerge without an existing useries step by step by calling umpf merge and manually inserting the individual topic branches. To do this, first use to use git checkout in order to switch to the desired umpf-base and then use umpf merge to load the desired branches:

~/epic-project $  git checkout v6.3
~/epic-project $  umpf merge v6.3/topic/bugfix-branch

umpf: merging 'v6.3/topic/bugfix-branch'...
[...]

~/epic-project $  umpf merge v6.3/topic/more-fixes

umpf: merging 'v6.3/topic/more-fixes'...
[...]

This merging of multiple branches is similar to an Octopus Merge, which can be used in Git for the simultaneous merging of more than two branches. Unlike Git, but like with an utag however, the origin of the added commits is preserved by an umpf merge: The underlying branches are referenced in their state at the time of the umerge by an additional entry in the commit message of the merge. This allows the origin of the individual commits - i.e. the original names of their branches as well as their history - to be traced at any time.

A umerge can also be created directly from a useries file by calling umpf build ./useries

An umerge is particularly suitable for iterative development, in which all topic branches of a project are required - e.g. for hardware-related kernel development. Additional changes to the source code can then be simply put on top of the umerge and be tested. Afterwards, by calling umpf distribute, the newly created patches can be interactively assigned to one of the original branches and will become integrated there by umpf directly.

umpf distribute can also be used if the commits were not stacked on an umerge, but on an utag instead.

*umpf* Stacking Feature — Dependencies between the topic branches of a single *useries* can be realized with *umerge*.

Furthermore, umerges can also be used directly in projects if there are dependencies between the individual topic branches. Through the Stacking Feature of umpf, the branch based on an umerge will be regularly integrated into the linearized sequence of commits during subsequent processing. It should be noted that the entry of the branch in question in the useries file must not be inserted before the branches on which it depends; the order of naming is decisive here.

Since the umerge preserves the state of the base branches the new branch relies on, subsequent work on those bases will logically not be carried over to an already existing umerge. This can lead to conflicts when a new utag shall be generated with an older umerge included. In such a case, changes done to the base branches must not affect direct dependencies of the dependent branch.

Why We Are umpfing

At Pengutronix, we have specialized in Embedded Linux, and are thus mainly working on the Linux kernel and associated projects. As part of our "mainline first" strategy, we bring the functionalities developed in our projects into the official upstream channels as early as possible. While this comes with many long-term benefits - such as ensuring the high quality standard of our code through community review - it also means that the resulting patches must not only work when directly integrated into our BSPs, but must also meet the fluctuating upstream situation. This requires a clean separation of the respective topic branch from the rest of the project.

The impetus for the development of umpf ultimately came from our work on driver development for embedded systems: Today, modern SoCs cover an incredibly broad spectrum of implementations and functionalities; at the same time, SoC components of identical design (so-called "intellectual property cores" or "IP cores") are often used in a modular manner - even by different manufacturers. Consequently, we have built up specific driver branches that can be used across several of our integration projects without further adaptation needed.

Working directly on the Linux kernel with its hundreds of branches and its continuous upstream development is of course a prime example of how extensive a project managed with Git can become. But even much smaller projects can benefit from automation aids like umpf once more than one branch is used.

Although an extension like our Universal Magic Patch Functionator might at some point have limitations when it comes to the actual possible complexity a Git repository can get to, umpf has nevertheless successfully enabled us for years to manage a whole range of different patch stacks for a large number of projects and to keep the general versioning effort relatively low in this regard.

We have now decided to make this tool available under MIT license to the open source community - and as always, we look forward to you participation!

Texts and photos are licensed under CC-BY-SA, except explicitly stated otherwise

Further Readings

Managing Complexity with Open Source

Robert Schwebel | 2024-08-29 | opensource, ptx history

A few days ago, something exciting happened: I revisited my very first embedded system - a 34 year old stepper motor controller, driving the telescope mount of the Public Observatory Rothwesten, which was built by me back when I was in class 12 in highschool. Comparing those embedded systems from back in the days with the recent industrial systems, it is impressive to see that the latter ones are not manageable any more without the use of open source software.

Pengutronix at the Linux Plumbers Conference

Chris Fiege | 2024-08-14 | conference, linux

The Linux Plumbers Conference 2024 will take place in Vienna from 18. to 20.09.2024. Luckily this does not overlap with the ELCE. Pengutronix will attend the LPC with six colleagues - so watch out for our T-shirts and hoodies and and feel free to chat with us.

Pengutronix at FrOSCon 2024

Chris Fiege | 2024-07-24 | conference, event, froscon, opensource

On August 17th and 18th, 2024, it's that time again: FrOSCon will take place at the Bonn-Rhein-Sieg University of Applied Sciences in Sankt Augustin - and Pengutronix will be there again as a Partner.

Pulse Width Modulation (PWM) is easy, isn't it? - Turning it off and on again

Johannes Zink | 2023-03-14 | pwm, fosdem, linux

Part of Uwe Kleine-König's work at Pengutronix is to review PWM (Pulse Width Modulation) drivers. In addition, he also sometimes refactors existing drivers and the Linux kernel PWM subsystem in general.

Yes we CAN... add new features

Marc Kleine-Budde, Johannes Zink | 2023-01-30 | CAN, DidYouKnow, linux-automation, OpenSource

Have you ever experienced an otherwise fine product that is missing just the one feature you need for your application?

Pengutronix at Embedded World 2022

Chris Fiege, Marie Mann, Jan Lübbe, Enrico Jörns | 2022-06-21 | labgrid, linux, linux-automation, opensource, rauc

Welcome to our booth at the Embedded World 2022 in Nürnberg!

Pengutronix Kernel Contributions in 2021

Robert Schwebel | 2022-01-01 | kernel, linux

2022 has started, and although Corona had a huge impact on our workflow, the Pengutronix team again made quite some contributions to the Linux kernel. The last kernel release in 2020 was 5.10, the last one in 2021 was 5.15, so let's have a look at what happened in between.

Pengutronix at FOSDEM 2021

2021-01-28 conference, kernel, labgrid, talk, testing

"FOSDEM is a free event for software developers to meet, share ideas and collaborate. Every year, thousands of developers of free and open source software from all over the world gather at the event in Brussels. In 2021, they will gather online." -- FOSDEM

Optimising tig's bash completion by a factor of 1000

Roland Hieber | 2019-10-18 | git, tig, bash, small paper cuts

Has this ever happened to you?

15 Years of i.MX in Mainline Linux

Robert Schwebel | 2019-08-14 | Kernel, Mainline Linux, i.MX

Today it has been 15 years since we mainlined support for Freescale/NXP's i.MX architecture in the Linux kernel! That was one small step for [a] man, one giant leap for (industrial Linux users') mankind :-) Here is some background about why it happened and what you might want to learn from history for your next embedded Linux project.