Automatic git conflict resolution on logs and sets
TL;DR: In this post, I describe how to configure git to use scripts that automatically resolve conflicts on files where they don't matter: log files, which are chronologically ordered, and set files, where only the set of lines matters and not their order.
I use git to version many things, from papers to code to scripts to configuration. For several of these projects, I am the only user, and I mostly use git to synchronize things across machines. Conflicts then become something of a nuisance. They can be avoided by always pulling before editing, but this is not always possible, e.g., when I'm offline or forget to pull. However, there are files on which conflicts do not matter and are easy to resolve:
- One example is log files, i.e., files that log timestamped events, one per line. For such files, we can reconcile conflicts by merging events: intuitively, sorting the lines by timestamp and deleting all conflict markers. I use this for a log of personal notes, but the same should work if you want to version, e.g., your bash history.
- Another example is set files, i.e., files that describe a set, each line being an item, with no duplicates and no meaningful order between the lines. One example is vim spellchecking additions, when synchronizing them across machines. For such files, intuitively, we can solve conflicts by discarding duplicate lines and dropping conflict markers. If you are only adding lines to files, and you do not need duplicates to be removed, then you can use the built-in union merge driver, as explained later.
I used to solve these conflicts by hand, but this was tedious and error-prone. I then realized that I could use custom merge drivers with git to automate this away. I have been using the setup for some months now without issues.
Set files
Let me start with this case, because it is simpler, and let me present things top-down. We will first add a .gitattributes file to our repository to indicate that a custom merge driver should be used for some files. For instance:
cd myreporoot/
cat > .gitattributes <<EOF
mysetfile1.txt merge=set
mysetfile2.txt merge=set
EOF
Of course, you should then version this file with git:
git add .gitattributes
git commit -m 'automatic merges' .gitattributes
Now, we have to tell git what we mean by the set merge driver. This is explained in the section "Defining a custom merge driver" in the gitattributes documentation or the manpage gitattributes(5), but I will summarize it here. Edit your .gitconfig file to register the set merge driver. Unlike .gitattributes, this file is not versioned with the repository, so this has to be done on every machine:
cat >> ~/.gitconfig <<EOF
[merge "set"]
name = set merger
driver = ~/bin/git-merge-set %O %A %B %L
EOF
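Now we have to create the ~/bin/git-merge-set script. This is fairly easy to do, once you have understood the meaning of the arguments that git passes to the program: %O is a temporary file containing the common ancestor's version, %A the current version, %B the other branch's version, and %L the conflict marker size. The program must write the merge result to the file given as %A and exit with status zero if the merge succeeded. Here is for instance my git-merge-set script, which concatenates the files and deletes duplicates (in a stable way, i.e., it preserves the order in the input files). Note that it depends on sponge from moreutils.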
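A minimal sketch of such a driver could look as follows. It is an illustration under assumptions rather than the script mentioned above: the function name merge_set is made up, only the current and other versions are combined (the ancestor is ignored, so a line deleted on one side comes back), and a temporary file is used in place of sponge.

```shell
#!/bin/sh
# Hypothetical sketch of ~/bin/git-merge-set (an illustration, not the original script).
# git calls it as: git-merge-set %O %A %B %L
#   $1 = ancestor version, $2 = current version (the result must be written here),
#   $3 = other branch's version, $4 = conflict-marker size (unused in this sketch).

# merge_set CURRENT OTHER: replace CURRENT with the lines of CURRENT followed by
# the lines of OTHER, keeping only the first occurrence of each line.
merge_set() {
    tmp=$(mktemp) || return 1
    # awk prints a line only the first time it sees it, so input order is preserved.
    if awk '!seen[$0]++' "$1" "$2" > "$tmp"; then
        cat "$tmp" > "$1"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Run the merge only when called with the driver arguments; the exit status of
# merge_set becomes the exit status of the script (non-zero means "conflict").
if [ "$#" -ge 3 ]; then
    merge_set "$2" "$3"
fi
```

Remember to make the script executable (chmod +x ~/bin/git-merge-set), since git runs the driver line as a shell command.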
You should now be able to commit in your repository, pull conflicting changes as you like, and never hear about the conflicts on the files for which you have defined custom merge drivers. When pulling, git will just tell you that it is merging the changes, and everything will work fine.
If you are only adding lines, and you do not need duplicate entries to be removed, you can use the built-in union merge driver by simply writing merge=union in the .gitattributes file (and skipping the rest of the instructions). This merge driver keeps the lines added on both sides instead of producing conflict markers, in an order that git chooses.
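To see the union driver at work, here is a small experiment in a throwaway repository. Everything in it is made up for illustration (the file name words.txt, the branch name other, the demo identity), and it assumes git is installed.

```shell
# Hypothetical demo in a throwaway repository; all names are illustrative.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
git config user.name 'Demo User'
git config user.email 'demo@example.com'

# Ask git to use the built-in union driver for words.txt.
echo 'words.txt merge=union' > .gitattributes
printf 'apple\n' > words.txt
git add .gitattributes words.txt
git commit -q -m 'base'
main=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

# One branch appends "banana"...
git checkout -q -b other
printf 'apple\nbanana\n' > words.txt
git commit -q -a -m 'add banana'

# ...while the original branch appends "cherry" at the same place.
git checkout -q "$main"
printf 'apple\ncherry\n' > words.txt
git commit -q -a -m 'add cherry'

# Without merge=union, this merge would stop with a conflict on words.txt.
git merge -q -m 'merge other' other
cat words.txt
```

The merged words.txt should contain all three lines and no conflict markers. The relative order of the two added lines is up to git.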
Log files
For log files, it works exactly the same way, replacing "set" by "log" in all steps above, except that you have to write a different merge driver script. The script to write depends on the format of your log entries. Mine have a numerical timestamp, a space, and the line contents, and I use this git-merge-log script. Feel free to adapt it.
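For the log format described above (a numerical timestamp, a space, then the contents), a driver could be sketched as follows. As before, this is an assumption-laden illustration rather than the script mentioned above: the function name merge_log is made up, the ancestor version is ignored, and entries with equal timestamps are ordered bytewise.

```shell
#!/bin/sh
# Hypothetical sketch of ~/bin/git-merge-log (an illustration, not the original script).
# Assumed line format: "<numeric timestamp> <text>".
# git calls it as: git-merge-log %O %A %B %L, and the result must end up in $2 (%A).

# merge_log CURRENT OTHER: replace CURRENT with the entries of both files,
# sorted by timestamp, with exact duplicate lines removed.
merge_log() {
    sorted=$(mktemp) || return 1
    # Sort numerically on the first field. Ties fall back to a bytewise
    # comparison of the whole line, so identical lines become adjacent
    # and uniq can drop the repeats.
    LC_ALL=C sort -k1,1n "$1" "$2" > "$sorted" &&
        LC_ALL=C uniq "$sorted" > "$1"
    status=$?
    rm -f "$sorted"
    return "$status"
}

# Run the merge only when called with the driver arguments.
if [ "$#" -ge 3 ]; then
    merge_log "$2" "$3"
fi
```

If your timestamps are not plain numbers (for example ISO dates), the sort key has to change accordingly.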