Automatic git conflict resolution on logs and sets

TL;DR: In this post, I describe how to configure git to use scripts that automatically resolve conflicts on files where they don't matter: log files, which are chronologically ordered, and set files, where only the set of lines matters and not their order.

I use git to version many things, from papers to code to scripts to configuration. For several of these projects, I am the only user, and I mostly use git to synchronize things across machines. Conflicts then become something of a nuisance: while they can be avoided by always pulling before editing, this is not always possible, e.g., when I'm offline or forget to do it. However, there are files on which conflicts do not matter and are easy to resolve:

One example is log files, i.e., files that record timestamped events, one per line. For such files, we can reconcile conflicts by merging the events: intuitively, sort the lines by timestamp and delete all conflict markers. I use this for a log of personal notes, but the same should work if you want to version, e.g., your bash history.
Another example is set files, i.e., files that describe a set, with one item per line, no duplicates, and no meaningful order between the lines. A typical case is vim spellchecking additions that are synchronized across machines. For such files, intuitively, we can resolve conflicts by discarding duplicate lines and dropping conflict markers. If you are only adding lines to files, and you do not care about discarding duplicates, then you can use the predefined union merge strategy, as explained later.

I used to solve these conflicts by hand, but this was tedious and error-prone. I then realized that I could use custom merge drivers with git to automate this away. I have been using the setup for some months now without issues.

Set files

Let me start with this case, because it is simpler, and let me present things top-down. We will first add a file .gitattributes to our repository to indicate that a custom merge strategy should be used for some files. For instance:

cd myreporoot/
cat > .gitattributes <<EOF
mysetfile1.txt merge=set
mysetfile2.txt merge=set
EOF

Of course, you should then version this file with git:

git add .gitattributes
git commit -m 'automatic merges' .gitattributes

Now, we have to tell git what we mean by the set merge strategy. This is explained in the section "Defining a custom merge driver" in the gitattributes documentation or the manpage gitattributes(5), but I will summarize it here. Edit your .gitconfig file to register the set merge strategy:

cat >> ~/.gitconfig <<EOF
[merge "set"]
  name = set merger
  driver = ~/bin/git-merge-set %O %A %B %L
EOF
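
As an alternative to appending to the file by hand, the same two settings can be written with git config, which avoids ending up with a duplicated [merge "set"] section if the snippet is run twice. This is only a sketch of an equivalent registration; the double quotes keep the ~ literal so that it is expanded when git runs the driver.

```shell
# Register the "set" merge driver in the global git configuration.
git config --global merge.set.name "set merger"
git config --global merge.set.driver "~/bin/git-merge-set %O %A %B %L"
```

You can check the result with git config --global merge.set.driver.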

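Now we have to create the ~/bin/git-merge-set script. This is fairly easy to do once you have understood the meaning of the arguments that git passes to the program: %O is the common ancestor's version of the file, %A is the current version, %B is the other branch's version, and %L is the conflict marker size. The script must leave the merged result in the file named by %A and exit with status zero to signal a clean merge. Here is, for instance, my git-merge-set script, which concatenates the files and deletes duplicates (in a stable way, i.e., it preserves the order in the input files). Note that it depends on sponge from moreutils.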
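
For readers who want something to start from, here is a minimal sketch of such a script. It is not my actual script: it assumes awk and mktemp are available, and it uses a temporary file instead of sponge. It also shares the limitation of the concatenate-and-deduplicate approach: a line deleted on one side only will reappear in the merged result.

```shell
mkdir -p ~/bin
cat > ~/bin/git-merge-set <<'EOF'
#!/bin/sh
# Sketch of a "set" merge driver (illustrative, not the original script).
# Arguments, in the order given in the driver line:
#   $1 = %O (ancestor), $2 = %A (current), $3 = %B (other), $4 = %L (marker size)
# The merged result must end up in $2; exit status 0 means "no conflict".
set -e
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Keep the first occurrence of each line, reading the current version first.
awk '!seen[$0]++' "$2" "$3" > "$tmp"
cat "$tmp" > "$2"
EOF
chmod +x ~/bin/git-merge-set
```

The ancestor version ($1) is ignored here, which is what makes this a pure union of the two sides.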

You should now be able to commit in your repository, pull conflicting changes as you like, and never hear about the conflicts on the files for which you have defined custom strategies. When pulling, git will just tell you that it is merging the changes, and everything will work fine.
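
If you would like to see this in action before touching a real repository, the following sketch sets up a throwaway repository in a temporary directory, with two branches that both append a line to the same file. Everything in it is hypothetical (the file name words.txt, the branch name other, and the tiny stand-in driver, which only takes %A and %B), and the driver is registered in the scratch repository's own configuration rather than in ~/.gitconfig.

```shell
# Scratch demo: everything lives in a temporary directory.
demo=$(mktemp -d)
cd "$demo"

# Stand-in driver: stable de-duplicating union of the two sides.
cat > "$demo/merge-set" <<'EOF'
#!/bin/sh
# $1 = %A (current), $2 = %B (other); the result goes back into $1.
awk '!seen[$0]++' "$1" "$2" > "$1.merged" && mv "$1.merged" "$1"
EOF
chmod +x "$demo/merge-set"

git init -q repo
cd repo
git config user.name demo
git config user.email demo@example.com
git config merge.set.driver "$demo/merge-set %A %B"

echo 'words.txt merge=set' > .gitattributes
printf 'apple\n' > words.txt
git add .
git commit -q -m base
base=$(git rev-parse --abbrev-ref HEAD)

# Each branch appends a different line at the same place.
git checkout -q -b other
printf 'banana\n' >> words.txt
git commit -q -am 'add banana'
git checkout -q "$base"
printf 'cherry\n' >> words.txt
git commit -q -am 'add cherry'

# Without the driver this merge would stop on a conflict.
git merge -q -m 'merge other' other
cat words.txt
```

The merge should complete without stopping, and words.txt should contain the lines from both branches with no conflict markers.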

If you are only adding lines, and you do not care about removing duplicate entries, you can use the predefined union merge strategy by simply writing merge=union in the .gitattributes file (and skipping the rest of the instructions). This built-in driver keeps the lines from both versions instead of leaving conflict markers, in an order that git chooses.
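
For instance, assuming a hypothetical file myappendonlyfile.txt at the root of the repository, this comes down to a single line in .gitattributes:

```shell
# Run from the repository root; the file name is only an example.
cat >> .gitattributes <<EOF
myappendonlyfile.txt merge=union
EOF
```

No entry in .gitconfig and no script are needed in this case, since union is built into git.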

Log files

Log files work exactly the same way: replace "set" with "log" in all the steps above and provide a different merge script. The script to write depends on the format of your log entries. Mine have a numerical timestamp, a space, and the line contents, and I use this git-merge-log script. Feel free to adapt it.
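
As a starting point, here is a minimal sketch for that format. Again, it is not my actual script: it assumes that every line starts with a numerical timestamp followed by a space, and it relies on the -s (stable) option of sort, which GNU and BSD sort support but which is not required by POSIX.

```shell
mkdir -p ~/bin
cat > ~/bin/git-merge-log <<'EOF'
#!/bin/sh
# Sketch of a "log" merge driver (illustrative, not the original script).
#   $1 = %O (ancestor), $2 = %A (current), $3 = %B (other), $4 = %L (marker size)
# Assumed line format: "<numeric timestamp> <contents>".
set -e
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# Drop lines present on both sides, then order by timestamp;
# -s keeps the input order for lines that share a timestamp.
awk '!seen[$0]++' "$2" "$3" | sort -s -n -k1,1 > "$tmp"
cat "$tmp" > "$2"
EOF
chmod +x ~/bin/git-merge-log
```

If your timestamps are not plain numbers (for example ISO dates), the sort options are the part to adapt.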