a3nm's blog

Minifying files with Delta

— updated

In many situations, e.g., when filing bug reports or asking questions on mailing lists or forums, one needs to take a file which triggers a certain behavior and reduce it to a file of minimal size that still triggers the behavior. For instance, you have written a long program that makes your compiler segfault, and you want to extract from it a minimal program that does the same. This is called minification, and the minimal file is often called a minimal working example.

You can minify your file by hand, testing again each time you remove something, but this is quite inefficient. This post is a brief tutorial on how to use the tool Delta, which does this automatically.

First, you should install Delta. On Debian systems, it is packaged as delta, and its main command, which we will use here, is named singledelta.

Second, and this is the interesting part, you should create a shell script test.sh that takes a file as a parameter and decides whether this file triggers the behavior of interest: it should exit with status 0 if the file is interesting and 1 if it is not. singledelta will use this script to test intermediate versions of the file while minifying.

For instance, to detect a segfault:

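#!/bin/bash
# On Linux, the shell reports status 139 (128 + 11) when the program
# was killed by SIGSEGV (signal 11).
myprogram --option "$1"
if [ $? -ne 139 ]; then
  exit 1  # no segfault: not interesting
fi
exit 0    # segfault: interesting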

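To test whether the output differs from the contents of the file "reference" (the script exits with 0 when diff reports a difference; drop the "!" to test for a match instead):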

myprogram --option "$1" > output
! diff output reference

To test whether the standard output or standard error contains the string "problem":

myprogram --option "$1" 2>&1 | grep problem
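
Before handing such a script to singledelta, it may be worth checking that it really accepts the original file and rejects an obviously uninteresting one (an empty file, say). Here is a minimal sketch of such a check; the function name check_interesting is hypothetical and is not part of Delta.

```shell
#!/bin/bash
# Hypothetical helper (not part of Delta): report whether a test script
# considers a given file interesting, using the exit-status convention
# described above (0 = interesting, anything else = not interesting).
check_interesting() {
  local test_script=$1 file=$2
  if "$test_script" "$file" >/dev/null 2>&1; then
    echo "interesting"
  else
    echo "not interesting"
    return 1
  fi
}
```

For instance, check_interesting ./test.sh original_file should print "interesting", and running it on an empty file should print "not interesting". If this is not the case, the test script probably needs fixing before you start minifying.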

Third, you just copy your original file to a different name, say "minified_file", then run singledelta, which will minify "minified_file" in-place.

cp original_file minified_file
singledelta -in_place -test=./test.sh minified_file

The process is very chatty. Once it completes, "minified_file" is a file that still triggers the behavior and is as small as possible.

Well, technically, this is not quite true: I have observed that in some cases, for reasons I do not know, running singledelta again on the supposedly minified file can shrink it further. I have written a trivial script, manydelta, to run singledelta repeatedly until the file no longer shrinks. Use it thus:

cp original_file minified_file
manydelta ./test.sh minified_file
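
To give an idea of what such a wrapper can look like, here is a minimal sketch of a loop that reruns singledelta until the file size stops decreasing. This is an assumption about how one might write it, not the actual manydelta script; the function name minify_to_fixpoint is hypothetical, and the singledelta invocation is the one shown earlier.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual manydelta script: rerun singledelta
# until a full pass no longer shrinks the file.
minify_to_fixpoint() {
  local test_script=$1 file=$2
  local before after
  while true; do
    before=$(wc -c < "$file")
    singledelta -in_place -test="$test_script" "$file"
    after=$(wc -c < "$file")
    # Stop as soon as a pass fails to make the file smaller.
    if [ "$after" -ge "$before" ]; then
      break
    fi
  done
}
```

It would be called as minify_to_fixpoint ./test.sh minified_file. Comparing sizes in bytes rather than in lines is a design choice of this sketch; since singledelta removes whole lines, either measure should work.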

Once again, this will minify "minified_file" in place by invoking singledelta repeatedly. Of course, once the process has completed, you may still be able to apply human intelligence to minify the file further in ways that singledelta cannot. Indeed, singledelta only tries to remove lines; it will not, e.g., shorten identifiers or strings.
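
To illustrate what "removing lines" means, here is a deliberately naive sketch that tries to delete each line in turn and keeps the deletion whenever the test still passes. It is only an illustration of the idea under simplifying assumptions, not Delta's actual algorithm (which is certainly smarter about which chunks to try); the function name naive_line_minify is hypothetical.

```shell
#!/bin/bash
# Naive illustration, NOT Delta's algorithm: try deleting each line in turn,
# and keep the deletion whenever the test script still accepts the result.
naive_line_minify() {
  local test_script=$1 file=$2
  local candidate i total
  candidate=$(mktemp)
  i=1
  total=$(wc -l < "$file")
  while [ "$i" -le "$total" ]; do
    # Build a candidate with line i removed.
    sed "${i}d" "$file" > "$candidate"
    if "$test_script" "$candidate" >/dev/null 2>&1; then
      # Still interesting: keep the smaller file and retry the same index.
      cp "$candidate" "$file"
      total=$((total - 1))
    else
      # Line i is needed: move on to the next line.
      i=$((i + 1))
    fi
  done
  rm -f "$candidate"
}
```

This makes one test run per attempted deletion, so it is slow on large files, and like singledelta it never edits within a line.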

If you need more advanced features, Delta can also be used for other things, e.g., running on multiple files. See for instance this guide.

comments welcome at a3nm<REMOVETHIS>@a3nm.net