a3nm's blog

Trying btrfs... and giving up for now

I recently tried to use btrfs to manage some data. My main use case was incremental backups relying on btrfs's snapshot feature. Indeed, btrfs allows you to:

  • create snapshots of your filesystem at various points in time: the snapshots essentially take no additional space, except that files of that FS will not really be deleted as long as they survive in a snapshot;
  • send snapshots to remote hosts, even computing efficiently the diff between each snapshot and the previous one, to minimize IO, bandwidth, and backup storage costs;
  • browse old snapshots seamlessly as if they were actual folders, and restoring files easily.

This is much better than my current solution, which is to use rsync. Indeed, by contrast, rsync has the following drawbacks:

  • rsync only synchronizes the current version (overwriting the previous one), and if you want to keep multiple versions they are not compressed relative to each other;
  • each rsync must rescan the entire filesystem, even if almost nothing has changed;
  • rsync is not always intelligent about transfers, as it tries to avoid re-sending files that haven't changed, but receives no help from the FS to understand what went on: for instance, if you move a large directory on the master, in most cases rsync will fail to notice and re-transfer the whole directory to the backup.

This post is a terse documentation of what I have learnt about btrfs in the process of exploring it. Sadly, the main outcome of my investigations is that btrfs does not seem sufficiently mature for me to use it yet. I am sorry about the negative conclusion: I think that btrfs is a great project and I imagine that the remaining rough edges will eventually be fixed. Further, the good news is that (as far as I can tell) I have only encountered crashes but I have not encountered any data loss issue.

General considerations about btrfs

So here are some general things about btrfs that I discovered when playing around:

  • btrfs supports transparent file compression with zlib and lzo. This is done by passing an option to mount. I am not too sure about what happens if you forget to pass this option (or pass the wrong value for this option). It seems to work fine, though.
  • btrfs supports deduplication, but it turns out that this did not mean what I thought it would.
    Unlike, e.g., git repositories, if you write data to the disk which happens to already exist someplace else, btrfs will not notice it and use it to share space. What it means is that btrfs supports copy-on-write, i.e., when you write data on the FS that comes from another file of the FS, then btrfs will only save a pointer to the old data, and will not create two different copies until one piece of data is modified.
    This implies that, if you want to deduplicate data which has not been created using copies, you need to do it offline with specific tools: btrfs does not support it out of the box. I tried bedup, which was quite slow; its savings amounted to 110 GB out of 2.6 TB of data when I tested it on a partition. (Of course, your mileage may vary.) It is quite worrying that the deduplication tools (in particular, bedup) do not seem very sure of what they are doing, so this does not give at all the impression of being robust.
  • btrfs supports many nice features that I didn't need: splitting a FS across multiple devices (with replication or not), adding/removing devices on the fly, performing resizes online, etc. I did not try these features out.

Here are things you will need to know when trying out btrfs, and traps in which you may fall:

  • The btrfs utilities are not shipped with Debian by default, you need to apt-get install btrfs-tools.
  • If you want to start playing with btrfs, you will probably want to convert data from an ext3 or ext4 partition. There is a tool designed to do that, btrfs-convert, but closer inspection reveals that it is now reported to be unreliable.
    As I didn't want to build the FS on shaky foundations, I created a partition from scratch, and moved my terabytes of data around.
  • When creating test filesystems, note that you cannot create btrfs filesystems that are too small (apparently, less than 100 MB), and you will get a confusing error message if you try.
  • btrfs exposes quite a lot of its internals which you apparently may need to be aware of. In particular, you may have to defragment it1. It seems that you may also need to balance the filesystem (amongst other things) to avoid problems when it is full
  • btrfs makes it possible to have subvolumes which you can mount independently. In other words, if your disk contains games and music, you could imagine having a subvolume games/ and a subvolume music/, and mounting only one of the two (or mounting them at different endpoints). In this case, if you mount the root of the filesystem, games/ and music/ will appear as folders (which are actually different filesystems).
    This means that you should be careful when starting to organize your filesystem: the root of the filesystem doesn't play the same role as in other filesystems, and you should probably always be mounting a subvolume of it instead. If you miss this point initially and want to change your mind later, it's not so simple.
  • While btrfs supports copy-on-write, cp will not use it by default. You need to pass the option: --reflink=always to cp, as explained in this FAQ entry. This is a bit unpleasant because it means that scripts must be using cp properly to take advantage of copy-on-write, and that other programs will not necessarily support it. In particular, rsync does not, for now.

Incremental backups: snapshotting, sending, receiving

Now, here is more about my specific experience with subvolumes, snapshots, and btrfs send and btrfs receive, which were the main features I was interested in. In summary, here are the basic principles:

  • You can run btrfs subvolume snapshot foo/ snap/ to create a snapshot of foo/ as snap/. This creates snap/ as a folder (but it's actually a different subvolume), which contains the contents of foo/ (using copy-on-write, so without duplicating the actual contents on disk). For backups, you want to create read-only snapshots (btrfs subvolume snapshot -r).
    If you create snapshots at different points in time, you do not need (and cannot) tell btrfs subvolume snapshot which ones are the most recent: however, for your own purposes, you should probably indicate it in the filename.
    You can be quite trigger-happy with snapshots, I created one every hour for weeks without any problem.
  • You can run btrfs send snap/ to produce on standard output a representation of the (read-only) snapshot snap/. Alternatively, you can run btrfs send -p old_snap/ snap/ to prepare efficiently a compressed representation of snap/ that relies on old_snap/. I tested that, indeed, when the difference from old_snap/ to snap/ is that a very large folder was moved, btrfs send is able to create a very concise representation in comparatively little time.
  • You can run btrfs receive snapshots/, where snapshots/ is in the backup FS, to read on standard input a dump produced by btrfs send, and create in snapshots/ the corresponding snapshot (here, snap/: the name depends on what btrfs send is sending). Of course, the backup FS can be on a different machine: you can pipe the stream across ssh, or simply store it to a file and move that from one machine to the other.

That's the theory. Now, details and traps. First, about snapshot creation:

  • When creating snapshots periodically, it is quite easy to end up with filesystems with a very large number of files (which are very similar copies of the same hierarchy). This is very undesirable, e.g., for locate: I had updatedb wasting lots of CPU and disk space indexing a large number of these snapshots and polluting my locate results. You'll want to tell updatedb not to explore the snapshot folder, using the setting PRUNEPATHS in /etc/updatedb.conf.
  • In terms of access rights, you do not need to be root to create a snapshot (or subvolume). Indeed, if you couldn't read some files in the source, you will still be unable to read them from the snapshot.
    However, deleting subvolumes is not possible as an unprivileged user unless you pass a specific mount option: I am not sure of the implications of this, in particular, I do not know why it is not the default. Further, deleting subvolumes that were created to be read-only requires a specific step to make them writable.
    Another thing to understand is that, to remove a subvolume, whether as root or otherwise, using rm will fail with Operation not permitted; a different error than the usual Permission denied, but a possible source of confusion. You should be using btrfs subvolume delete instead.
  • Having snapshots also makes it quite complicated to understand where your disk space is going. Is it used by files currently in your FS? Or files deleted in the FS but retained because of some snapshot? If so, which snapshot(s)? How many space would you reclaim by removing a given snapshots, or, say, all snapshots older than one month?
    To answer such questions, you need to use (in particular, enable) btrfs's quota support. But even then it is not very obvious to figure all of this out.

About sending and receiving snapshots:

  • btrfs send requires root, even for snapshots that you created: this is unsurprising, as remember that you can snapshot files that you cannot read, and of course you shouldn't be able to read them from the output of btrfs send
  • You should not interrupt btrfs send or btrfs receive, either with SIGSTOP or by putting the computer in hibernation mode. If you do so, the operation will fail. In particular, an incomplete copy of the subvolume will stay around on the receiving end, which can easily mislead you and make you believe that the operation succeeded. Apparently, btrfs is smart enough to notice that the copy is incomplete (in particular, fortunately, refusing to use it as a parent to reconstruct another snapshot), but it is not sufficiently intelligent to delete the leftover files or (preferably) resuming the operation from where it left, like rsync does. This means that, in practice, you probably want to snapshot often and have relatively small diffs between snapshots.
    Also note that btrfs send and btrfs receive give no progress information when they run.
  • Once you have created snapshots and you want to transfer them to the backup host, the problem is figuring out which backup depends on which, and what to send. You can only choose this at the level of btrfs send: snapshot creation does not need a parent, and btrfs receive is apparently able to use some ID specified in the btrfs send invocation to identify which volume it should use (or fail if a suitable volume does not exist, although I don't know whether this check is bulletproof or not).
  • Hence, when sending snapshots, btrfs leaves you free to choose the right set of send operations with the right parents to minimize IO and network cost.
    A program called buttersink attempts to do this, i.e., choosing an intelligent sequence of transfers. For my use case, sadly, it did not work. This is pretty surprising, as my case is quite simple: a series of chronological snapshots, each of which should be sent based on the previous one. Maybe the reason is that buttersink does not know in which order the snapshots were made, and relies on a size estimation of the diff between two btrfs snapshots, which apparently is both slow to compute and wildly inaccurate.
    So I wrote instead a much simpler script which order the snapshots by date (as indicated in their name) and sends them in that order. There are probably exist more elaborate tools for that purpose, like btrbk which I did not test.

Messy problems

And finally, here are the nasty problems I ran into. When running my script to perform the transfers, and disconnecting hard drives at random points to simulate messy hardware failures, I observed the following:

  • Backtraces in syslog suggesting a problem with btrfs (even during normal operation, I think):
kernel: [52053.405416] ------------[ cut here ]------------
kernel: [52053.405456] WARNING: CPU: 0 PID: 12046 at /build/linux-HoPide/linux-4.5.1/fs/btrfs/qgroup.c:2650 btrfs_qgroup_free_meta+0x88/   0x90 [btrfs]()
kernel: [52053.405459] Modules linked in: ufs(E) qnx4(E) hfsplus(E) hfs(E) minix(E) ntfs(E) vfat(E) msdos(E) fat(E) jfs(E) xfs(E)          libcrc32c(E) crc32c_generic(E) vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) veth(E) ebtable_filter(E) ebtables(E) xt_conntrack(E) ipt_MASQUERADE(E)  nf_nat_masquerade_ipv4(E) xt_addrtype(E) br_netfilter(E) bridge(E) stp(E) llc(E) overlay(E) pci_stub(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E)   sunrpc(E) fuse(E) ip6t_REJECT(E) nf_reject_ipv6(E) ip6table_filter(E) ip6_tables(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E)      nf_nat(E) nf_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) xt_owner(E) xt_multiport(E) iptable_filter(E) ip_tables(E) x_tables(E) binfmt_misc(E)  quota_v2(E) quota_tree(E) dm_crypt(E) algif_skcipher(E) af_alg(E) snd_hda_codec_hdmi(E) uas(E) usb_storage(E) iTCO_wdt(E) iTCO_vendor_support(E) ppdev(E)     intel_rapl(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) hmac(E) dm_mod(E) drbg(E) ansi_cprng(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) uvcvideo(E) pcspkr(E)                 snd_hda_codec_realtek(E) sg(E) serio_raw(E) snd_hda_codec_generic(E) videobuf2_vmalloc(E) i2c_i801(E) i915(E) videobuf2_memops(E) videobuf2_v4l2(E)           videobuf2_core(E) drm_kms_helper(E) snd_usb_audio(E) videodev(E) snd_hda_intel(E) media(E) snd_hda_codec(E) snd_usbmidi_lib(E) snd_rawmidi(E) snd_hda_core(E) snd_seq_device(E) snd_hwdep(E) snd_pcm_oss(E) drm(E) snd_mixer_oss(E) snd_pcm(E) snd_timer(E) evdev(E) joydev(E) lpc_ich(E) snd(E) mei_me(E) cdc_acm(E)       mfd_core(E) i2c_algo_bit(E) soundcore(E) mei(E) shpchp(E) 8250_fintek(E) battery(E) parport_pc(E) parport(E) video(E) soc_button_array(E) tpm_infineon(E)     tpm_tis(E) button(E) tpm(E) processor(E) it87(E) hwmon_vid(E) coretemp(E) loop(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) xor(E) raid6_pq(E)  sr_mod(E) cdrom(E) sd_mod(E) hid_generic(E) usbhid(E) hid(E) crc32c_intel(E) ahci(E) libahci(E) r8169(E) psmouse(E) libata(E) xhci_pci(E) ehci_pci(E)         xhci_hcd(E) ehci_hcd(E) scsi_mod(E) mii(E) usbcore(E) usb_common(E) fan(E) thermal(E) fjes(E) [last unloaded: vboxdrv]
kernel: [52053.405581] CPU: 0 PID: 12046 Comm: rsync Tainted: G        W  OE   4.5.0-1-amd64 #1 Debian 4.5.1-1
kernel: [52053.405583] Hardware name: Gigabyte Technology Co., Ltd. H87M-HD3/H87M-HD3, BIOS F3 05/09/2013
kernel: [52053.405585]  0000000000000286 000000008e92a6d5 ffffffff81307b65 0000000000000000
kernel: [52053.405589]  ffffffffc02f15b0 ffffffff8107905d ffff8800a959e800 0000000000004000
kernel: [52053.405592]  ffff8800a959e800 0000000000004000 0000000000000002 ffffffffc02cfaf8
kernel: [52053.405595] Call Trace:
kernel: [52053.405605]  [<ffffffff81307b65>] ? dump_stack+0x5c/0x77
kernel: [52053.405610]  [<ffffffff8107905d>] ? warn_slowpath_common+0x7d/0xb0
kernel: [52053.405630]  [<ffffffffc02cfaf8>] ? btrfs_qgroup_free_meta+0x88/0x90 [btrfs]
kernel: [52053.405650]  [<ffffffffc0268702>] ? start_transaction+0x3e2/0x4a0 [btrfs]
kernel: [52053.405668]  [<ffffffffc026e507>] ? btrfs_dirty_inode+0x97/0xc0 [btrfs]
kernel: [52053.405672]  [<ffffffff81205538>] ? touch_atime+0xa8/0xd0
kernel: [52053.405676]  [<ffffffff8116d7bd>] ? generic_file_read_iter+0x63d/0x790
kernel: [52053.405681]  [<ffffffff811ee2b1>] ? cp_new_stat+0x151/0x180
kernel: [52053.405683]  [<ffffffff811e8913>] ? new_sync_read+0xa3/0xe0
kernel: [52053.405686]  [<ffffffff811e9101>] ? vfs_read+0x81/0x120
kernel: [52053.405689]  [<ffffffff811ea072>] ? SyS_read+0x52/0xc0
kernel: [52053.405693]  [<ffffffff815b6ab2>] ? system_call_fast_compare_end+0xc/0x67
kernel: [52053.405695] ---[ end trace 6c76a866f1f3e28c ]---
kernel: [52053.790081] ------------[ cut here ]------------
  • At one point, when I disconnected a hard drive that contained a mounted btrfs system, an instant hard reset of the machine (!).
  • Messed up filesystems where some operations would take apparently forever (e.g., subvolume delete, on the target of the transfer), during which mysterious processes like btrfs-cleaner and btrfs-transaction were performing varying levels of CPU/IO, and the lagging operation could not be aborted with SIGINT. I saw no way to find out what the processes were trying to do.
  • Even weirder filesystems with which the entire machine started being unresponsive and OOM-ing for unclear reasons, around 2 hours after they had been mounted. I eventually had the idea of checking slabtop, which showed that the kernel was filling the RAM (8 GB) with its own structures, presumably because of some sisyphean operations that btrsync was currently undertaking on them.
  • While the above happened, in other cases, it happened to me that syslog got flooded with messages about btrsync, filling up my root partition.

This is where I give up: even though I would very much like to have incremental backups at the FS level, for now, I do not feel comfortable handing over my data to a FS that suffers from this kind of problems. I know that, in principle, I should try to report the bugs to developers and help fixing these issues, but sadly I do not feel I can invest the time and effort to help debug an FS before I can use it. Note that I did not even do very ambitious things: essentially just snapshots, send, and receive, randomly disconnecting the devices at various stages of the process. So maybe it could be straightforward to reproduce the problems I ran into.

So I'm back to rsync for now, and I'll have to investigate incremental backup programs that are smarter than rsync but do not rely on collaboration from the FS, e.g., Borg. Or maybe I could try ZFS...


  1. It's very funny to hear that btrfs must be defragmented when you have heard for years the propaganda "only Microsoft file systems must be defragmented, because they are inferior"... 

A coat-of-arms from pixel art

My personal logo, which I use as a favicon on my website and as an avatar, is a pixel art representation of an 'a' in a weird frame that spells out "3nm" in all directions:

a3nm logo

Its size is 16 pixels by 16 pixels.

I like coats of arms, so I thought that it would be fun to try to draw one from my logo. The idea that I had is to use quartering to draw the pixel art logo recursively using standard heraldic designs.

For starters, let us git rid of the messy part at the bottom of the shield to get a nice square drawing area:

empty coat of arms

This is described as follows in French (I will only be using French descriptions as they are the only ones I know about):

D'argent à la champagne d'azur.

In the drawing above, I used a non-standard color for azure to match the pixel art logo, and I used a modern shield to get a more reasonable drawing area. I hope you don't mind. The fact that azure is a colour and silver is a metal will ensure that we do not violate the rule of tincture.

Now, let us describe the actual pixel art logo, using recursive quarterings. A complication is that simpler forms should be correctly described when we reach them, for instance, a 4x4 design should be described as "parti" or "coupé" divisions if applicable, and if only one pixel is drawn differently from the three others, it is better described as a quarter, or "franc-quartier", whose position must be specified.

I used a small program to generate the correct description, and verified it by drawing the resulting coat of arms by hand. Here it is, in French:

Grand-écartelé : 
  au premier, écartelé : 
    au premier, contre-écartelé : 
      au premier, parti d'argent et d'azur,
      au deuxième, parti d'azur et d'argent,
      au troisième d'azur au franc-quartier d'argent à senestre,
      au quatrième d'azur au franc-quartier d'argent,
    au deuxième, contre-écartelé : 
      au premier d'azur,
      au deuxième, parti d'argent et d'azur,
      au troisième, coupé d'argent et d'azur,
      au quatrième d'azur au franc-quartier d'argent,
    au troisième, contre-écartelé : 
      au premier, coupé d'argent et d'azur,
      au deuxième, parti d'argent et d'azur,
      au troisième, coupé d'azur et d'argent,
      au quatrième d'azur au franc-quartier d'argent,
    au quatrième, contre-écartelé : 
      au premier d'argent au franc-quartier d'azur,
      au deuxième, coupé d'argent et d'azur,
      au troisième d'azur au franc-quartier d'argent à senestre et en pointe,
      au quatrième, coupé d'azur et d'argent,
  au deuxième, écartelé : 
    au premier, contre-écartelé : 
      au premier, parti d'argent et d'azur,
      au deuxième, parti d'azur et d'argent,
      au troisième d'azur au franc-quartier d'argent à senestre,
      au quatrième, coupé d'argent et d'azur,
    au deuxième, contre-écartelé : 
      au premier d'azur au franc-quartier d'argent à senestre et en pointe,
      au deuxième, coupé d'argent et d'azur,
      au troisième d'azur au franc-quartier d'argent à senestre,
      au quatrième, coupé d'azur et d'argent,
    au troisième, contre-écartelé : 
      au premier, coupé d'argent et d'azur,
      au deuxième d'argent au franc-quartier d'azur à senestre,
      au troisième, coupé d'azur et d'argent,
      au quatrième d'argent,
    au quatrième, contre-écartelé : 
      au premier, parti d'azur et d'argent,
      au deuxième d'azur,
      au troisième d'azur au franc-quartier d'argent à senestre,
      au quatrième, coupé d'argent et d'azur,
  au troisième, écartelé : 
    au premier, contre-écartelé : 
      au premier, coupé d'azur et d'argent,
      au deuxième d'azur au franc-quartier d'argent en pointe,
      au troisième d'azur,
      au quatrième, parti d'argent et d'azur,
    au deuxième, contre-écartelé : 
      au premier d'argent,
      au deuxième d'azur,
      au troisième d'argent au franc-quartier d'azur en pointe,
      au quatrième, coupé d'azur et d'argent,
    au troisième, contre-écartelé : 
      au premier, coupé d'argent et d'azur,
      au deuxième d'azur au franc-quartier d'argent en pointe,
      au troisième, coupé d'azur et d'argent,
      au quatrième d'azur au franc-quartier d'argent,
    au quatrième, contre-écartelé : 
      au premier, coupé d'azur et d'argent,
      au deuxième d'azur au franc-quartier d'argent en pointe,
      au troisième, parti d'argent et d'azur,
      au quatrième, parti d'azur et d'argent,
  au quatrième, écartelé : 
    au premier, parti :
      au premier, coupé :
        au premier d'azur,
        au deuxième, écartelé d'azur et d'argent,
      au deuxième d'argent,
    au deuxième, contre-écartelé : 
      au premier d'azur au franc-quartier d'argent à senestre et en pointe,
      au deuxième, coupé d'argent et d'azur,
      au troisième, parti d'azur et d'argent,
      au quatrième, coupé d'azur et d'argent,
    au troisième, contre-écartelé : 
      au premier d'azur au franc-quartier d'argent à senestre et en pointe,
      au deuxième, coupé d'azur et d'argent,
      au troisième, parti d'azur et d'argent,
      au quatrième d'azur,
    au quatrième, contre-écartelé : 
      au premier d'azur au franc-quartier d'argent à senestre et en pointe,
      au deuxième d'azur au franc-quartier d'argent en pointe,
      au troisième, parti d'argent et d'azur,
      au quatrième, parti d'azur et d'argent ;
à la champagne d'azur.

You can use the indentation to recursively match indented blocks of the code above to squares following a kind of quadtree decomposition. Here is the result:

my coat of arms

So my coat of arms is drawn from a pixel art logo, and has a textual description which should hopefully follow (the letter of) the rules of the art, and should allow any (sufficiently motivated) herald to reconstruct the design from the text alone.

An direction for future work would be to design a general program to convert arbitrary pixel art designs to textual descriptions of coats of arms. :) I thought about it a bit, but it's trickier than it sounds if you want to use the full power of known heraldic operators.

Readers interested by heraldics and computer science may also be interested by Pascal Manoury's work on drawing coats of arms. The task studied there is to produce the graphical description of a coat of arms given a (structured) textual description of its contents. Here is the article about this topic (in French), as well as slides (in French), and sample output (in French). Thanks to Olivier Roussel for pointing this out!

Encrypt email to known GPG users with mutt using crypt_opportunistic_encrypt

If you use mutt and GPG, you may want to say that messages sent to other GPG users should be encrypted by default, and others should not.

This used to be surprisingly hacky to do, with the most common solution apparently being a script that listed known GPG keys and added mutt hooks to enable them. This was ugly and it also didn't work well because it would try to encrypt messages as soon as some recipient supported GPG, even when all recipients did not.

This post advertises a recent solution to this problem: the crypt_opportunistic_encrypt setting of mutt, which was merged1 in mutt version 1.5.24 (the one currently in Debian testing). This setting allows you to do essentially what the hacky script did, but in a much cleaner and simple way, also fixing the problem I mentioned.

I am currently using the setting in my mutt config and I quite happy about it. Here are things to know when the setting is enabled:

  • The choice to encrypt or not encrypt the message is toggled whenever the recipients are edited, it's not only based on the initial recipients.
  • Encryption is chosen only if all recipients have a key.
  • You can always edit signing options or simply disable encryption with the usual pgp-menu command ('p'). You can also disable the opportunistic encryption setting for a single message in the pgp-menu and you can then fall back to configuring encryption in the usual way.
  • Encryption is enabled as soon as the recipients have a GPG key that looks reasonable (i.e., when I tested, it was not enabled for recipients where all keys were either disabled or expired), but there is no check2 to see whether the key is known to be valid. If a recipient has no trusted key, you will get the usual prompt to choose a key and confirm that you want to use the key even though its validity is unknown.
  • GPG seems to get invoked for the purposes of the option, so, whenever gpg decides to (presumably) check the trustdb, mutt may mysteriously hang. Just be patient.

  1. Thanks a lot to Kevin for developing and merging this! 

  2. I think that this is quite reasonable, because in practice active attacks with GPG are not a huge problem, much less than the problem that few people are using GPG. I think it especially makes sense in combination with GPG's recent support of trust on first use (TOFU) to make it less painful to use. 

Self-hosted, server-side MathJax

I use MathJax on my website to render math equations to HTML. The standard way to use MathJax is to use JavaScript so that visitors will get it from their CDN. While this is simpler and also a good idea for caching, it has drawbacks which I did not find acceptable:

  • It means that the MathJax CDN may be pinged whenever a visitor loads a page, which is bad for privacy.
  • It makes your website's security dependent on that of the CDN: if the CDN starts distributing malicious JS (i.e., MathJax turns evil or it gets hacked), then your visitors will be getting them.
  • It renders math in the browser using JavaScript. This is jarring (as the page jumps around while rendering is done), and I find it esthetically unpleasant. All of this website is static and pre-generated on my machine, I don't see why math rendering would be an exception. I find static websites preferable in terms of deployability, security, and elegance.

This post is just to explain how to render MathJax when generating static pages, using MathJax-node. As I wanted to play with Docker, and didn't want to install Node or run the code directly on my real machine, I will also explain how to set up the environment in a Docker container, but there's nothing forcing you to do that. :)

I am now using this setup on this website, so all math should now be served with server-side rendering, without requests to third-party CDNs. (In fact, this removes what was essentially the only use of JavaScript on this site.)

Setting up a mirror of the fonts

While we won't need to serve the MathJax JavaScript code to readers, we will need to serve them the fonts used by MathJax.

Fortunately, on Debian systems, these fonts are already packaged as fonts-mathjax and fonts-mathjax-extras, so you can just rely on the package manager to retrieve them and keep them up to date. The fonts are installed in /usr/share/javascript/mathjax/, so you just have to configure your Web server to serve this folder. I serve it as a3nm.net/mathjax. It's preferable to serve it from the same domain that you will otherwise use, otherwise it's necessary to jump through additional hoops because of the same-origin policy: see an explanation here.

Installing MathJax-node

I installed MathJax-node in a Docker image, and as I was paranoid I also generated my own base image for the underlying system. Feel free to simplify the instructions if you don't need to do any of this.

I'm using an amd64 Debian system. I installed Docker as docker.io (packaged with Debian), added myself to the docker group, logged out and logged in. I tweaked Docker by editing /etc/default/docker and symlinking /var/lib/docker to move its files to a partition with more disk space.

I created the base system by issuing the following. Note that we will need Debian testing to be able to run mathjax-node, as otherwise the packaged version of Node is too old:

mkdir testing
sudo debootstrap testing testing/
sudo tar -C testing/ -c . | docker import - my-debian-testing

Here is the Dockerfile:

FROM my-debian-testing:latest
RUN apt-get -y update && apt-get install -y npm nodejs-legacy && apt-get clean
RUN npm install mathjax-node
RUN npm install cssstyle
CMD ["bash"]

In the folder of the file, issue:

docker build -t my-mathjax .

You can now use the image by starting a container, let's call it my-container:

docker run -di --name=my-container my-mathjax bash >/dev/null

And you can then apply page2html by piping your HTML code in the following invocation:

docker exec -i my-mathjax node_modules/mathjax-node/bin/page2html \
    --fontURL https://a3nm.net/mathjax/fonts/HTML-CSS"

Replace the fontURL parameter by the URL you are serving the MathJax fonts.

Another possibility is to use page2svg to render the math to SVG instead of HTML markup. However, this means that text-based browsers will not be able to see it.

The actual code that I use is here. I also increase the size of math by adding the following CSS as indicated here:

.mjx-chtml {font-size: 2.26ex ! important}
.mjx-chtml .mjx-chtml {font-size: inherit ! important}

Marking up MathJax

I use Markdown to write Web pages. I use python-markdown-math to convert math notation from a dollar-based notation in Markdown to HTML spans with the right classes (class="tex" or class="math"). To use python-markdown-math with the server-side setup, simply prevent it from adding the MathJax script boilerplate. I also use the render_to_span config parameter to ensure that no script is being generated.

To prepare the MathJax afterwards, you should be careful that the command of the previous section needs to apply to the entire HTML document, not just the HTML markup generated from Markdown before applying a template. Indeed, you will see that modifies the head element as well.

Performance

On modern hardware with this setup, it takes about one second to process an HTML page (even when there is no math in it). It takes a few seconds to process HTML pages with real markup such as this one.

Here's an example formula to see how it works: it should look the same as any regular MathJax formula, but without the blinking caused by JavaScript rendering: π(I)=(FIπ(F))×(FJI(1π(F))).

Summary on (hyper)graph tractability parameters

For my PhD research, I had to investigate various notions of (hyper)graph tractability, such as treewidth, cliquewidth, pathwidth, etc. Here is a very terse summary of what I have found about them. It is mostly my personal summary of reading this survey and some other papers, but maybe it may interest others.

To give the broad context: computer scientists are interested in solving problems on graphs, e.g., given a graph, does it have a specific property (for instance, can it be drawn without edge crossings, can it be colored with 3 colors, etc.). Some of these problems are known to be intractable in the sense of complexity theory, which means that there is probably no fast algorithm to solve them. However, many of those hard problems can be solved efficiently in the specific case of graphs that are forests, i.e., that do not have cycles.

Graph theorists have tried to understand what makes trees tractable, and to extend the notion of acyclicity ("being a forest") in two broad respects. First, by identifying broader notions for graphs, which are parameterized: a parameter k indicates how much the graph deviates from acyclicity, and the notion should collapse to acyclicity for small k. The hope is then that, for fixed k (i.e., for graphs that are not-acyclic only by a small margin), we can still enjoy tractability, for instance in the sense of parameterized complexity. Second, theorists have tried to extend acyclicity from graphs to the more expressive formalism of hypergraphs, i.e., graphs where edges can connect an arbitrary number of vertices.

In all this post except where explicitly stated otherwise, we consider undirected graphs, i.e., a pair (V,E) of a set of vertices and a set of edges which are pairs of vertices.

Roadmap:

  • Section 1 presents treewidth, branchwidth (essentially equivalent), and pathwidth (more restrictive, i.e., a graph's pathwidth is more than its treewidth), and treedepth, which is more restrictive than pathwidth in the same sense.
  • Section 2 presents rankwidth and cliquewidth (equivalent, and less restrictive than the previous ones).
  • Section 3 discusses preservation under subgraphs, minors, and other operations.
  • Section 4 discusses extended definitions for hypergraphs.
  • Section 5 discusses the complexity of computing these parameters.
  • Section 6 discusses their applicability for the decidability of logical theories, the tractability of model checking and other such problems for expressive languages.
  • Section 7 studies their use for less expressive languages: the problems of CSP, query evaluation and homomorphisms.
  • Section 8 is an appendix with a remark about partial orders.

All links to scientific works are open-access, i.e., available without a subscription. I give them as direct links, but I also give the DOI of each link at the end.

Warning: Please be aware that this summary are my personal notes: it may contain errors and should not be taken at face value. Please let me know if you find any bugs.

Acknowledgements: Thanks to Pierre Senellart, Mikaël Monet and Robin Morisset for proofreading and comments.

Treewidth, pathwidth, branchwidth

Treewidth (tw)

Treewidth is the most common measure, and can be defined in multiple different ways. The treewidth of a graph G is the smallest k such that it admits a tree decomposition of width k in one of the following senses:

  • Formally: we can associate a tree T to G (called the tree decomposition), with each node of T (called bag) labeled by a subset of the graph vertices, such that:

    • for each edge of G, its endpoints co-occur in some bag of T
    • for each vertex of G, the nodes of T where it occurs form a connected subtree in T

    The width of the tree decomposition T is k1 where k is the maximal number of vertices in a bag of T.

    While tree decompositions are not oriented, it is often convenient to pick a root and see them as rooted trees.

  • As a decomposition scheme. Trees can be decomposed by picking a separator formed of two adjacent vertices, removing the edge between them, and decomposing recursively each connected component: in each component, you can pick a separator that includes the vertices of the parent separator which are also in that component.

    As a generalization, graphs of treewidth k can be decomposed by picking a separator of k+1 vertices, removing edges between them, and decomposing recursively each connected component, requiring that each component's separator includes the vertices of the parent separator which are part of the component. The tree of the separators chosen in this process is a tree decomposition of the instance, because the separators have the right size, cover all edges, and whenever a vertex occurs in a separator it will occur in all descendent separators until we reach a component where the vertex no longer occurs.

    Conversely, when looking at any non-root bag b of a tree decomposition T, considering the subgraph G1 of the decomposed graph formed of the vertices occuring in b and its descendants (the subtree Tb of b rooted at T), and G2 the subgraph of the vertices occurring in the other bags, the common vertices between G1 and G2 are those shared between b and its parent (often called an interface), and removing those in G disconnects G1 and G2 (any other path connecting them could not be legally reflected in the decomposition). So Tb can be recursively understood as a tree decomposition of G1, which has been disconnected from the rest of G.

    This explanation with pictures may help you understand what I mean.

  • As a divide and conquer scheme. Many problems on a rooted tree can be solved by a dynamic programming algorithm that solves the problem on any subtree given the solution to its child subtrees; this is usually just thought of as a bottom-up computation. In graphs of treewidth k, following the above scheme (or a rooted tree decomposition) gives you a way to solve problems on the graph with a similar divide-and-conquer dynamic programming scheme.

    Again, this is nicely covered by the explanation linked above.

  • As a pursuit-evasion game, where "cops" use a decomposition of the graph to corner and catch a "robber". For more details, see hlineny2007width, Section 5.7.

Forests have treewidth 1 (so "having treewidth 1" means "being acyclic"), cycles have treewidth 2, cliques with k vertices have treewidth k1, but grids illustrate that graphs may be planar but have a high treewidth: a k by k grid has treewidth k.

Pathwidth (pw)

Pathwidth is a variant of treewidth where the underlying tree in the formal definition is required to be a path graph rather than a general tree. It has many alternative characterizations, in particular in terms of overlapping intervals.

I won't be coming back to pathwidth later, but to contrast it with treewidth, for any graph G:

  • tw(G)pw(G): immediate as one notion is more restrictive than the other;
  • pw(G)=O(tw(G)log|G|): see bodlaender1998partial, Corollary 24.

So "having bounded pw" is strictly more restrictive than "having bounded tw".

Branchwidth (bw)

Branchwidth can be defined as follows. Let G=(V,E) be a graph. Given a partition of its edges E=E1E2, call its weight the number of vertices incident both to an edge in E1 and an edge in E2. A branch decomposition of width k of G is a subcubic tree (nodes have degree either 1 or 3) where each leaf is labeled bijectively by an edge of G, and for each edge of the tree, considering E1E2 where E1 and E2 are the edges that annotate each half of the tree, the weight of this partition is at most k.

Branchwidth is less used than treewidth in computer science, but it has the nice property that a variant of the definition yields the notion of rankwidth (see later). Further, it generalizes to hypergraphs (see later), and it generalizes to matroids (I won't give more details).

hlineny2007width (Section 3.1.1) gives a description1 of graphs of bw 2. Further, from robertson1991graph10, cited and proved in hlineny2007width, Theorem 3.1, we have, for any graph G with branchwidth >1: bw(G)tw(G)+11.5bw(G). Hence, "having bounded tw" and "having bounded bw" are equivalent.

Treedepth

This is an addition to the original post

Treedepth is defined as follows. An elimination forest F of a graph G=(V,E) is a forest (i.e., a collection of rooted trees) on V that ensures that for any {x,y}E, one of the nodes for x and y in F is an ancestor of the other in F (or, equivalently, there is a path from the root of F to a leaf that covers both x and y). The depth of an elimination forest F is the height of F in the standard sense (i.e., the maximal height of a tree of F). The treedepth of G is the minimal depth d such that G has an elimination forest F of depth d.

I will not describe the notion of treedepth in more detail, but, to connect it with the previous notions, we know, by Lemma 11 of bodlaender1995approximating, that, for any graph G, we have pw(G)td(G). However, path graphs have treedepth which is logarithmic in their size (with the elimination forest given as the tree of a binary search), but have constant pathwidth. So "having bounded td" is strictly more restrictive than "having bounded pw" which is strictly more restrictive than "having bounded tw".

Rankwidth and cliquewidth

Rankwidth (rw)

Rankwidth is defined just like branchwidth, except that we consider vertex partitions V=V1V2, whose weight is the rank of the adjacency matrix from V1 to V2 (taken as a submatrix of the full graph's adjacency matrix): and we consider subcubic trees where leaves are labeled by graph vertices rather than graph edges.

Rankwidth is less than branchwidth: rw(G)max(1,bw(G)) as shown in oum2007rank. Hence, "having bounded bw" implies "having bounded rw".

Graphs with rankwidth 1 are the distance-hereditary graphs, as shown in oum2005rank.

Cliquewidth (cw)

A clique decomposition of width k, or k-expression, in an expression over the following operations, i and j denoting any colors in {1,,k}:

  • create the graph with one isolated vertex labeled by i
  • change all vertices of color i to color j
  • connect all vertices of color i to those of color j (with ji, so no self-loops)
  • take the disjoint union of two graphs constructed by such an expression

A graph has cliquewidth k if some coloring of it can be obtained by a clique decomposition.

Graphs with cliquewidth 1 are those with no edges, and graphs with cliquewidth 2 are the cographs. The k by k grid has cliquewidth k+1 by (golumbic2000clique).

Cliquewidth can be connected to the modular decomposition of graphs. Further, there are variants such as symmetric cliquewidth, with "bounded cw" equivalent to "bounded scw", see hlineny2007width, section 4.1.3.

Cliquewidth is connected to rankwidth (oum2006approximating): rw(G)cw(G)21+rw(G)1. Hence, "having bounded cw" and "having bounded rw" are equivalent, though the rank-width bound can be exponentially larger.

Cliquewidth is connected to treewidth: we have cw(G)32tw(G)1 by hlineny2007width2, section 4.2.6. Hence, "having bounded tw" implies "having bounded cw", though the bound may again be exponential. Conversely, tw(G) can be bounded as a function of cw(G) and of the degree of G (kaminski2009recent, Proposition 2 a), so "having bounded cw" and "having bounded tw" are equivalent on graph families of bounded degree. The dependence on degree disappears for graph families that exclude any fixed graph as a minor (Proposition 2 b).

Preservation under graph operations

Subgraphs and induced subgraphs

A subgraph of a graph G is obtained by keeping a subset of the edges and vertices of G. Further, it is an induced subgraph if the edges kept are exactly the restriction of those of G to the kept vertices (i.e., no edge between kept vertices was removed).

A property is S-closed (resp. I-closed) if, when a graph has it, every subgraph (resp. induced subgraph) also does.

Minors and topological minors

A minor of a graph can be obtained from that graph by removing edges and vertices and contracting edges. Alternatively, it is witnessed by mapping the vertices v of the minor G to disjoint subsets s(v) of vertices of the graph G, and mapping each edge (u,v) of G to an edge between s(u) and s(v) in G.

A topological minor maps each vertex v of G to a single vertex s(v) of G, and maps edges (u,v) of G to vertex-disjoint paths from s(u) to s(v) in G: equivalently, a subdivision of G is isomorphic to a subgraph of G. Clearly if G is a topological minor of G then it is a minor of G.

We define M-closed and T-closed for minors and topological minors as we defined S-closed and I-closed.

In summary, from makowsky2003tree3:

  • M-closed implies T-closed, S-closed, I-closed;
  • T-closed implies S-closed, I-closed;
  • S-closed implies I-closed;
  • No other implication holds.

Monotonicity of our definitions

Standard graph acyclicity is M-closed. Hence, it is natural to wonder whether our other parameters also are.

  • "having treewidth k" is M-closed (as can be seen by applying deletions and merges to the tree decomposition).
  • "having pathwidth k" is M-closed (same argument).
  • "having branchwidth k" is M-closed (hlineny2007width, Prop 3.2).
  • "having cliquewidth k" is I-closed courcelle1990upper but obviously not S-closed (complete graphs have cliquewidth 2 but their subgraphs cover all graphs) nor T-closed.
  • "having rankwidth k" is I-closed but not S-closed for the same reasons.

Rankwidth is, however, preserved by the graph operation of vertex-minor (oum2005rank).

Other operations

Cliquewidth and rankwidth behave well under graph complementation: complementing can at most double the cw, and adds or removes at most 1 to the rw (quoted in hlineny2007width, section 4.2.1). By contrast, no such bounds are possible for treewidth and hence rankwidth (consider the graphs with no edges).

Treewidth, branchwidth, pathwidth, cliquewidth, rankwidth, can be computed on a graph with multiple connected components as the max of the same measure over each of the connected components.

Adding or deleting k vertices or k edges can only make rankwidth and cliquewidth vary by at most k (hlineny2007width). The same holds for treewidth (immediate as each vertex addition in a tree decomposition can only affect the width by at most 1).

For S-closed families, "having bounded cw" and "having bounded tw" are equivalent (makowsky2003tree, Proposition 7, attributed to courcelle1995logical which seems unavailable online).

Hypergraphs

Hypergraphs generalize graphs by allowing hyperedges that include an arbitrary number of vertices.

We will sometimes consider instances, whose hyperedges (called facts) are allowed to have a label (called the predicate or relation, which has a fixed arity) and where the occurrences of vertices in facts may be labeled as well: a fact is written A(a1,,an), where A is the predicate, n is the arity, and the ai need not be all different. Hence, an instance in logical terms is a structure for some relational signature. In this context it is natural to assume that the maximal arity (number of vertices per hyperedge) is bounded by a constant, though this assumption is not usually done when working on vanilla hypergraphs.

From hypergraphs to graphs

The following discussion of how to get from a hypergraph to a graph is from hlineny2007width. The second and third method can be applied to (non-hyper)graphs rather than hypergraphs, to get a different graph.

  • The primal graph or Gaifman graph P(H) of a hypergraph H is the graph on the vertices of H where two vertices are connected iff they co-occur in a hyperedge. In other words, hyperedges are encoded as cliques. The primal graph of a (non-hyper)graph is the graph itself.

  • The incidence graph I(H) of a hypergraph H is the bipartite graph on the vertices and edges of H where a vertex is connected to an edge if the vertex occurs in that edge. In other words, hyperedges are encoded as a vertex connected to all the vertices that it contains.

    The incidence graph of a (non-hyper)graph amounts to subdividing each edge once.

  • The line graph L(H) of a hypergraph H is the graph on the hyperedges of H where two hyperedges are connected iff they share a vertex. In other words, the vertices are encoded as cliques on the hyperedges. Alternatively, the line graph is the primal graph of the dual hypergraph.

    This construction also makes sense (i.e., is not the identity) for (non-hyper)graphs, where it is also called the line graph.

We will see in a later section how bounds on various width notions can be obtained through these transformations (for graphs and hypergraphs).

Hypergraph acyclicity

Before trying to come up with parameterized notions, there have been attempts to formalize crisp acyclicity notions for hypergraphs. See Wikipedia or fagin1983degrees for sources to the claims and more details. Each notion in the list is implied by the next notion (so they are given from the least restrictive to the most restrictive) and none of the reverse implications hold (fagin1983degrees, Theorem 6.1). Further, they all collapse to the usual notion of acyclicity (i.e., "being a forest") for hypergraphs that are actually (non-hyper)graphs (i.e., all hyperedges have size 2).

  • α-acyclicity: we can reduce the hypergraph to the empty hypergraph with four operations: remove an isolated vertex; remove a vertex in a single hyperedge; remove a hyperedge contained in another; remove empty hyperedges.

    Equivalently, a hypergraph is α-acyclic iff it has a join forest, which is a forest of join trees, which are a special form of tree decomposition: we require that, for each bag, there is a hyperedge containing exactly the elements of the bag. There are other equivalent characterizations of α-acyclicity which I will not define.

    We can test in linear time whether an input hypergraph is α-acyclic, and if so compute a join forest: see Theorem 5.6 of flum2002query. Counter-intuitively, α-acyclicity can be gained when adding hyperedges (think of a hyperedge covering all vertices), and hence also lost when removing vertices.

  • β-acyclicity: all sub-hypergraphs of the hypergraph (including itself) are α-acyclic; this is obviously more restrictive than α-acyclicity, in fact strictly so, and is testable in PTIME

  • γ-acyclicity: I won't define it here; see fagin1983degrees.

  • Berge-acyclicity: the incidence graph of the hypergraph is acyclic in the standard sense; this is obviously testable in linear time and is strictly more restrictive than γ-acyclicity.

We now turn to acyclicity definitions with a parameter k. Following standard usage, we will talk of the treewidth of a hypergraph H to mean the treewidth of its primal graph P(H). However, by hlineny2007width, Section 5.2, there are α-acyclic hypergraphs which have unbounded treewidth, i.e., their primal graphs have unbounded treewidth; and ditto for the notions of incidence graphs and of line graphs. This motivates the search for specific hypergraph definitions of treewidth (and of the other notions presented before).

Querywidth, hypertreewidth, generalized hypertreewidth

A decomposition of a hypergraph, for all three notions of width presented here, is a tree decomposition of its primal graph: however, the vertices of each bag are additionally covered by a set of hyperedges, and the width of the decomposition is redefined to mean the maximal number of hyperedges in a bag (where we do not subtract 1, unlike in the definition of treewidth). The differences are:

  • querywidth: the hyperedges of each bag must exactly cover the vertices of the bag.
  • hypertreewidth: the hyperedges of each bag must cover the vertices of the bag and they cannot include additional vertices if those occur in strict descendants of the bag.
  • generalized hypertreewidth: the hyperedges must cover the vertices, no further restriction is imposed.

Thus ghtw(H)tw(P(H)) and ghtw(H)htw(H)qw(H). In fact qw(H)1+tw(I(G)) (chekuri2000conjunctive, Lemma 2): a tree decomposition of the incidence graph is in particular a query decomposition when restricting the attention to the vertices of the incidence graph that code hyperedges, and the "+1" is because the definition of treewidth subtracts 1 to bag sizes.

Further, all three notions are within a factor 3 of each other (adler2007hypertree, gottlob2007generalized), so that "having bounded qw", "having bounded htw", and "having bounded ghtw" are all equivalent.

A (non-empty) hypergraph H is α-acyclic iff ghtw(H)=htw(H)=qw(H)=1 as mentioned in gottlob2014treewidth.

All three notions are bounded by n/2+1 on hypergraphs with n vertices (gottlob2005computing, Theorem 3.5 and subsequent remarks).

From the join tree definition, it is obvious that a hypergraph is α-acyclic iff it has querywidth equal to 1.

Last, it is obvious by definition that if the arity is bounded by a constant a, then we have tw(P(H))/aghtw(H). Hence, in this case, "having bounded tw" is equivalent to "having bounded htw" (and ditto for the other two hypergraph notions).

Fractional hypertreewidth

This is an addition to the original post

The notion of fractional hypertreewidth is defined in grohe2014constraint (not available online) and in marx2009approximating. It is again defined via tree decompositions of the primal graph of the hypergraph, but this time we annotate them with weighting sets of hyperedges: for each bag b of the tree decomposition, we have a weighting function wb for bag b that maps each hyperedge e of the hypergraph to a nonnegative real number wb(e). We require that, for every bag b, the function wb covers the set S of the vertices contained in b, namely, for each vS, the sum wb(e) over all hyperedges e of the hypergraph that contain v must be at least 1. The width of these augmented tree decompositions is the maximum, over all bags b, of the sum wb(e) over all hyperedges; and the fractional hypertree width of a hypergraph is the smallest possible width of such a decomposition.

Of course, the fractional hypertreewidth is less than the generalized hypertreewidth, as generalized hypertreewidth is obtained by restricting the decompositions to use weight functions having values in {0,1}.

Hyperbranchwidth

Hyperbranchwidth (adler2007hypertree) is defined like branchwidth, with partitions of hyperedges, except that the weight of a hyperedge partition is not its number of incident vertices, but the number of hyperedges required to cover its incident vertices (not necessarily exactly, i.e., we may cover a superset of them). We have (adler2007hypertree, Lemmas 16 and 17):

  • hbw(H)ghw(H) for any hypergraph H
  • ghw(H)2hbw(H) for any non-trivial hypergraph H (not all hyperedges are pairwise disjoint)

Hence "bounded hyperbranchwidth" and "bounded generalized hypertreewidth" are equivalent.

Apparently hyperbranchwidth is a problematic function because it is not submodular, unlike the functions used for branchwidth and rankwidth.

I do not know whether we can define variants of hyperbranchwidth with partition weight counted as the number of hyperedges required to cover exactly the vertices, and connect this to querywidth.

Cliquewidth

Cliquewidth is usually defined for hypergraphs as that of the incidence graph, in which case we have qw(H)cw(H), and there are hypergraphs with bounded querywidth but unbounded cliquewidth. Hence "having bounded querywidth" (and the other notions) is weaker than "having bounded cliquewidth". (gottlob2004hypergraphs)

Connections via transformations

In some cases, bounds on graphs or hypergraphs relate to bounds on the incidence, line, and primal graph.

The branchwidth of a (non-hyper)graph G is the rankwidth of its incidence graph, up to removing 1: bw(G)1rw(I(G))bw(G) (oum2007rank).

The treewidth of a (non-hyper)graph can be used to bound the cliquewidth of its line graph: .25(tw(G)+1)cw(L(G))2tw(G)+2 (gurski2007line, Corollary 11 1., and Theorem 5).

The treewidth of a (non-hyper)graph is exactly that of its incidence graph: tw(G)=tw(I(G)) (kreutzer2009algorithmic, Theorem 3.22). A similar result for cliquewidth is known assuming bounded vertex degree (kaminski2009recent, Proposition 8). Further, for any hypergraph H, we have tw(H)tw(I(H))+1 (gradel2002back), where we recall that the treewidth of a hypergraph was defined as that of its primal graph.

I am not aware of results connecting width measures on hypergraph to measures on the dual hypergraph.

Directed graphs

I file "directed graphs" in the same category as "hypergraphs", because it is about having a richer incidence relation. Width definitions for directed graphs are interesting because the usual acyclicity notion for directed graphs is different from that of undirected graphs (i.e., "being a forest", which we studied so far). This is an interesting area with lots of recent research that I will not attempt to summarize here. See hlineny2007width for a survey.

I will just note that clique decompositions can extend to directed graphs, where the operation that creates edges is meant to create directed edges (hlineny2007width, section 4.1.1).

Complexity of recognition

Definitions

Having studied all of these definitions, it is a natural question to ask, given an (hyper)graph, how can we compute its *-width. I will only deal with the parameterized width notions; for the crisp acyclicity definitions, I have already given the complexity of recognition when defining them in this section.

For the parameterized notions, we must be careful in the problem definition. There are two possible tasks:

  • Just determine the *-width of the input (hyper)graph
  • Determine the *-width and compute a *-decomposition of the input

Further, it is often useful to study a parameterized version of these decision problems. Hence, we consider two possible kinds of inputs; the first is of course harder than the second:

  • We are only given an input (hyper)graph and the algorithm must solve the task.
  • We have fixed kN; the algorithm is given a (hyper)graph with the promise that its *-width is at most k. In other words, if its width is >k the algorithm may do anything, otherwise it must solve the task.

In the second phrasing, example situations are:

  • the problem gets NP-hard whenever the parameter k is given a sufficiently large value;
  • for every k, the problem can be solved in PTIME, i.e., for some function f:NN, in O(nf(k)); this is the parameterized complexity class XP if f is computable (in our case it always will be);
  • the problem can be solved in complexity O(f(k)nc) for a fixed (computable) function f and a fixed constant cN independent of k, in which case we say that the problem is fixed-parameter tractable (FPT) (and we may talk of linear-FPT, quadratic-FPT, etc., to reflect the value c).

Robertson-Seymour

The well-known Robertson-Seymour theorem shows that any M-closed property can be characterized by a finite set of forbidden minors. As testing whether a fixed graph G is a minor of an input graph G can be performed in quadratic time (kawarabayashi2012disjoint)4; this implies the existence5 of a quadratic time test for any M-closed property. We will use this general result later.

Results

When k is fixed and a graph G is given as input:

  • For treewidth, branchwidth, pathwidth:
    • Determining the value of the treewidth, branchwidth, and pathwidth are quadratic FPT because "having treewidth k" and "having branchwidth k" are M-closed, so we can apply the Robertson-Seymour-based reasoning above.
    • For treewidth and pathwidth, there is a linear FPT algorithm to compute the width and a decomposition (bodlaender1996linear, the parameter being exponential in the width). The algorithm is infeasible in practice due to huge constants. For treewidth, it is preferable in practice to use an algorithm that computes a decomposition of a graph G of treewidth k in time O(|G|k+2) (flum2002query, end of Section 3), or various heuristics for approximation (see at the end of the section).
    • Branchwidth can be computed, along with an optimal branch decomposition, in FPT linear time (bodlaender1997constructive)
  • For rankwidth, cliquewidth:
    • Rankwidth, and an optimal decomposition, can be computed in cubic FPT time (hlineny2007finding).
    • It is PTIME to determine the cliquewidth for k3 (corneil2012polynomial, requires a subscription); otherwise this is still open.

For hypergraph acyclicity measures (α-acylicity, etc.), refer back to the paragraph where they are defined.

For hypergraph measures, when k is fixed and a hypergraph H is given as input:

  • It is NP-complete to decide whether an input hypergraph H has generalized hypertreewidth 3 (gottlob2007generalized). For 2, I do not know the complexity.
  • It is NP-complete to decide whether an input hypergraph H has querywidth 4 (gottlob2002hypertree).
  • For any k, we can determine in PTIME whether an input hypergraph H has hypertreewidth k, and if so compute a decomposition (gottlob2002hypertree); however this is unlikely to be in FPT (gottlob2005hypertree).
  • We can test in linear time whether it has generalized hypertreewidth 1, querywidth 1, or hypertreewidth 1, because this is equivalent to it being α-acyclic (mentioned in gottlob2014treewidth).

When a graph G is given as input, with no bound on the parameter k:

  • Treewidth is NP-hard to compute on general graphs; it is computable in PTIME on chordal graphs (bodlaender2007combinatorial, after Lemma 11); on planar graphs, this is open.
  • Branchwidth is NP-hard on general graphs, but on planar graphs it can be determined and a decomposition constructed in PTIME (seymour1994call; cubic bound shown in gu2005optimal, unavailable without subscriptions).
  • Cliquewidth is NP-hard to compute (fellows2009clique).
  • Fractional hypertreewidth can be approximated in PTIME: given a hypergraph H, we can compute in PTIME a fractional hypertree decomposition of width O(w3) of H, where w is the real fractional hypertreewidth: see marx2009approximating.

There are practical implementations to compute tree decompositions of graphs, trying to compute the optimal treewidth or an approximation thereof. A fellow student of mine successfully used libtw. See Section 4.2 of bodlaender2007combinatorial for more details about how treewidth can be approximated (including, e.g., the use of metaheuristics). See also bodlaender2009treewidth.

Logical problems

Logics

Propositional logic (Boolean connectives) is the logic allowing variables and Boolean connectives: negation, conjunction, disjunction. Propositional logic formulae are often given in conjunctive normal form (CNF), i.e., a conjunction of clauses which are disjunctions of literals (variables or negated variables); or in disjunctive normal form (a disjunction of conjunctions of literals).

(Function-free) first-order logic (FO) is the logic obtained from propositional logic by allowing existential and universal quantifiers.

Monadic second-order logic (MSO) is the logic obtained from FO by further allowing existential and universal quantification over sets (unary relations).

Monadic second-order logic with edge quantification (MSO2) is the logic on graphs that further allows quantification over sets of edges. Its generalization to hypergraphs (quantifying over relations of arbitrary arity) is called guarded second-order logic (GSO). In GSO, quantifications over relations of arity >1 are semantically restricted to always range over guarded tuples, i.e., tuples of values that already co-occur in a hyperedge. Likewise, quantification over sets of edges in MSO2 means quantifications over edges in the actual structure, not quantification over arbitrary pairs. However, this should not be confused with guarded logics, where first-order quantification is restricted as well: MSO2 and GSO without second-order quantification is exactly FO, it is not restricted to the guarded fragment of FO.

It turns out that MSO2 over graphs is equivalent to MSO over the incidence graph of these graphs, where the incidence graph is assumed to be labeled in a way that allows the logic to distinguish between vertices and edges. More generally, GSO over hypergraphs (or labeled hypergraphs, i.e., relational signatures) is equivalent to MSO on the incidence instance (gradel2002back, Proposition 7.1).

An example of a proposition that can be expressed in MSO2 but not MSO is the existence of a Hamiltonian cycle in a graph (see kreutzer2009algorithmic, Section 3.2).

Problems

The satisfiability problem for a logic L and class of (hyper)graphs C asks, given a formula ϕL, whether it has a model in C.

The satisfiability problem for MSO2 sentences, MSO sentences, and actually FO sentences is undecidable over general graphs (even over finite graphs, in fact; see Trakhtenbrot's theorem which is the corresponding result for validity, i.e., the negation of satisfiability of the negated formula). For propositional logic, the problem SAT of deciding whether a propositional formula is satisfiable is NP-complete.

The model-checking problem for a logic L and class of (hyper)graphs C, asks for a fixed formula ϕL with no free variables, given an input (hyper)graph GC, whether G satisfies ϕ in the logical sense. I will not study the version of model checking where both the formula and (hyper)graph are given as input, which is of course considerably harder.

The model-checking problem for MSO in general is hard for every level of the polynomial hierarchy (ajtai2000closure). A simple example that shows NP-hardness on graphs is asking whether an input graph can be colored with 3 colors, which is MSO-expressible.

A variant of the model-checking problem is the counting problem that asks, for a formula with free variables, given a (hyper)graph, how many assignments of the free variables satisfy the formula; and the enumeration problem, that requests that the satisfying assignments be computed (with complexity measured as a function of the size of the output, or measured as the maximum delay between two enumerated outputs).

Monadic second order

For satisfiability, it is known that, for any k, satisfiability for MSO2 over the class of graphs of treewidth k is decidable (seese1991structure); this implies the same for branchwidth (an equivalent claim) and pathwidth (a weaker claim). Conversely, it is known that for any class of graphs of unbounded treewidth, satisfiability for MSO2 is undecidable: this relies on a different result by Robertson and Seymour, the grid minor theorem, to extract large grid minors from unbounded treewidth graphs, and using the fact that the MSO theory of grids is undecidable. For details see kreutzer2009algorithmic, Section 6.

Further, it is known that, for any k, satisfiability for MSO over the class of graph of cliquewidth k is decidable. The converse result was conjectured in seese1991structure and a weakening was proven in courcelle2007vertex: it shows the converse result but for a logic further allowing to test that a set variable of MSO is interpreted by a set of even cardinality. Also, it is already known that a class of graphs has bounded cliquewidth iff it is interpretable in the class of colored trees (cited as Lemma 4.22 in kreutzer2009algorithmic).

For model checking, it is known that the problem for MSO2 on bounded treewidth structures is FPT linear (with the formula and the width bound as parameter), see kreutzer2009algorithmic, Section 3.4; this is what is known as Courcelle's theorem. The model checking problem for MSO on bounded cliquewidth structures is FPT cubic, the bottleneck being the computation of the rank decomposition: see kreutzer2009algorithmic, Section 4.3; this is also due to Courcelle. The FPT tractability (not linearity) of MSO2 on bounded treewidth reduces to the same result for MSO and bounded cliquewidth by going through the incidence graph (see these slides, slide 8).

The tractability of MSO on bounded cliquewidth does not extend to MSO2. Indeed, some problems definable in MSO2 but not in MSO are known to be W[1]-hard when parameterized by clique-width, e.g., Hamiltonian cycle, see fomin2009clique and fomin2010algorithmic. Further, it is known that there is a MSO2 formula such that model checking on the class of all unlabeled cliques (which has bounded cliquewidth) is not in PTIME, assuming P1NP1 (those are unary languages): see Theorem 6 of courcelle2000linear.

For model-checking, the constant factor that depends on the fixed formula ϕ is nonelementary even for a formula in FO (frick2004complexity).

The tractability results for model checking on bounded treewidth structures are usually shown using interpretability results to translate the logic on the instances to a logic on the (suitably annotated) tree decompositions (see kreutzer2009algorithmic), and using a connection from MSO on trees to tree automata that originated in thatcher1968generalized (subscription required). See flum2002query for a summary. A connection to automata for bounded rankwidth structures was more recently investigated in ganian2010parse.

The necessity of bounded treewidth for the tractability of model-checking under closure assumptions (i.e., S-closed, M-closed, etc.) was studied in makowsky2003tree. The S-closed case was refined in kreutzer2011lower (for MSO2), and in ganian2012lower (for MSO but additionally assuming closure under vertex labellings).

The problem of counting the number of assignments to a MSO formula is also known to be linear on bounded treewidth (hyper)graphs (arnborg1991easy), and computing the assignments (the enumeration problem) can be done in time linear in the input and in the output (flum2002query). For results on constant-delay enumeration, see, e.g., segoufin2012enumerating.

First-order

I am not aware of conditions that try to ensure decidability of FO satisfiability (in the spirit of those for MSO and MSO2 above), so I focus on model checking.

The model-checking problem for FO is in PTIME, in fact in AC0, but this does not imply that it is FPT (where the query is the parameter): the degree of the polynomial in the (hyper)graph depends in general on the formula.

It is more complicated for FO than for MSO to find criteria that guarantee that FO model checking is FPT, because of locality properties of FO. In particular, there are notions of locally tree-decomposable structures (frick2001deciding), which ensure FPT linearity of FO model checking in more general situations.

For more about FO, see Section 7 of kreutzer2009algorithmic, noting that Open Problem 7.23 has now been solved in the affirmative (dvorak2013testing).

The problem of enumeration for first-order is also studied in segoufin2012enumerating.

Propositional logic

The model-checking problem for propositional formulae is obviously always tractable. However, it is more interesting, given a propositional formula in normal form, encoded as an instance, to determine whether it is satisfiable (SAT), and count the number of satisfying assignments (#SAT).

The standard way to represent a conjunctive normal form formula is as a bipartite graph between clauses and variables with colored edges indicating whether a variable occurs positively or negatively in a clause (see ganian2010better, Definition 2.7). It is then known that SAT and #SAT are FPT if this graph has bounded treewidth or if it has bounded rankwidth (see ganian2010better and references therein).

Tractability for #SAT is known for other cases, e.g., when the standard representation of the formula as a hypergraph is β-acyclic (brault2014understanding).

Query evaluation and constraint satisfaction problems

Problems

This section studies the constraint satisfaction problem, formalized as the problem CSP of deciding, given two relational instances I1 and I2, whether there is a homomorphism from I1 to I2, i.e., an application mapping the elements of I1 to those of I2 such that each fact (hyperedge) of I1 exists in I2 when mapped by the homomorphism; as well as a counting variant that asks how many such homomorphisms exist.

The CSP problem is also studied in database theory, seen as Boolean conjunctive query (CQ) evaluation: a Boolean CQ is an FO sentence consisting of an existentially quantified conjunction of atoms. It is clear that a CQ Q is satisfied in an instance I iff there is a homomorphism from IQ to I where IQ is the canonical instance associated to Q.

One problem phrasing is when both I1 and I2 are given as input, which I will call the combined complexity approach. In this case, it is sensible to restrict I1 and I2 to live in fixed (generally infinite) classes C1 and C2, and study how the choice of class influences the complexity.

Another problem phrasing, common in database theory, is when one of C1 and C2 is a singleton class, so we are in one of two settings:

  • data complexity: I1 is a fixed structure and I2 is the input
  • query complexity: I2 is a fixed structure and I1 is the input

The question then becomes how the complexity of the CSP problem evolves depending on the fixed structure, and on the class in which the input structures may be taken; so this is really the combined complexity approach but with one of the classes being a singleton (so the corresponding structure is effectively fixed rather than given as input).

Results

Without any restriction, the CSP problem is clearly in NP in combined complexity (guess the homomorphism and check it) and its counting variant is in #P (count nondeterministically over all mappings that are checked to be homomorphisms). The CSP problem is easily seen to be NP-hard in query complexity for very simple I2, even when C1 is the class of graphs: fixing I2 to be a triangle, an input I1 has a homomorphism to I2 iff it is 3-colorable. For any fixed I1, the data complexity of the CSP problem when C2 is all structures is in AC0 from FO model checking, but it is W[1]-complete (papadimitriou1999complexity, Theorem 1) and hence unlikely to be in FPT.

For arbitrary I1, data complexity becomes FPT when the input instances I2 are restricted to have bounded treewidth or bounded cliquewidth, from the results on FO.

Combined complexity becomes LOGCFL-complete (and hence in PTIME) when C2 is a class of bounded querywidth structures, or structures of bounded degree of cyclicity (yet another definition), and they are given with a suitable decomposition (gottlob2001complexity). This generalizes an earlier analogous result for α-acyclic queries (yannakakis1981algorithms, seems unavailable online). For bounded treewidth or bounded hypertreewidth, the same result holds but the decomposition can even be computed in LOGCFL, so it doesn't need to be given. The PTIME membership still holds when C2 is a class of bounded fractional hypertreewidth structures (grohe2014constraint, unavailable online).

For hypergraphs of fixed arity, assuming that FPTW[1], it is known that the combined complexity where C2 is all structures is in PTIME iff C1 has bounded treewidth modulo homomorphic equivalence. This result generalizes to the problem of counting the number of homomorphisms (dalmau2004counting). In cases where C1 satisfies this condition, the CSP problem is FPT with the parameter being the size of I1. A trichotomy result for the complexity in such cases is given in chen2013fine, including for counting.

When restricting C1 and C2 to graphs (not hypergraphs), an analogous result is known for combined complexity when C1 is all structures: tractability holds iff C2 is bipartite (grohe2007complexity), otherwise NP-completeness holds. The Feder-Vardi conjecture claims that there is an analogous dichotomy even when C1 and C2 can be hypergraphs rather than graphs (i.e., the query complexity of CSP is never, e.g., in NPI). This fundamental problem is still open.

Partial order theory

As bonus material, because I love partial order (poset) theory: an unconvential way to look for tractability parameters on graphs is to look at tractability parameters on posets and look at the (directed) graphs defined by the transitive reduction of these posets, namely, their Hasse diagrams; and look at the cover graph which is the undirected version of the Hasse diagram.

There has been recent work connecting the tractability measure of order dimension on posets to graph measures on the cover graph. Here are results, for any poset P with cover graph G:

Last, a note about something that confused me: imposing that a poset is series-parallel is not the same as imposing that its cover graph is a series-parallel graph. The N-shaped poset, which is the standard example of a non-series-parallel poset, but its cover graph is just a path graph. Hence series-parallel posets intrinsically rely on directionality.

List of DOIs

To ensure that the references in this article can still be followed even if some preprints are removed from the Web, here is the DOI of the citations, in the order in which they appear in the text.


  1. However, in point (iii) in the list of hlineny2007width, the definition of "series-parallel graph" can be misleading. I would rephrase point (iii) as: G has bw 2 iff G does not contain K4 as a minor (bodlaender1998partial, Theorem 10, (iii)) iff G has tw 2 (bodlaender1998partial, remark after Theorem 10) iff every biconnected component of G is a series-parallel graph (bodlaender1998partial, Theorem 42). 

  2. The original paper is corneil2005relationship: D. G. Corneil, U. Rotics, On the Relationship Between Clique-Width and Treewidth, 2005, but it is not available without subscriptions. 

  3. Observation 1 of makowsky2003tree does not include the fact that T-closed implies S-closed (hence I-closed), but I believe this to be an oversight. 

  4. If G is not fixed, it is NP-hard, given two graphs G and G, to decide whether G is a minor of G; Wikipedia), 

  5. While this indeed shows the existence of a quadratic test, to be able to construct this algorithm, we need to know the actual finite set of forbidden minors. However, the Robertson-Seymour proof is not constructive, and undecidability results are known about computing these families in general; see kreutzer2009algorithmic, section 5.4.).