a3nm's blog

Managing installed packages in Debian

This post is about how I manage the apt packages that are installed on my Debian systems.

Left to myself, I tend to apt-get install various things every now and then: to try them out, or because I temporarily need them. In most cases it turns out I never really use them, but of course I will never remember to clean them up. Eventually my / partition fills up and I have to waste time tracking down which packages are useless. (Of course, accumulating useless packages is also a bad idea in terms of security, performance, etc.)

If I want to back up my current selection of packages, I could dump the output of apt-mark showmanual somewhere, but that's not really satisfactory; the list of packages that I use should be stored as a first-class citizen in my config, not just obtained from the current system state.

If I need to set up a new machine, I can install all the packages that were installed on the previous one, but this ends up downloading gigs of packages, including lots that I don't really care about. And as soon as I use several different machines, the new packages that I need have to be installed on all of them, so the installed packages must be kept in sync somehow. Otherwise, I waste time watching apt-get install stuff every time I run a command on a host and realize that a package is missing. (Still, most installed packages are not really important and probably don't need to be synchronized at all.) All of this is made worse by the fact that not all machines are equal (I don't want to install graphical stuff on servers, laptop tools should only go on laptops, etc.).

My current answer to this is to maintain in my public configuration repository a bunch of program lists for various kinds of uses. These lists are synchronized between machines using git, as is the rest of my configuration. They contain only the packages that I really have some use for, rather than all the random crap I have ever installed and don't remember the point of [1]. (This being said, I don't care about them being minimalistic, and I'm OK with including large tools that I use rarely, as long as there are reasonable odds I will use them again.)

I have a private file that lists, for each of my hosts, which program selections it needs from these lists, for instance:

my-laptop-1 minimal laptop server util
my-server minimal server
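
To illustrate how a line of this file maps to a set of packages, here is a minimal sketch. It assumes a layout that this post does not spell out: each selection (minimal, laptop, ...) is a file of that name in one directory, with one package name per line. The function name expected_packages and its arguments are hypothetical.

```shell
#!/bin/sh
# expected_packages HOSTS_FILE LISTS_DIR HOST
# Print the sorted, de-duplicated union of the selections listed for HOST.
# Assumed (hypothetical) layout: HOSTS_FILE has lines "host sel1 sel2 ...",
# and LISTS_DIR/<sel> holds one package name per line.
expected_packages() {
    hosts_file=$1
    lists_dir=$2
    host=$3
    # Find the line for this host and print its selections, one per line.
    awk -v h="$host" '$1 == h { for (i = 2; i <= NF; i++) print $i }' "$hosts_file" |
    while read -r sel; do
        # Skip blank lines and "#" comment lines inside a selection file.
        grep -v -e '^[[:space:]]*$' -e '^#' "$lists_dir/$sel"
    done | LC_ALL=C sort -u
}
```

With the example file above, calling expected_packages hosts lists my-server would print the union of the minimal and server selections, each package once.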

New packages that I install don't go to this list by default, but my crontab on each host mails me every month the diff between the currently manually selected packages on that host and the ones which should be installed according to the host's list. Here is the script: check-packages.sh.
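
The idea of the monthly report can be sketched as follows. This is not check-packages.sh itself, only an illustration of the comparison, assuming two plain files with one package name per line; the function name package_diff is made up for the example.

```shell
#!/bin/sh
# package_diff INSTALLED_FILE EXPECTED_FILE
# Compare two package lists (one name per line) and print:
#   "+ pkg" for packages that are installed but not expected,
#   "- pkg" for packages that are expected but not installed.
package_diff() {
    installed=$(mktemp)
    expected=$(mktemp)
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort -u "$1" > "$installed"
    LC_ALL=C sort -u "$2" > "$expected"
    # Lines only in the first file: installed, but in no selection.
    LC_ALL=C comm -23 "$installed" "$expected" | sed 's/^/+ /'
    # Lines only in the second file: listed, but not installed.
    LC_ALL=C comm -13 "$installed" "$expected" | sed 's/^/- /'
    rm -f "$installed" "$expected"
}
```

On a Debian host, the first file could be produced with apt-mark showmanual > installed.txt. When such a script runs from cron, anything it prints is normally mailed to the crontab's owner, which would be one way to get the report by email.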

From this monthly report, I can then update the list by removing the new installed packages that I don't need after all, putting the ones that I do need in the right selection, and installing the ones which should be installed. (And, of course, postponing the ones I haven't yet made up my mind about.)

The system is of course fairly rudimentary: there is no dependency between package selections, the dependencies of every piece of software that I compile myself have to be listed in a separate file, the sync process is manual, etc. Yet I am happy that I have, at last, one central list of the packages that I consider useful to have on my systems; as a bonus I can even share it with the world.

  1. Of course, the hard part was to clean up the currently installed packages so as to come up with this list in the first place. 

comments welcome at a3nm<REMOVETHIS>@a3nm.net