songflower

reflow bitmap sheet music to a different paper format
git clone https://a3nm.net/git/songflower/


      1 Songflow is a collection of scripts to reflow sheet music (e.g., from a PDF) to
      2 a new page size, e.g., to fit it on a mobile phone or tablet. It does not
      3 require additional information about the music, and works with bitmap
      4 renderings.
      5 
      6 In particular, I have used Songflow to reformat the public-domain sight-reading
      7 course "Melodia"
      8 <https://ia800203.us.archive.org/17/items/cu31924021781434/cu31924021781434.pdf>
      9 to a more convenient format.
     10 
     11 This is the invocation used:
     12 
     13   ./master.sh cu31924021781434.pdf 750 555 out.pdf 20 250 ./fix_melodia.sh
     14 
     15 If you change the parameters, you should modify "fix_melodia.sh", at least so
     16 that it does not remove the files at its hardcoded paths (or you can replace
     17 "fix_melodia.sh" above with "echo" to skip the fixes).
     18 
     19 == 1. What it does ==
     20 
     21 Songflow does the following:
     22 
     23 - Splitting a PDF file into multiple pages
     24 
     25 - Splitting pages into systems, separated by sufficient consecutive lines of
     26   white or near-white space. For this to work, your file must have sufficient
     27   contrast, and must not be skewed (the separation between systems should be
     28   horizontal)
     29 
     30   The systems are also trimmed of near-white content at the left and right
     31 
     32 - Splitting systems into blocks of measures of the right width, and resizing
     33   them to the desired width. This is the most fragile step.
     34   
     35   Empty spaces in the system are detected as having near-minimal height of
     36   non-white content, and near-minimal total weight. Bars are identified as two
     37   consecutive empty spaces that are sufficiently close and such that the space
     38   between them has a significantly higher density of non-white content. This
     39   step requires the bars to be sufficiently vertical and the scan to be
     40   sufficiently crisp. It works more reliably on systems where measure bars span
     41   the whole system. For this step to work, you will probably need to adjust
     42   the thresholds and distances. The program may fail by refusing to split (more
     43   accurately, by making aggressive splits at random points), or may misdetect
     44   some patterns as bars (especially the stems of half-notes). The program will
     45   also cut at bars that break a slur.
     46 
     47   Once bars are detected, the program splits the system into blocks of the right
     48   width (by a bruteforce algorithm that finds a solution minimizing the length of
     49   the shortest segment), and each block is stretched to the required width by
     50   stretching empty space only, to avoid distorting the picture. This step may
     51   fail by stretching things (e.g., notes) that should not be stretched, or by
     52   failing to detect some empty space and stretching the places that it does
     53   detect too much (especially when constrained because not all bars were
     54   correctly detected).
     55 
     56 - Combining the blocks back into pages of the right size (greedily fitting them
     57   and arranging them on the page)
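
The "stretch empty space only" idea from the third step above can be
illustrated with a minimal sketch. This is not the code of splitw.py: the
function name stretch_empty_columns and the representation of the image as a
list of rows of grayscale values are assumptions made for the example, and the
empty columns are taken as given rather than detected.

```python
def stretch_empty_columns(rows, empty, target_width):
    """Widen an image to target_width by repeating only its empty columns.

    rows:  list of rows, each a list of grayscale values (hypothetical format).
    empty: list of booleans, True where a column was judged empty.
    """
    width = len(empty)
    extra = target_width - width
    if extra < 0:
        raise ValueError("image is already wider than the target width")
    empty_idx = [x for x, is_empty in enumerate(empty) if is_empty]
    if extra > 0 and not empty_idx:
        raise ValueError("no empty column available to stretch")
    # Every column is kept once; the extra width is spread as evenly as
    # possible over the empty columns, so that inked columns are never copied.
    repeats = [1] * width
    if extra > 0:
        base, remainder = divmod(extra, len(empty_idx))
        for rank, x in enumerate(empty_idx):
            repeats[x] += base + (1 if rank < remainder else 0)
    return [[value for value, n in zip(row, repeats) for _ in range(n)]
            for row in rows]
```

If too few columns are flagged as empty, each of them is repeated many times,
which is one way to picture the "stretching too much" failure described above.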
     58 
     59 == 2. What it requires ==
     60 
     61 You need imagemagick, Python 3, pdftk, GNU parallel (not to be confused with
     62 parallel from moreutils, which won't work) and some Python libraries (numpy,
     63 imageio).
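
A quick way to check these requirements before a long run could look like the
sketch below. The command names ("convert" for imagemagick, "pdftk",
"parallel") are assumptions about what the scripts invoke, and the helper
missing_requirements is hypothetical. Note that it cannot tell GNU parallel
apart from the moreutils one, since both install a command called "parallel".

```python
import importlib.util
import shutil


def missing_requirements(commands=("convert", "pdftk", "parallel"),
                         modules=("numpy", "imageio"),
                         which=shutil.which,
                         find_spec=importlib.util.find_spec):
    """Return the names of commands and Python modules that cannot be found.

    The default names are assumptions; `which` and `find_spec` can be
    replaced by other callables, e.g. for testing.
    """
    missing = [name for name in commands if which(name) is None]
    missing += [name for name in modules if find_spec(name) is None]
    return missing
```

Calling missing_requirements() with no arguments returns an empty list when
everything was found.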
     64 
     65 == 3. How to use it ==
     66 
     67 A script, master.sh, is provided to automate all of the conversion. Basic usage
     68 would be:
     69 
     70   ./master.sh INFILE.pdf WIDTH HEIGHT OUTFILE.pdf
     71 
     72 The process can take several hours for large PDF files (e.g., for Melodia).
     73 
     74 However, it is likely that you will need to peer into the internals, so read on.
     75 
     76 == 4. The scripts ==
     77 
     78 The interesting scripts are:
     79 
     80 === 4.1. Splitting pages into systems ===
     81 
     82 splith.py splits pages into systems. The way to use it is:
     83 
     84   ./splith.py file.png out/
     85 
     86 It will write files out/file_0001.png, out/file_0002.png, etc., one for each
     87 system, covering disjoint regions of the page.
     88 
     89 - The parameter --whitethreshold indicates the sensitivity to consider things
     90   as white space (this is a grayscale value, i.e., between 0 and 255). Setting
     91   it to a higher value will make the program cut more aggressively.
     92 
     93 - The parameter --maxheight can be used to ensure that the extracted "systems"
     94   will not be higher than the specified height (in pixels). This can be useful
     95   in case you want to be sure the extraction won't fail later (e.g., combine.py
     96   will ignore stuff which is too high to fit on the requested page height).
     97 
     98 - The parameter --mincontentheight (in pixels) indicates the height of content
     99   that is "too small to matter". If some small junk on the page gets extracted
    100   as a system, or decorations or lyrics are attached to the wrong system, try
    101   increasing this parameter.
    102 
    103 - The parameter --minheight indicates the minimal height of an extracted system
    104   (in pixels). If small junk gets extracted as a system, you can increase it.
    105 
    106 - The parameter --distthreshold (in pixels) can be lowered to stop cutting when
    107   the empty space between two "systems" is too small compared to the largest
    108   empty space. You can lower this parameter if the program cuts too aggressively,
    109   but it may then fail to cut, e.g., on pages with large space because of a title
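
To give an idea of how a page can be cut along horizontal white space, here is
a minimal sketch. It is not the code of splith.py: the function split_rows and
its parameters white_tolerance, min_gap and min_height are hypothetical and
only loosely mirror the options above, and the page is assumed to be a list of
rows of grayscale values (0 for black, 255 for white).

```python
def split_rows(rows, white_tolerance=50, min_gap=10, min_height=20):
    """Return (top, bottom) row ranges of content separated by white gaps.

    A row counts as white when its darkest pixel is at most white_tolerance
    away from pure white, so a higher tolerance yields more cuts.
    """
    is_white = [255 - min(row) <= white_tolerance for row in rows]
    systems = []
    start = None      # first row of the system being built, if any
    last_ink = None   # last non-white row seen in that system
    for y, white in enumerate(is_white):
        if not white:
            if start is None:
                start = y
            last_ink = y
        elif start is not None and y - last_ink >= min_gap:
            # Enough consecutive white rows: close the current system.
            systems.append((start, last_ink + 1))
            start = None
    if start is not None:
        systems.append((start, last_ink + 1))
    # Drop regions too small to be a system (e.g., specks of dust).
    return [(top, bottom) for top, bottom in systems
            if bottom - top >= min_height]
```

White gaps shorter than min_gap are kept inside the current region, which is
how a decoration slightly above a staff can stay attached to it.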
    110 
    111 === 4.2. Splitting lines into chunks ===
    112 
    113 splitw.py splits lines into chunks. The way to use it is:
    114 
    115   ./splitw.py file.png out/ WIDTH
    116 
    117 It will write files out/file_0001.png, out/file_0002.png, etc., one for each
    118 chunk, having exactly the requested width WIDTH.
    119 
    120 Parameters to detect the height:
    121 
    122 - The parameter --whitethreshold (grayscale value between 0 and 255) indicates
    123   what counts as "white" when measuring the height of a line
    124 
    125 - The parameter --minlength controls the minimum consecutive number of pixels at
    126   which something can be "low-height" (i.e., the minimal bar height)
    127 
    128 - The parameter --margin indicates the number of pixels to be excluded at the
    129   left and right of the screen when finding the minimal height
    130 
    131 - The parameter --outlierquantile is a percentage indicating the proportion of
    132   minimal height values to discard (considered as outliers)
    133 
    134 - The parameter --heightthreshold indicates the tolerance (in pixels) up to
    135   which something is considered "low-height"
    136 
    137 Parameters to detect the weight:
    138 
    139 - The parameter --outlierquantile mentioned above is also used to eliminate
    140   outlier weights
    141 
    142 - The parameter --weightthreshold (grayscale value between 0 and 255) indicates
    143   the weight threshold up to which something is still considered as minimal
    144   weight.
    145 
    146 - The parameter --weightwindow (in pixels) indicates the width of the window
    147   over which weight is computed (for smoothing)
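
The notions of "low-height" and "low-weight" columns can be pictured with the
sketch below. It is only an illustration under assumptions of mine: height is
taken as the vertical extent of non-white pixels in a column, weight as the
mean darkness of the column, and outlier removal is left out entirely. The
function empty_columns and its parameters are hypothetical, not the options of
splitw.py.

```python
def empty_columns(rows, white_tolerance=50, height_tolerance=2,
                  weight_tolerance=10, weight_window=3):
    """Flag columns whose height and weight are both near-minimal.

    rows: list of rows of grayscale values (0 = black, 255 = white).
    """
    heights, weights = [], []
    for column in zip(*rows):
        dark = [255 - value for value in column]
        inked = [y for y, d in enumerate(dark) if d > white_tolerance]
        # Extent from the first to the last non-white pixel of the column.
        heights.append(inked[-1] - inked[0] + 1 if inked else 0)
        weights.append(sum(dark) / len(dark))
    # Moving average of the weights over weight_window columns.
    half = weight_window // 2
    smoothed = []
    for x in range(len(weights)):
        window = weights[max(0, x - half):x + half + 1]
        smoothed.append(sum(window) / len(window))
    min_height, min_weight = min(heights), min(smoothed)
    return [h <= min_height + height_tolerance
            and w <= min_weight + weight_tolerance
            for h, w in zip(heights, smoothed)]
```

With these definitions, a bar line has the same extent as the bare staff but a
higher weight, so it shows up as a non-empty column between two empty runs.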
    148 
    149 Parameters to detect bars:
    150 
    151 - The parameter --maxbardistance (in pixels) indicates the maximal distance
    152   between two low-height, low-weight parts of the staff on each end of a bar
    153 
    154 - The parameter --minbarweight (float) indicates how much more weight a bar
    155   should have relative to the minimal weight
    156 
    157 - The parameter --minchunk (in pixels) indicates the smallest admissible chunk
    158   width for a cut made at an "empty point", which happens when no bar can be
    159   found where to cut
    160 
    161 Parameters to debug:
    162 
    163 - With --debug, the program will write out/debug.png with the input file colored
    164   to indicate low-height and low-weight parts as well as detected and added bars
    165   and the bars chosen to cut
    166 
    167 === 4.3. Combining chunks into pages ===
    168 
    169 combine.py combines chunks into a page. The way to use it is:
    170 
    171   ./combine.py infolder/ outfolder/ HEIGHT
    172 
    173 All files in infolder/ should have the same width, and they will be considered
    174 in alphabetical order. They will be grouped in files in outfolder/out_0001.png,
    175 etc., having the prescribed HEIGHT and the common width. Input files whose
    176 height is too large to fit will be ignored with a warning.
    177 
    178 - The parameter --hmargin indicates the space in pixels left at the left and at
    179   the right of the produced images
    180 
    181 - The parameter --vmargin indicates the space in pixels left at the top and at
    182   the bottom of the produced images
    183 
    184 - The parameter --separator indicates the minimal vertical space in pixels
    185   between two chunks
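
The grouping of chunks into pages can be sketched as a greedy loop. This is an
illustration rather than the code of combine.py: the function pack_pages is
hypothetical, it works on chunk heights only, and it assumes that the vertical
margin applies once at the top and once at the bottom of each page.

```python
def pack_pages(heights, page_height, vmargin=0, separator=0):
    """Greedily group chunks, given by their heights, into pages.

    Returns (pages, skipped): pages is a list of lists of chunk indices in
    their original order, skipped lists the chunks too high for any page.
    """
    usable = page_height - 2 * vmargin
    pages, current, used, skipped = [], [], 0, []
    for index, height in enumerate(heights):
        if height > usable:
            skipped.append(index)  # would never fit, whatever the page
            continue
        needed = height if not current else used + separator + height
        if current and needed > usable:
            # The chunk does not fit on this page any more: start a new one.
            pages.append(current)
            current, needed = [], height
        current.append(index)
        used = needed
    if current:
        pages.append(current)
    return pages, skipped
```

For instance, with heights 40, 50, 30, 200 and 60, a page height of 120, a
vertical margin of 10 and a separator of 5, the first two chunks share a page,
the third and fifth share the next one, and the fourth is skipped.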
    186 
    187 == 5. Limitations ==
    188 
    189 - Songflow will rasterize vector PDFs. On raster PDFs, you need to specify by
    190   hand a new rasterization density which will re-scale the content during the
    191   export. It would be conceivable to change splith.py to extract regions
    192   vectorially, but for splitw.py, which stretches the score in a complicated
    193   way, it would be trickier.
    194 
    195 - When splitting pages into systems, sometimes lyrics are lost or not put at the
    196   right place. The handling of lyrics could be improved in splith.py to attach
    197   them to the staff above instead of dropping them, and splitw.py could detect
    198   the lyrics layer (looking for a cut), then ignore it when searching for cut
    199   and stretch points, and the lyrics could then be moved to the right place...
    200 
    201 - The word-wrapping algorithm is an exponential-time bruteforce algorithm,
    202   whereas it could be implemented using dynamic programming. That said, given
    203   the size of instances, I doubt it's a bottleneck.
    204 
    205 - Splitting lines is pretty error-prone, with some inadequate cuts (not at bars)
    206   and some weird stretching. The accuracy could be improved by better heuristics,
    207   or by genuine learning techniques instead of crude heuristics.
    208 
    209 - If you call Songflow on something that is not music, it will not notice and
    210   will happily botch the content instead of leaving it alone.
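
On the word-wrapping limitation listed above, a dynamic programming version
could look like the following sketch. The objective is an assumption chosen
for the example and may differ from what splitw.py optimizes: measures are
grouped into consecutive blocks no wider than max_width, and the narrowest
block is made as wide as possible, so that no block needs much stretching. The
function split_measures is hypothetical and assumes positive widths.

```python
def split_measures(widths, max_width):
    """Group consecutive measure widths into blocks of total width at most
    max_width, maximizing the width of the narrowest block (quadratic time).
    """
    n = len(widths)
    prefix = [0]
    for width in widths:
        prefix.append(prefix[-1] + width)
    # best[i]: best achievable narrowest-block width for the first i measures.
    best = [float("-inf")] * (n + 1)
    best[0] = float("inf")
    previous = [None] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end - 1, -1, -1):
            block = prefix[end] - prefix[start]
            if block > max_width:
                break  # widths are positive: earlier starts are wider still
            score = min(best[start], block)
            if score > best[end]:
                best[end], previous[end] = score, start
    if n > 0 and previous[n] is None:
        raise ValueError("some measure is wider than max_width")
    # Walk back through the recorded cut points to rebuild the blocks.
    blocks, end = [], n
    while end > 0:
        start = previous[end]
        blocks.append(widths[start:end])
        end = start
    return blocks[::-1]
```

For measures of widths 30, 40, 50, 20 and 60 and a maximal width of 100, this
returns the blocks [30, 40], [50, 20] and [60].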