[My choral colleague, Steve Roth, contributed the guest post below on how he scans and distributes sheet music to the iPad/tablet users in our chorus. He includes detailed step-by-step instructions – note that you’ll need to have some proficiency with command-line tools in order to use his process.
Listen to Steve – he’s da man! I scanned and prepped my 264-page vocal score of Handel’s Messiah using some basic compression methods and it ended up at 57 MB (will write that up later). Steve scanned and prepped a 175-page vocal score of Bach’s St. Matthew Passion – and it came in at under 7 MB.
Steve, many, many thanks to you for sharing your knowledge with us!]
Tech4Singers and I sing in the same chorus, and there’s a steadily growing proportion of iPad sheet music users in the chorus (currently about 20%). We all use forScore, which is a superb sheet music reader for the iPad, and as you’ll see below, there’s considerable advantage in having everyone use the same software. I’m the guy who’s (unofficially) responsible for providing the electronic music for all of those singers. Often that involves scanning paper music. Following up on Tech4Singers’ recent request, I thought I’d share here the full details of how I do the scanning. Please note that I make no claims that this is the best way; it’s just one way that works, using tools I’m familiar with.
Before diving into the details, I have to offer a note on legalities. Most paper music is under someone’s copyright, and scanning the music for use on an iPad is in a legal gray zone. Personally, I content myself with following the spirit of the copyright law: I only scan works for which I actually have a paper copy. If the paper copy is borrowed (e.g. from our chorus library), then I only keep the scan for as long as I have the paper copy checked out. And while I don’t attempt to enforce it, I remind my fellow choristers using my scans that they should do the same. I am not claiming that this is compliant with copyright law; I’m not a lawyer and I haven’t found one that will give me a straight answer. But I do assert that it gives due respect to the copyright holders, and that’s good enough for me.
Most sheet music is in one of three formats:
- 8½ by 11 inches, single sheets
- 11 by 17 inches, folded (each side of each sheet has two 8½ by 11 pages of music)
- 10ish by 14ish inches, folded (each side has two 7ish by 10ish pages)
The single sheet music can be handled by pretty much any scanner with a sheet feeder. For the other two formats, though, you need a scanner that can handle large format paper. In this post I’ll discuss handling a 10 by 14 inch folded score, because that’s the most complex case as well as the most common.
Begin by removing any staples or bindings so that you can feed the sheets through the scanner’s sheet feeder. You should also briefly counter-fold them (fold them the opposite way from the way they came) so that they are relatively flat when they go through the sheet feeder; that helps prevent jams. (By the way, yes, a sheet feeder really is a requirement, and so is removing the binding. If you try to put pages on the scanner glass yourself, I guarantee you your pages will not be exactly straight. It’s even worse if you put a bound score on the scanner glass.) [Editor’s note: there are tools that will help you deskew crooked scans – but it’s preferable not to have them be crooked in the first place.]
My scanner settings are 200dpi, black and white. Color (or even gray scale) is not needed for scores, and it makes them vastly bigger (and therefore slower). I have found 200dpi to be a good compromise between size and sharpness, but you could consider using 300dpi for scores with tiny print. It is critical not to let the scanner use any form of lossy compression (like JPEG). Scans should either be uncompressed or should use some lossless compression (PNG, GIF, TIF).
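To put rough numbers on "vastly bigger", here is a small sketch of the uncompressed size of one scanned side of an 11 by 17 inch sheet at different settings. The function name `raw_bytes` is made up for this illustration, and real files will be smaller once lossless compression is applied.

```shell
#!/bin/sh
# Uncompressed size of one scanned image:
# (width in pixels) x (height in pixels) x (bits per pixel) / 8.
raw_bytes() {
    # $1 = width in inches, $2 = height in inches, $3 = dpi, $4 = bits per pixel
    echo $(( $1 * $3 * $2 * $3 * $4 / 8 ))
}

echo "200dpi black and white: $(raw_bytes 11 17 200 1) bytes"
echo "200dpi gray scale:      $(raw_bytes 11 17 200 8) bytes"
echo "200dpi color:           $(raw_bytes 11 17 200 24) bytes"
echo "300dpi black and white: $(raw_bytes 11 17 300 1) bytes"
```

Before compression, 8-bit gray scale is 8 times the size of black and white and 24-bit color is 24 times the size, while going from 200dpi to 300dpi multiplies the size by 2.25.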
For the rest of the process, you can use any tool that allows you to make batch changes to large numbers of image files at once. My preference, used in the examples below, is the ImageMagick suite.
Step 1 (if necessary, depending on your scanner’s output format): Separate each side of each sheet into a separate image file. (Assuming your source was folded sheets, each image file will have two pages.)
$ convert +adjoin multisheet-file singlesheet-filename-prefix
Step 2: Crop the images to the width of the sheet. Measure the width of the double-page sheet and multiply by the resolution of the scan. For our 10×14 example at 200dpi, that would be 2800 pixels (14 inches × 200). The images have not been rotated yet, so the sheet's width runs along the image height: crop the height of the images to that amount, keeping the width unchanged.
$ identify file
$ mogrify -crop widthxheight+0+0 files...
where width is the width of the sheet reported by the identify command and height is the desired height. Note this keeps the top part of the image and discards the bottom; that should be correct for most sheet-fed scanners.
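As an illustration of the step 2 arithmetic, here is a sketch for the 11 by 17 inch format from the list above. The file names `sheet-*.png` and the 2340-pixel scan width are assumptions standing in for your own files and for whatever identify reports. The script only prints the command so you can check it before running it.

```shell
#!/bin/sh
# Step 2 arithmetic; every value here is an example assumption.
dpi=200              # scan resolution
sheet_width_in=17    # width of the unfolded double-page sheet, in inches
scan_width_px=2340   # hypothetical image width reported by `identify`

# Pixels to keep along the feed direction (the image height before rotation).
crop_height_px=$((sheet_width_in * dpi))

# Print the command for review instead of running it.
crop_cmd="mogrify -crop ${scan_width_px}x${crop_height_px}+0+0 sheet-*.png"
echo "$crop_cmd"
```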
Step 3: Examine each image and rotate them all to the proper orientation. Typically the even-numbered sheets go one way and the odd-numbered sheets go the other. Clockwise rotation is
$ mogrify -rotate 90 -page +0+0 files...
Counter-clockwise uses negative 90.
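If your sheets do alternate as described in step 3, the rotation can be chosen from the sheet number. The sketch below is a dry run that prints one command per image. It assumes hypothetical file names `sheet-0.png`, `sheet-1.png`, and so on, and that the even-numbered images are the ones needing clockwise rotation; swap the angles if your scanner does the opposite.

```shell
#!/bin/sh
# Pick the rotation angle from the sheet number.
# Assumption: even-numbered images rotate clockwise, odd ones counter-clockwise.
rotation_for() {
    # $1 = sheet number
    if [ $(( $1 % 2 )) -eq 0 ]; then
        printf '%s\n' 90
    else
        printf '%s\n' -90
    fi
}

# Dry run: print the commands rather than modifying any files.
for n in 0 1 2 3; do
    echo "mogrify -rotate $(rotation_for "$n") -page +0+0 sheet-$n.png"
done
```

It is still worth looking at the images afterwards, since a sheet fed the wrong way round breaks the pattern.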
Step 4: Measure the height of the sheet, less any margin you want to discard, multiply by the scan resolution, and crop that much out of the middle of the images:
$ mogrify -crop widthxheight+0+offset -page +0+0 files...
where width is the height calculated in step 2 (we’ve rotated it since then); height is the desired height, and offset is the offset needed for that height to be taken from the center of the image, generally half of the difference between current and desired heights (but possibly different if the part you want to keep is off-center).
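The offset calculation in step 4 can be sketched as follows, again for an 11 by 17 inch sheet at 200dpi. The 2340-pixel current height and the `sheet-*.png` names are assumptions, and the sketch keeps the full 11 inches with no margin discarded and the sheet centered in the scan.

```shell
#!/bin/sh
# Step 4 arithmetic; every value here is an example assumption.
dpi=200
rotated_width_px=3400   # 17 in x 200dpi, the amount kept in step 2
scan_height_px=2340     # hypothetical current image height after rotation
sheet_height_in=11      # height of the sheet to keep, in inches

keep_height_px=$((sheet_height_in * dpi))

# Center the kept band: half of the difference between current and desired heights.
offset_px=$(( (scan_height_px - keep_height_px) / 2 ))

# Print the command for review instead of running it.
crop_cmd="mogrify -crop ${rotated_width_px}x${keep_height_px}+0+${offset_px} -page +0+0 sheet-*.png"
echo "$crop_cmd"
```

If the part you want is off-center, set `offset_px` by hand instead of computing it.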
Step 5: Split the double-page images into single-page images:
$ convert +adjoin -crop widthxheight files... output-file-prefix
where width is half the current image width and height is the (unchanged) current image height. The resulting files will be named with your specified prefix and a numeric suffix. We also need to tell ImageMagick to reset the page bounding box to the image bounding box:
$ mogrify -page +0+0 output-file-prefix*
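For step 5, the only number to work out is the half width. The sketch below prints both commands for a double-page image of 3400 by 2200 pixels; the sizes, the `sheet-*.png` input names, and the `page` output prefix are all assumptions for illustration.

```shell
#!/bin/sh
# Step 5 arithmetic; every value here is an example assumption.
rotated_width_px=3400   # current width of each double-page image
kept_height_px=2200     # current height, left unchanged

# Each single page is half of the double-page width.
page_width_px=$((rotated_width_px / 2))

# Print the commands for review instead of running them.
split_cmd="convert +adjoin -crop ${page_width_px}x${kept_height_px} sheet-*.png page"
echo "$split_cmd"
echo "mogrify -page +0+0 page*"
```

One thing to watch, as far as I know: if the current width is an odd number of pixels, the integer division leaves a one-pixel strip that ImageMagick writes out as an extra tile, so an even width makes this step cleaner.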
Step 6: Manually examine the pages and renumber them to correct order. Sorry, it’s tedious, but I haven’t found a good way to avoid it.
Step 7: Create the PDF, using CCITT Group 4 compression (which does a much better job on monochrome, line-art-like images than any other I’ve seen).
$ convert files... -compress Group4 output-file.pdf
Step 8: If you’re preparing for use in forScore, add metadata to the PDF so that forScore knows the name and composer of the piece. Use any PDF editing tool for this purpose, setting the “Title” and “Author” metadata respectively. Personally, I use pdftk:
$ pdftk input.pdf dump_data output info-file
[Editor’s note: PDF Info for Windows is another free and slightly more user-friendly tool for doing this.] Edit the info-file to add key/value pairs for Title and Author.
$ pdftk input.pdf update_info info-file output output.pdf
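As an illustration of the info-file edit in step 8, the sketch below appends Title and Author records in the layout that pdftk's dump_data writes. It assumes the newer layout in which each record starts with an `InfoBegin` line; look at your own dump_data output and match what you see there. The helper name `add_info_entry`, the temporary file, and the metadata values are all placeholders.

```shell
#!/bin/sh
# Append one metadata record to a pdftk info file.
# Assumption: your pdftk version starts each record with "InfoBegin".
add_info_entry() {
    # $1 = info file, $2 = key (e.g. Title), $3 = value
    printf 'InfoBegin\nInfoKey: %s\nInfoValue: %s\n' "$2" "$3" >> "$1"
}

info_file=$(mktemp)   # stand-in for the file written by dump_data
add_info_entry "$info_file" Title "St. Matthew Passion"
add_info_entry "$info_file" Author "Johann Sebastian Bach"
cat "$info_file"
```

If the dumped file already contains a Title or Author record, edit its `InfoValue` line rather than appending a second record.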
In our chorus’s case, we’re usually distributing an entire concert’s worth of music at one time, so I take some additional steps to make this easy for our users. I put all of the individual PDFs on a web server and load them into my forScore using its browser. I go through each of them and add any annotations that everyone will want (such as advance markings from the director, or Links for handling repeats). [Editor’s note: Adding links to the PDF score prior to distribution is an extremely helpful thing for a choral librarian to do. During a read-through, it saves singers from having to scramble to find the first page of a repeat section, 1st/2nd endings, etc.] In some cases, for multi-movement works, I’ll make chapters within the PDF file so that it’s easy to jump to a movement. Then I make a forScore “setlist” containing all of the pieces (in the proper order if that’s known), and use forScore’s Dropbox feature to upload the setlist to my Dropbox, turning on the switch that tells it to include the scores as part of the setlist file. The resulting “.4ss” file is a single file our users can install in their forScore and get the entire concert’s scanned music, with annotations. (I occasionally see odd errors with this import, but they go away if forScore is killed and restarted.)
I’m sure there are lots of ways that all of this could be streamlined, and maybe better tools to use for all or part of it. Nevertheless, this is what I’ve been doing, and I hope it will at least give you some ideas.