Convert books into PDF by taking photos of the pages and processing them into a searchable digital backup copy that you can take anywhere.
Making a PDF from a printed book is simple: you just need to take a photo of every page and run the photos through a couple of programs. It is faster and easier than you might think! Here’s how.
My improvised photographing rig was just some cardboard boxes to prop up the iPad and a light source to illuminate the book and avoid dark shadows on the pages. Other times I just put the book on a chair, the iPad on a desk and the light source next to the iPad.
The ideal setup gives you full-screen photos of the open book, with the pages lying flat and straight across the frame. Dark photos or curved pages won’t work well. A good criterion is that you can easily read the text in every photo you take.
Once you have a good setup you can start taking photos. It’s just that: turn page, take photo, turn page, take photo, until you have gone through the whole book. It takes approximately 10 minutes for a 250-page book, or even less.
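As a quick sanity check on that timing, the arithmetic can be sketched as follows. It assumes each photo captures an open spread of two pages, as in the setup described above; the function name and its parameters are only illustrative.

```python
import math

def seconds_per_photo(total_pages, total_minutes, pages_per_photo=2):
    """Return (number of photos, seconds available per photo)."""
    # Assumption: every photo shows an open spread, i.e. two pages at once.
    photos = math.ceil(total_pages / pages_per_photo)
    return photos, total_minutes * 60 / photos

photos, pace = seconds_per_photo(total_pages=250, total_minutes=10)
print(f"{photos} photos, about {pace:.1f} seconds each")
```

In other words, the quoted pace leaves roughly five seconds to turn the page and take each shot.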
Once you have all the photos on your computer, the content has to be isolated. To do this you can use the excellent, free and open source Scan Tailor (http://scantailor.sourceforge.net/), available for Mac, Windows and Linux.
The program is a full toolkit for going from a pile of scans or photos to clean content images. It is divided into six stages which guide you through all the processing phases. Before entering the processing steps you have to select the images from your drive and specify their DPI. I usually go with 300×300 DPI for iPad photos, and apply that to all the images in the folder.
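If you are unsure which DPI to enter, one rough way to estimate it is to divide the photo's width in pixels by the real width of the area it shows. This is only a sketch: the function name is hypothetical and the pixel and inch figures are made-up example values, not measurements.

```python
def effective_dpi(pixels_across, inches_across):
    """Estimate DPI as pixels per inch of the photographed area."""
    if inches_across <= 0:
        raise ValueError("inches_across must be positive")
    return pixels_across / inches_across

# Hypothetical example: a 2592-pixel-wide photo of an open book
# that is about 11 inches across.
print(round(effective_dpi(2592, 11)))
```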
The first stage, Fix Orientation, lets you rotate pages 90 degrees clockwise or counterclockwise. If you took all the photos with the same orientation you can apply the same transformation in batch.
The second stage, Split Pages, is the first ‘magic’ mode of the program: it detects the crease line between book pages and generates two distinct images from a photo of an open book. Simply hit the batch button of this phase and let the program go through all the photos. Then review the result in the list on the right. Most of the photos will be split correctly, and you can fix the few that are not manually.
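To give an idea of what splitting involves, here is a toy sketch that looks for the darkest column near the middle of a grayscale image and cuts there. It is not Scan Tailor's actual algorithm; the function names and the "middle third" search band are assumptions made for illustration.

```python
def find_crease_column(gray):
    """Return the index of the darkest column in the middle third.

    `gray` is a list of rows, each a list of 0-255 brightness values.
    """
    width = len(gray[0])
    start, stop = width // 3, width - width // 3
    # Sum brightness per column; the gutter shadow has the lowest sum.
    column_sums = [sum(row[x] for row in gray) for x in range(start, stop)]
    return start + column_sums.index(min(column_sums))

def split_spread(gray):
    """Split a spread into (left_page, right_page) at the crease."""
    crease = find_crease_column(gray)
    left = [row[:crease] for row in gray]
    right = [row[crease:] for row in gray]
    return left, right

# Tiny synthetic spread: bright paper with a dark column at x = 4.
spread = [[250, 250, 250, 250, 40, 250, 250, 250, 250] for _ in range(3)]
left, right = split_spread(spread)
print(len(left[0]), len(right[0]))
```

Restricting the search to the middle of the image keeps dark page edges or the background from being mistaken for the crease.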
The Deskew stage allows you to fine-tune the rotation of the page halves detected in the previous step. This phase can also be performed automatically with the batch button; however, I found that it is not as precise as the previous one (80% correct) if the photos were taken with the book not fully open. It takes some practice to understand what the program can fix automatically and what it cannot.
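For a feel of the numbers involved, the skew of a page can be measured from two points on a line of text. The sketch below is only an illustration of the geometry, not how Scan Tailor estimates the angle, and the coordinates are hypothetical.

```python
import math

def skew_angle_degrees(left_point, right_point):
    """Angle of a text line relative to horizontal, in degrees.

    Points are (x, y) pixel coordinates with y growing downward, so a
    positive result means the line drops toward the right.
    """
    (x1, y1), (x2, y2) = left_point, right_point
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

# Hypothetical baseline that drops 35 pixels over 1000 pixels of width.
angle = skew_angle_degrees((100, 500), (1100, 535))
print(f"skew: {angle:.1f} degrees")
```

Even a couple of degrees is clearly visible on a page of text, which is why this stage is worth reviewing.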
In the Select Content stage you tell the program which part of each image is relevant for the final output; the rest will be ignored. This step can also be run in batch and retouched manually later if some detections are imprecise. It turns out that foreign objects in the photos, for example a finger or a hand, can in some cases affect the accuracy of the automated content selection. In that case a quick selection in manual mode fixes the problem.
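A minimal sketch of the idea, assuming content is simply "everything darker than a threshold", shows why a stray finger throws the selection off. The function name and the threshold of 128 are assumptions; the real program is certainly more sophisticated.

```python
def content_box(gray, dark_threshold=128):
    """Return (left, top, right, bottom) around all dark pixels, or None.

    `gray` is a list of rows of 0-255 brightness values; right and
    bottom are exclusive, like Python slice ends.
    """
    xs, ys = [], []
    for y, row in enumerate(gray):
        for x, value in enumerate(row):
            if value < dark_threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # blank page: nothing to select
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1

# Synthetic page: a few dark "text" pixels in the middle.
page = [[255] * 8 for _ in range(6)]
page[2][2] = page[2][3] = page[3][2] = 0
print(content_box(page))

# A dark blob in the corner (a fingertip) stretches the box to include it.
page[5][7] = 0
print(content_box(page))
```

The second box covers almost the whole page, which is the kind of result a manual selection corrects.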
The Margins stage simply adds some margins around the content boxes. Just make sure that “match size with other pages” is selected, so that you have uniform output.
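One plausible way to think about matching sizes is that every page gets the dimensions of the largest content box plus the margins. The sketch below illustrates that reading under stated assumptions; the function name, the box sizes and the 50-pixel margin are all hypothetical.

```python
def uniform_page_size(content_sizes, margin):
    """Return one (width, height) that fits every page plus margins.

    `content_sizes` is a list of (width, height) content boxes in
    pixels; `margin` is added on all four sides.
    """
    widest = max(w for w, _ in content_sizes)
    tallest = max(h for _, h in content_sizes)
    return widest + 2 * margin, tallest + 2 * margin

# Hypothetical content boxes from three pages, with a 50-pixel margin.
sizes = [(1400, 2100), (1380, 2150), (1410, 1900)]
print(uniform_page_size(sizes, margin=50))
```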
Output is the final step. Tweak the parameters of this phase until the results are satisfying, then just launch batch processing and let the computer crunch your pages for some time (300-page books take 20 to 30 minutes on my Core i7 Mac). One of the most important features of the program is found in this screen (green box in the image): automatic dewarping. In essence, it automatically detects the folds and bends of the page and flattens them out into an almost perfect rectangle of text. It works so well that it sometimes seems like magic! As usual, though, you can manually refine the result in the dewarp pane.
You can tweak all the parameters on one image and then use batch mode to produce a processed image for every page. I usually output in black and white with double the DPI I chose at the beginning. I also configure a thicker output, but all these parameters really depend on how you took the photos, so just experiment until you get the best output, then apply to all images.
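To illustrate what black and white output means, here is a small sketch of Otsu's method, a standard way to pick the brightness level that separates ink from paper. It is offered as a general technique, not as a description of what Scan Tailor does internally, and the sample values are synthetic.

```python
def otsu_threshold(pixels):
    """Pick the 0-255 threshold that best separates dark from light.

    `pixels` is a flat list of integer brightness values (0-255).
    Pixels <= threshold are treated as ink, the rest as paper.
    """
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    dark_count, dark_sum = 0, 0
    for level in range(256):
        dark_count += histogram[level]
        dark_sum += level * histogram[level]
        light_count = total - dark_count
        if dark_count == 0 or light_count == 0:
            continue
        dark_mean = dark_sum / dark_count
        light_mean = (total_sum - dark_sum) / light_count
        # Between-class variance: large when the two groups are far apart.
        variance = dark_count * light_count * (dark_mean - light_mean) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold

def binarize(pixels):
    """Map each pixel to 0 (ink) or 255 (paper)."""
    threshold = otsu_threshold(pixels)
    return [0 if value <= threshold else 255 for value in pixels]

# Synthetic page: grey-ish ink around 60, off-white paper around 200.
sample = [55, 60, 65, 58, 195, 200, 205, 210, 198, 202]
print(binarize(sample))
```

Unevenly lit photos make a single threshold harder to choose, which is one reason good lighting at the photo stage pays off here.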
Once the images have been created by Scan Tailor, you will have to turn to another program to do optical character recognition and obtain a searchable document. I personally use ABBYY FineReader Express for Mac. It has no options or tweaks: just point it to the folder of your output images and let it think for a while. It will produce a nice PDF file in which you can search, highlight, copy text snippets and so on. You can now ditch your original photos and enjoy a nice digital copy of your book.
ABBYY FineReader is a paid program. I haven’t experimented with free options, but you can also upload the processed photos to Google Docs or to Evernote and their servers will eventually do OCR for you.
This guide has shown you how to convert books into PDF by taking photos of the pages. You already have everything you need to migrate your old physical library into the digital world! It is a fun little project that anyone can do, so it is really time to ditch your conventional dead-tree library in favor of the digital version.